Re: Spark R - Loading Third Party R Library in YARN Executors

2016-08-17 Thread Felix Cheung
When you call library(), that is the native R library-loading function. As of now it does not support HDFS, but there are several packages out there that might help. Another approach is to have a prefetch/installation mechanism that calls an HDFS command to download the R package from HDFS onto the

Re: [discuss] Spark 2.x release cadence

2016-09-27 Thread Felix Cheung
+1 on longer release cycle at schedule and more maintenance releases. From: Mark Hamstra <m...@clearstorydata.com> Sent: Tuesday, September 27, 2016 2:01 PM Subject: Re: [discuss] Spark 2.x release cadence To: Reynold Xin <r...@databricks.com> Cc: mailt

Re: [VOTE] Release Apache Spark 2.0.1 (RC4)

2016-10-01 Thread Felix Cheung
+1 Tested and didn't find any blocker - found a few minor R doc issues to follow up. From: Reynold Xin <r...@databricks.com> Sent: Wednesday, September 28, 2016 7:15 PM Subject: [VOTE] Release Apache Spark 2.0.1 (RC4) To: <dev@spark.apache.org> Plea

Re: welcoming Xiao Li as a committer

2016-10-03 Thread Felix Cheung
Congrats and welcome, Xiao! From: Reynold Xin <r...@databricks.com> Sent: Monday, October 3, 2016 10:47 PM Subject: welcoming Xiao Li as a committer To: Xiao Li <gatorsm...@gmail.com>, <dev@spark.apache.org> Hi all, Xiao Li, aka gatorsmile, h

Re: PSA: JIRA resolutions and meanings

2016-10-08 Thread Felix Cheung
+1 on this proposal, and everyone can contribute to updates and discussions on JIRAs. It would be great if this could be put on the Spark wiki. On Sat, Oct 8, 2016 at 9:05 AM -0700, "Ted Yu" <yuzhih...@gmail.com> wrote: Makes sense. I trust Hyukjin, Holden and Cody's judgement in respect

Re: Suggestion in README.md for guiding pull requests/JIRAs (probably about linking CONTRIBUTING.md or wiki)

2016-10-09 Thread Felix Cheung
Should we just link to https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark On Sun, Oct 9, 2016 at 10:09 AM -0700, "Hyukjin Kwon" <gurwls...@gmail.com> wrote: Thanks for confirming this, Sean. I filed this in https://issues.apache.org/jira/browse/SPARK-17840 I wou

Re: Straw poll: dropping support for things like Scala 2.10

2016-10-27 Thread Felix Cheung
+1 on Matei's. From: Davies Liu <dav...@databricks.com> Sent: Thursday, October 27, 2016 9:58 AM Subject: Re: Straw poll: dropping support for things like Scala 2.10 To: Matei Zaharia <matei.zaha...@gmail.com> Cc: Reynold Xin <r...@databricks.com>

Re: SparkR issue with array types in gapply()

2016-10-27 Thread Felix Cheung
This is native R data.frame behavior. While arr is a character vector of length = 2, > arr [1] "rows= 50" "cols= 2" > length(arr) [1] 2 when it is set as a column of an R data.frame, the character vector is split into 2 rows > data.frame(key, strings = arr, stringsAsFactors = F) key strings 1 a rows= 5

Re: Question about SPARK-11374 (skip.header.line.count)

2016-12-10 Thread Felix Cheung
+1 I think it's useful to always have a pure SQL way and skip header for plain text / csv that lots of companies have. From: Dongjoon Hyun Sent: Friday, December 9, 2016 9:42:58 AM To: Dongjin Lee; dev@spark.apache.org Subject: Re: Question about SPARK-11374 (sk

Re: [VOTE] Apache Spark 2.1.0 (RC5)

2016-12-16 Thread Felix Cheung
For R we have a license field in the DESCRIPTION, and this is standard practice (and requirement) for R packages. https://cran.r-project.org/doc/manuals/R-exts.html#Licensing From: Sean Owen Sent: Friday, December 16, 2016 9:57:15 AM To: Reynold Xin; dev@spark.a

Re: [VOTE] Apache Spark 2.1.0 (RC5)

2016-12-18 Thread Felix Cheung
t 10:23 AM, Joseph Bradley <jos...@databricks.com> wrote: +1 On Fri, Dec 16, 2016 at 3:21 PM, Herman van Hövell tot Westerflier <hvanhov...@databricks.com> wrote: +1 On Sat, Dec 17, 2016 at 12:14 AM, Xiao Li <gatorsm...@gmail.com> wrote: +1 Xiao Li 2016-12-16 12:

Re: ml word2vec finSynonyms return type

2016-12-30 Thread Felix Cheung
Could you link to the JIRA here? What you suggest makes sense to me. Though we might want to maintain compatibility and add a new method instead of changing the return type of the existing one. From: Asher Krim <ak...@hubspot.com> Sent: Wednesday, December

Re: [ML] [GraphFrames] : Bayesian Network framework

2016-12-30 Thread Felix Cheung
GraphFrames has a Belief Propagation example. Have you checked it out? graphframes.github.io/api/scala/index.html#org.graphframes.examples.BeliefPropagation$ From: Brian

Re: ml word2vec finSynonyms return type

2017-01-05 Thread Felix Cheung
<ak...@hubspot.com> Sent: Tuesday, January 3, 2017 11:58 PM Subject: Re: ml word2vec finSynonyms return type To: Felix Cheung <felixcheun...@hotmail.com> Cc: <manojkumarsivaraj...@gmail.com>, Joseph Bradley <jos...@databricks.com>, mailto:dev@spark

Re: Spark GraphFrame ConnectedComponents

2017-01-05 Thread Felix Cheung
. From: Ankur Srivastava Sent: Thursday, January 5, 2017 3:45:59 PM To: Felix Cheung; dev@spark.apache.org Cc: u...@spark.apache.org Subject: Re: Spark GraphFrame ConnectedComponents Adding DEV mailing list to see if this is a defect with ConnectedComponent or if they can recommend any solution

Spark checkpointing

2017-01-07 Thread Felix Cheung
From: Steve Loughran Sent: Friday, January 6, 2017 9:57:05 AM To: Ankur Srivastava Cc: Felix Cheung; u...@spark.apache.org Subject: Re: Spark GraphFrame ConnectedComponents On 5 Jan 2017, at 21:10, Ankur Srivastava <ankur.srivast...@gmail.com> wrote: Yes I did try

Re: Feedback on MLlib roadmap process proposal

2017-01-19 Thread Felix Cheung
Hi Seth Re: "The most important thing we can do, given that MLlib currently has a very limited committer review bandwidth, is to make clear issues that, if worked on, will definitely get reviewed. " We are adopting a Shepherd model, as described in the JIRA Joseph has, in which, when assigned,

Re: welcoming Burak and Holden as committers

2017-01-24 Thread Felix Cheung
Congrats and welcome!! From: Reynold Xin Sent: Tuesday, January 24, 2017 10:13:16 AM To: dev@spark.apache.org Cc: Burak Yavuz; Holden Karau Subject: welcoming Burak and Holden as committers Hi all, Burak and Holden have recently been elected as Apache Spark com

Re: PSA: Java 8 unidoc build

2017-02-07 Thread Felix Cheung
+1 for all the great work going in for this, HyukjinKwon, and +1 on what Sean says about "Jenkins builds with Java 8"; we should catch these nasty javadoc8 issues quickly. I think that would be a great first step to move away from Java 7. From: Reynold Xin mail

Re: welcoming Takuya Ueshin as a new Apache Spark committer

2017-02-13 Thread Felix Cheung
Congratulations! From: Xuefu Zhang Sent: Monday, February 13, 2017 11:29:12 AM To: Xiao Li Cc: Holden Karau; Reynold Xin; dev@spark.apache.org Subject: Re: welcoming Takuya Ueshin as a new Apache Spark committer Congratulations, Takuya! --Xuefu On Mon, Feb 13,

Re: Should we consider a Spark 2.1.1 release?

2017-03-13 Thread Felix Cheung
+1 there are a lot of good fixes overall and we need a release for Python and R packages. From: Holden Karau Sent: Monday, March 13, 2017 12:06:47 PM To: Felix Cheung; Shivaram Venkataraman; dev@spark.apache.org Subject: Should we consider a Spark 2.1.1

Re: Outstanding Spark 2.1.1 issues

2017-03-20 Thread Felix Cheung
I've been scrubbing R and think we are tracking 2 issues https://issues.apache.org/jira/browse/SPARK-19237 https://issues.apache.org/jira/browse/SPARK-19925 From: holden.ka...@gmail.com on behalf of Holden Karau Sent: Monday, March 20, 2017 3:12:35 PM To:

Re: [VOTE] Apache Spark 2.1.1 (RC2)

2017-04-02 Thread Felix Cheung
-1 sorry, found an issue with SparkR CRAN check. Opened SPARK-20197 and working on fix. From: holden.ka...@gmail.com on behalf of Holden Karau Sent: Friday, March 31, 2017 6:25:20 PM To: Xiao Li Cc: Michael Armbrust; dev@spark.apache.org Subject: Re: [VOTE] Apac

Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-20 Thread Felix Cheung
Tested on both Linux and Windows, as package. Found StackOverflowError with ALS on Windows https://issues.apache.org/jira/browse/SPARK-20402 This is part of the R CRAN check to build the vignettes. Very simple, quick and consistent repro on Windows. The exact same code works fine on Linux. Rep

Re: [SparkR] - options around setting up SparkSession / SparkContext

2017-04-21 Thread Felix Cheung
How would you handle this in Scala? If you are adding a wrapper func like getSparkSession for Scala, and have your users call it, can't you do the same in SparkR? After all, while it's true you don't need a SparkSession object to call the R API, someone still needs to call sparkR.session() to initi

Re: [SparkR] - options around setting up SparkSession / SparkContext

2017-04-22 Thread Felix Cheung
? From: Vin J <winjos...@gmail.com> Sent: Saturday, April 22, 2017 12:33 AM Subject: Re: [SparkR] - options around setting up SparkSession / SparkContext To: Felix Cheung <felixcheun...@hotmail.com> Cc: <dev@spark.apache.org> This is for a noteboo

Re: [VOTE] Apache Spark 2.1.1 (RC4)

2017-04-28 Thread Felix Cheung
+1 Tested R on Linux and Windows. The previous issue with building vignettes on Windows with a StackOverflowError in ALS still reproduces, but as confirmed the issue was in 2.1.0, so this isn't a regression (and hope for the best on CRAN..) https://issues.apache.org/jira/browse/SPARK-20402

Re: Spark 2.2.0 or Spark 2.3.0?

2017-05-02 Thread Felix Cheung
Yes 2.2.0 From: kant kodali Sent: Monday, May 1, 2017 10:43:44 PM To: dev Subject: Spark 2.2.0 or Spark 2.3.0? Hi All, If I understand the Spark standard release process correctly. It looks like the official release is going to be sometime end of this month and

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-06 Thread Felix Cheung
All tasks on the R QA umbrella (SPARK-20512) are completed. We can close this. From: Sean Owen <so...@cloudera.com> Sent: Tuesday, June 6, 2017 1:16 AM Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4) To: Michael Armbrust <mich...@databricks.com> Cc: mailto:

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-09 Thread Felix Cheung
Hmm, that's odd. This test would be in Jenkins too - let me double check. From: Nick Pentreath <nick.pentre...@gmail.com> Sent: Friday, June 9, 2017 1:12 AM Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4) To: dev <dev@spark.apache.org> All Scala, Python te

Re: SBT / PR builder builds failing on "include an external JAR in SparkR"

2017-06-12 Thread Felix Cheung
Facepalm, I broke them - I was making changes to test files and of course Jenkins was running only R tests since I was only changing R files, and everything passed there. The fix is to change Seq(sparkHome, "R", "pkg", "inst", "tests", to Seq(sparkHome, "R", "pkg", "tests", "fulltests", and 2 instance

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-13 Thread Felix Cheung
eath <nick.pentre...@gmail.com>, Felix Cheung <felixcheun...@hotmail.com> For the test failure on R, I checked: Per https://github.com/apache/spark/tree/v2.2.0-rc4, 1. Windows Server 2012 R2 / R 3.3.1 - passed (https://ci.appveyor.com/project/spark-test/spark/build/755-r-test

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-14 Thread Felix Cheung
Thanks! Will try to set up RHEL/CentOS to test it out. From: Nick Pentreath <nick.pentre...@gmail.com> Sent: Tuesday, June 13, 2017 11:38 PM Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4) To: Felix Cheung <felixcheun...@hotmail.com>,

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-15 Thread Felix Cheung
Sounds good. Think we checked and should be good to go. Appreciated. From: Michael Armbrust Sent: Wednesday, June 14, 2017 4:51:48 PM To: Hyukjin Kwon Cc: Felix Cheung; Nick Pentreath; dev; Sean Owen Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4) So, it looks

Re: [build system] rolling back R to working version

2017-06-20 Thread Felix Cheung
Thanks Shane! From: shane knapp Sent: Tuesday, June 20, 2017 9:23:57 PM To: dev Subject: Re: [build system] rolling back R to working version this is done... i backported R to 3.1.1 and reinstalled all the R packages so we're starting w/a clean slate. the worke

Re: [VOTE] Apache Spark 2.2.0 (RC6)

2017-07-05 Thread Felix Cheung
+1 (non-binding) Tested R, R package on Ubuntu and Windows, CRAN checks, manual tests with streaming & UDF. From: Denny Lee <denny.g@gmail.com> Sent: Monday, July 3, 2017 9:30 PM Subject: Re: [VOTE] Apache Spark 2.2.0 (RC6) To: Liang-Chi Hsieh mailto:vii..

Re: Welcoming Hyukjin Kwon and Sameer Agarwal as committers

2017-08-07 Thread Felix Cheung
Congrats!! From: Kevin Kim (Sangwoo) Sent: Monday, August 7, 2017 7:30:01 PM To: Hyukjin Kwon; dev Cc: Bryan Cutler; Mridul Muralidharan; Matei Zaharia; Holden Karau Subject: Re: Welcoming Hyukjin Kwon and Sameer Agarwal as committers Thanks for all of your hard

Re: Welcoming Saisai (Jerry) Shao as a committer

2017-08-29 Thread Felix Cheung
Congrats! From: Wenchen Fan Sent: Tuesday, August 29, 2017 9:21:38 AM To: Kevin Yu Cc: Meisam Fathi; dev Subject: Re: Welcoming Saisai (Jerry) Shao as a committer Congratulations, Saisai! On 29 Aug 2017, at 10:38 PM, Kevin Yu <keviny...@gmail.com> wrote:

Re: Updates on migration guides

2017-08-31 Thread Felix Cheung
+1 I think we do migration guide changes for ML and R in a separate JIRA/PR/commit, but we definitely should have it updated before the release. From: linguin@gmail.com Sent: Wednesday, August 30, 2017 8:27:17 AM To: Dongjoon Hyun Cc: Xiao Li; u...@spark.apache.o

Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-09-01 Thread Felix Cheung
+1 on this and like the suggestion of type in string form. Would it be correct to assume there will be a data type check, for example that the returned pandas data frame column data types match what is specified? We have seen quite a few issues and confusion with that in R. Would it make sense to hav

Re: Putting Kafka 0.8 behind an (opt-in) profile

2017-09-05 Thread Felix Cheung
+1 From: Cody Koeninger Sent: Tuesday, September 5, 2017 8:12:07 AM To: Sean Owen Cc: dev Subject: Re: Putting Kafka 0.8 behind an (opt-in) profile +1 to going ahead and giving a deprecation warning now On Tue, Sep 5, 2017 at 6:39 AM, Sean Owen wrote: > On the

Re: 2.1.2 maintenance release?

2017-09-08 Thread Felix Cheung
+1 on both 2.1.2 and 2.2.1 And would try to help and/or wrangle the release if needed. (Note: trying to backport a few changes to branch-2.1 right now) From: Sean Owen Sent: Friday, September 8, 2017 12:05:28 AM To: Holden Karau; dev Subject: Re: 2.1.2 maintenan

Re: 2.1.2 maintenance release?

2017-09-10 Thread Felix Cheung
Hi - what are the next steps? Pending changes are pushed, and I checked that there is no open JIRA targeting 2.1.2 or 2.2.1. From: Reynold Xin <r...@databricks.com> Sent: Friday, September 8, 2017 9:27 AM Subject: Re: 2.1.2 maintenance release? To: Felix

Re: [VOTE] Spark 2.1.2 (RC1)

2017-09-14 Thread Felix Cheung
+1 tested SparkR package on Windows, r-hub, Ubuntu. From: Sean Owen <so...@cloudera.com> Sent: Thursday, September 14, 2017 3:12 PM Subject: Re: [VOTE] Spark 2.1.2 (RC1) To: Holden Karau <hol...@pigscanfly.ca>, <dev@spark.apache.org> +1 Very ni

Re: [VOTE] Spark 2.1.2 (RC1)

2017-09-15 Thread Felix Cheung
Yes ;) From: Xiao Li Sent: Friday, September 15, 2017 2:22:03 PM To: Holden Karau Cc: Ryan Blue; Denny Lee; Felix Cheung; Sean Owen; dev@spark.apache.org Subject: Re: [VOTE] Spark 2.1.2 (RC1) Sorry, this release candidate is 2.1.2. The issue is in 2.2.1. 2017

RE: SparkR read.df Option type doesn't match

2015-11-27 Thread Felix Cheung
Yes - please see the code example on the SparkR API doc: http://spark.apache.org/docs/latest/api/R/read.df.html Suggestions or contributions to improve the doc are welcome! > Date: Thu, 26 Nov 2015 15:08:31 -0700 > From: s...@phemi.com > To: dev@spark.apache.org > Subject: Re: SparkR read.df Optio

RE: Are we running SparkR tests in Jenkins?

2016-01-17 Thread Felix Cheung
I think that breaks sparkR, the commandline script, and Jenkins, in which run-test.sh is calling sparkR. I'll work on this - since this also affects my PR #10652... Date: Fri, 15 Jan 2016 15:33:13 -0800 Subject: Re: Are we running SparkR tests in Jenkins? From: zjf...@gmail.com To: shiva...@eecs

Re: Fwd: Writing to jdbc database from SparkR (1.5.2)

2016-02-06 Thread Felix Cheung
Unfortunately I couldn't find a simple workaround. It seems to be an issue with DataFrameWriter.save(), which does not work with the jdbc source/format. For instance, this does not work in Scala either: df1.write.format("jdbc").mode("overwrite").option("url", "jdbc:mysql://something.rds.amazonaws.com:330

RE: Fwd: Writing to jdbc database from SparkR (1.5.2)

2016-02-07 Thread Felix Cheung
That does, but it's a bit hard to call from R since it is not exposed. On Sat, Feb 6, 2016 at 11:57 PM -0800, "Sun, Rui" wrote: DataFrameWriter.jdbc() does not work? From: Felix Cheung [mailto:felixcheun...@hotmail.com] Sent: Sunday, February 7, 2016 9:54 AM To: Andr

RE: Fwd: Writing to jdbc database from SparkR (1.5.2)

2016-02-07 Thread Felix Cheung
I mean not exposed from the SparkR API. Calling it from R without a SparkR API would require either a serializer change or a JVM wrapper function. On Sun, Feb 7, 2016 at 4:47 AM -0800, "Felix Cheung" wrote: That does but it's a bit hard to call from R since it is not ex

RE: Fwd: Writing to jdbc database from SparkR (1.5.2)

2016-02-07 Thread Felix Cheung
Correct :) From: Sun, Rui Sent: Sunday, February 7, 2016 5:19 AM Subject: RE: Fwd: Writing to jdbc database from SparkR (1.5.2) To: , Felix Cheung , Andrew Holway This should be solved by your pending PR https://github.com/apache

Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-04-01 Thread Felix Cheung
+1 From: Denny Lee Sent: Monday, April 1, 2024 10:06:14 AM To: Hussein Awala Cc: Chao Sun ; Hyukjin Kwon ; Mridul Muralidharan ; dev Subject: Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect) +1 (non-binding) On Mon, Apr 1, 2024 at 9:24 AM Hussein
