Re: let spark streaming sample come to stop

2015-11-16 Thread Bryan Cutler
Hi Renyi, This is the intended behavior of the streaming HdfsWordCount example. It uses 'textFileStream', which monitors an HDFS directory for newly created files and pushes them into a DStream. It is meant to run indefinitely, until it is interrupted (by Ctrl-C, for example). -bryan
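Here is a plain-Python sketch of the polling idea that shows why such a stream never finishes on its own. It is not Spark code: each call picks up only the files that have appeared since the previous call. The helper name `poll_new_files` is hypothetical. Spark's own file stream does more bookkeeping than comparing file names; for example, it checks modification times.

```python
import os
import tempfile

def poll_new_files(directory, seen):
    """Return names of files in `directory` not seen before, and remember them."""
    current = set(os.listdir(directory))
    new_files = sorted(current - seen)
    seen.update(new_files)
    return new_files

# Simulate two batch intervals against a temporary directory.
with tempfile.TemporaryDirectory() as d:
    seen = set()
    open(os.path.join(d, "a.txt"), "w").close()
    print(poll_new_files(d, seen))  # ['a.txt']
    print(poll_new_files(d, seen))  # []
```

A streaming job repeats a step like this once per batch interval in an endless loop. That is why the example keeps running until it is interrupted.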

Re: A bug in Spark standalone? Worker registration and deregistration

2015-12-10 Thread Bryan Cutler
Hi Jacek, I also recently noticed those messages, and some others, and am wondering if there is an issue. I am also seeing the following when I have event logging enabled. The first application is submitted and executes fine, but all subsequent attempts produce an error log, but the master

Re: running lda in spark throws exception

2016-01-08 Thread Bryan Cutler
Hi Li, I tried out your code and sample data in both local mode and Spark Standalone and it ran correctly with output that looks good. Sorry, I don't have a YARN cluster setup right now, so maybe the error you are seeing is specific to that. Btw, I am running the latest Spark code from the

Re: Welcoming Yanbo Liang as a committer

2016-06-05 Thread Bryan Cutler
Congratulations Yanbo! On Jun 5, 2016 4:03 AM, "Kousuke Saruta" wrote: > Congratulations Yanbo! > > > - Kousuke > > On 2016/06/04 11:48, Matei Zaharia wrote: > >> Hi all, >> >> The PMC recently voted to add Yanbo Liang as a committer. Yanbo has been >> a super active

Re: running lda in spark throws exception

2016-01-13 Thread Bryan Cutler
+ topic + ":") > > > > for (word <- Range(0, ldaModel.vocabSize)) { print(" " + > > topics(word, topic)); } > > > > println() > > > > } > > > > > > // Save and load model. > > > > l

Re: running lda in spark throws exception

2016-01-14 Thread Bryan Cutler
So the workaround is to process the input to re-encode terms? > > On Thu, Jan 14, 2016 at 6:53 AM, Bryan Cutler <cutl...@gmail.com> wrote: > > I was now able to reproduce the exception using the master branch and > local > > mode. It looks like the problem is the vectors of

Re: pull request template

2016-03-19 Thread Bryan Cutler
+1 on Marcelo's comments. It would be nice not to pollute commit messages with the instructions because some people might forget to remove them. Nobody has suggested removing the template. On Tue, Mar 15, 2016 at 3:59 PM, Joseph Bradley wrote: > +1 for keeping the

Re: Organizing Spark ML example packages

2016-04-19 Thread Bryan Cutler
+1, adding some organization would make it easier for people to find a specific example On Mon, Apr 18, 2016 at 11:52 PM, Yanbo Liang wrote: > This sounds good to me, and it will make ML examples more neatly. > > 2016-04-14 5:28 GMT-07:00 Nick Pentreath

Re: AccumulatorV2 += operator

2016-08-03 Thread Bryan Cutler
unction can > be for someone extending the accumulator (but it certainly could cause > confusion). > > Reynold can provide a more definitive answer in this case. > > On Tue, Aug 2, 2016 at 1:46 PM, Bryan Cutler <cutl...@gmail.com> wrote: > >> It seems like the += operator i

AccumulatorV2 += operator

2016-08-02 Thread Bryan Cutler
It seems like the += operator is missing from the new accumulator API, although the docs still make reference to it. Anyone know if it was intentionally not put in? I'm happy to do a PR for it or update the docs to just use the add() method, just want to check if there was some reason first.
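The trade-off in the question (offering `+=` versus documenting only `add()`) can be sketched with a toy accumulator in Python. `SumAccumulator` is a hypothetical class, not Spark's `AccumulatorV2`. It only shows that the operator can be a thin alias that delegates to `add()`.

```python
class SumAccumulator:
    """Hypothetical toy accumulator; not Spark's AccumulatorV2."""

    def __init__(self):
        self._value = 0

    def add(self, v):
        """The primary way to update the accumulator."""
        self._value += v

    def __iadd__(self, v):
        # `acc += v` is only sugar: it delegates to add() and returns self
        self.add(v)
        return self

    @property
    def value(self):
        return self._value

acc = SumAccumulator()
acc.add(3)
acc += 4
print(acc.value)  # 7
```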

Re: welcoming Burak and Holden as committers

2017-01-25 Thread Bryan Cutler
Congratulations Holden and Burak, well deserved!!! On Tue, Jan 24, 2017 at 10:13 AM, Reynold Xin wrote: > Hi all, > > Burak and Holden have recently been elected as Apache Spark committers. > > Burak has been very active in a large number of areas in Spark, including >

Re: welcoming Xiao Li as a committer

2016-10-04 Thread Bryan Cutler
Congrats Xiao! On Tue, Oct 4, 2016 at 11:14 AM, Holden Karau wrote: > Congratulations :D :) Yay! > > On Tue, Oct 4, 2016 at 11:14 AM, Suresh Thalamati < > suresh.thalam...@gmail.com> wrote: > >> Congratulations, Xiao! >> >> >> >> > On Oct 3, 2016, at 10:46 PM, Reynold Xin

Re: Belief propagation algorithm is open sourced

2016-12-14 Thread Bryan Cutler
I'll check it out, thanks for sharing Alexander! On Dec 13, 2016 4:58 PM, "Ulanov, Alexander" wrote: > Dear Spark developers and users, > > > HPE has open sourced the implementation of the belief propagation (BP) > algorithm for Apache Spark, a popular message passing

Re: Why is there no flatten method on RDD?

2016-12-14 Thread Bryan Cutler
Hi Tarun, I think there just hasn't been a strong need for it when you can accomplish the same with just rdd.flatMap(identity). I see a JIRA was just opened for this https://issues.apache.org/jira/browse/SPARK-18855 On Mon, Dec 5, 2016 at 2:55 PM, Tarun Kumar wrote: >
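Here is a plain-Python analogue of `rdd.flatMap(identity)`. Mapping each element to itself and concatenating the results flattens one level of nesting. `flat_map` and `identity` are helpers defined here for illustration. In PySpark, the equivalent call would be `rdd.flatMap(lambda x: x)`.

```python
from itertools import chain

def identity(x):
    return x

def flat_map(f, items):
    """Apply f to each item and concatenate the resulting iterables."""
    return list(chain.from_iterable(f(x) for x in items))

nested = [[1, 2], [], [3]]
print(flat_map(identity, nested))  # [1, 2, 3]
```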

Some PRs not automatically linked to JIRAs

2017-08-02 Thread Bryan Cutler
Hi Devs, I've noticed a couple PRs recently have not been automatically linked to the related JIRAs. This was one of mine (I linked it manually) https://issues.apache.org/jira/browse/SPARK-21583, but I've seen it happen elsewhere. I think this is the script that does it, but it hasn't been

Re: Some PRs not automatically linked to JIRAs

2017-08-02 Thread Bryan Cutler
rlier today and had run it manually, but >> yeah I'm not sure where it normally runs or why it hasn't. Shane not sure >> if you're the person to ask? >> >> >> On Wed, Aug 2, 2017 at 7:47 PM Bryan Cutler <cutl...@gmail.com> wrote: >> >>> Hi Devs, >

Re: Run a specific PySpark test or group of tests

2017-08-15 Thread Bryan Cutler
This generally works for me to run just the tests within a class, or even a single test. It's not as flexible as pytest -k, which would be nice. $ SPARK_TESTING=1 bin/pyspark pyspark.sql.tests ArrowTests On Tue, Aug 15, 2017 at 5:49 AM, Nicholas Chammas < nicholas.cham...@gmail.com> wrote: > Pytest
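The command above narrows a PySpark test run to one class. For comparison, the standard-library `unittest` loader can do the same for any `TestCase` class or a single test method. The test classes below are hypothetical stand-ins, not PySpark's `ArrowTests`.

```python
import io
import unittest

# Hypothetical test classes standing in for a real test module.
class ArrowLikeTests(unittest.TestCase):
    def test_roundtrip(self):
        self.assertEqual(list(range(3)), [0, 1, 2])

    def test_empty(self):
        self.assertEqual(len([]), 0)

class UnrelatedTests(unittest.TestCase):
    def test_other(self):
        self.assertTrue(True)

def run_suite(suite):
    """Run a suite quietly and return the unittest result object."""
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    return runner.run(suite)

# Only the tests in one class:
class_suite = unittest.TestLoader().loadTestsFromTestCase(ArrowLikeTests)
# Only one test method:
single_suite = unittest.TestSuite([ArrowLikeTests("test_roundtrip")])
```

From the command line, `python -m unittest module.ClassName.test_method` selects a single test. Python 3.7+ also accepts `-k` with a substring pattern.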

Re: Welcoming Hyukjin Kwon and Sameer Agarwal as committers

2017-08-07 Thread Bryan Cutler
Great work Hyukjin and Sameer! On Mon, Aug 7, 2017 at 10:22 AM, Mridul Muralidharan wrote: > Congratulations Hyukjin, Sameer ! > > Regards, > Mridul > > On Mon, Aug 7, 2017 at 8:53 AM, Matei Zaharia > wrote: > > Hi everyone, > > > > The Spark PMC

Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-09-07 Thread Bryan Cutler
+1 (non-binding) for the goals and non-goals of this SPIP. I think it's fine to work out the minor details of the API during review. Bryan On Wed, Sep 6, 2017 at 5:17 AM, Takuya UESHIN wrote: > Hi all, > > Thank you for voting and suggestions. > > As Wenchen mentioned

Re: Integrating ML/DL frameworks with Spark

2018-05-15 Thread Bryan Cutler
Thanks for starting this discussion, I'd also like to see some improvements in this area and glad to hear that the Pandas UDFs / Arrow functionality might be useful. I'm wondering if from your initial investigations you found anything lacking from the Arrow format or possible improvements that

Re: Feedback on first commit + jira issue I opened

2018-05-31 Thread Bryan Cutler
Hi Andrew, Please just go ahead and make the pull request. It's easier to review and give feedback, thanks! Bryan On Thu, May 31, 2018 at 9:44 AM, Long, Andrew wrote: > Hello Friends, > > > > I’m a new committer and I’ve submitted my first patch and I had some > questions about documentation

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-04 Thread Bryan Cutler
+1 On Mon, Jun 4, 2018 at 10:18 AM, Joseph Bradley wrote: > +1 > > On Mon, Jun 4, 2018 at 10:16 AM, Mark Hamstra > wrote: > >> +1 >> >> On Fri, Jun 1, 2018 at 3:29 PM Marcelo Vanzin >> wrote: >> >>> Please vote on releasing the following candidate as Apache Spark version >>> 2.3.1. >>> >>>

Thoughts on Cloudpickle Update

2018-01-15 Thread Bryan Cutler
Hi All, I've seen a couple issues lately related to cloudpickle, notably https://issues.apache.org/jira/browse/SPARK-22674, and would like to get some feedback on updating the version in PySpark which should fix these issues and allow us to remove some workarounds. Spark is currently using a

Re: Thoughts on Cloudpickle Update

2018-01-19 Thread Bryan Cutler
ssages which version of >>>> cloudpickle we end up upgrading to. >>>> >>>> +1: PR description, commit message or any unit to identify each will be >>>> useful. >>>> It should be easier once we have a matched version. >>

Re: [VOTE] Spark 2.3.0 (RC5)

2018-02-23 Thread Bryan Cutler
+1 Tests passed and additionally ran Arrow related tests and did some perf checks with python 2.7.14 On Fri, Feb 23, 2018 at 6:18 PM, Holden Karau <hol...@pigscanfly.ca> wrote: > Note: given the state of Jenkins I'd love to see Bryan Cutler or someone > with Arrow experie

Re: JIRA access

2018-02-23 Thread Bryan Cutler
Hi Arun, The general process is to just leave a comment in the JIRA that you are working on it so others know. Once your pull request is merged, the JIRA will be assigned to you. You can read http://spark.apache.org/contributing.html for details. On Fri, Feb 23, 2018 at 9:08 PM, Arun

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-10 Thread Bryan Cutler
I agree that we should hold off on the Arrow upgrade if it requires major changes to our testing. I did have another thought that maybe we could just add another job to test against Python 3.5 and pyarrow 0.10.0 and keep all current testing the same? I'm not sure how doable that is right now and

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-06 Thread Bryan Cutler
Hi All, I'd like to request a few days extension to the code freeze to complete the upgrade to Apache Arrow 0.10.0, SPARK-23874. This upgrade includes several key improvements and bug fixes. The RC vote just passed this morning and code changes are complete in

Re: [discuss][minor] impending python 3.x jenkins upgrade... 3.5.x? 3.6.x?

2018-08-20 Thread Bryan Cutler
Thanks for looking into this Shane! If we are choosing a single python 3.x, I think 3.6 would be good. It might still be nice to test against other versions too, so we can catch any issues. Is it possible to have more exhaustive testing as part of a nightly or scheduled build? As a point of

Re: Thoughts on Cloudpickle Update

2018-01-18 Thread Bryan Cutler
ickle dev? > > I am technically involved in cloudpickle dev although less active. > They changed default pickle protocol (https://github.com/cloudpipe/ > cloudpickle/pull/127). So, if we target 0.5.x+, we should double check > the potential compatibility issue, or fix the protocol, which
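One of the options mentioned is pinning the protocol rather than relying on a library default. With the standard-library `pickle` module, that looks like the sketch below. The helper `dumps_pinned` and the choice of protocol 2 are illustrative assumptions. `cloudpickle.dumps` also accepts a `protocol` argument, but which protocol Spark should use is exactly what the thread was discussing.

```python
import pickle

def dumps_pinned(obj, protocol=2):
    """Serialize with an explicit protocol instead of the default.

    Pinning the protocol keeps the byte format stable even if the
    library's default protocol changes in a later release.
    """
    return pickle.dumps(obj, protocol=protocol)

payload = dumps_pinned({"a": 1})
# Protocol 2+ streams start with the PROTO opcode followed by the version.
print(payload[:2])            # b'\x80\x02'
print(pickle.loads(payload))  # {'a': 1}
```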

Re: Silencing messages from Ivy when calling spark-submit

2018-03-06 Thread Bryan Cutler
k > > On Mon, Mar 5, 2018 at 2:11 PM Bryan Cutler <cutl...@gmail.com> wrote: > >> Hi Nick, >> >> Not sure about changing the default to warnings only because I think some >> might find the resolution output useful, but you can specify your own ivy >> set

Re: Re: Welcome Zhenhua Wang as a Spark committer

2018-04-02 Thread Bryan Cutler
Congratulations Zhenhua! On Mon, Apr 2, 2018 at 12:01 PM, ron8hu wrote: > Congratulations, Zhenhua! Well deserved!! > > Ron > > > > -- > Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/ > >

Re: Welcoming some new committers

2018-03-05 Thread Bryan Cutler
Thanks everyone, this is very exciting! I'm looking forward to working with you all and helping out more in the future. Also, congrats to the other committers as well!!

Re: Silencing messages from Ivy when calling spark-submit

2018-03-05 Thread Bryan Cutler
Hi Nick, Not sure about changing the default to warnings only because I think some might find the resolution output useful, but you can specify your own ivy settings file with "spark.jars.ivySettings" to point to your ivysettings.xml file. Would that work for you to configure it there? Bryan

Re: Arrow optimization in conversion from R DataFrame to Spark DataFrame

2018-11-09 Thread Bryan Cutler
Great work Hyukjin! I'm not too familiar with R, but I'll take a look at the PR. Bryan On Fri, Nov 9, 2018 at 9:19 AM Shivaram Venkataraman < shiva...@eecs.berkeley.edu> wrote: > Thanks Hyukjin! Very cool results > > Shivaram > On Fri, Nov 9, 2018 at 10:58 AM Felix Cheung > wrote: > > > >

Re: welcome a new batch of committers

2018-10-04 Thread Bryan Cutler
Congratulations everyone! Very well deserved!! On Wed, Oct 3, 2018, 1:59 AM Reynold Xin wrote: > Hi all, > > The Apache Spark PMC has recently voted to add several new committers to > the project, for their contributions: > > - Shane Knapp (contributor to infra) > - Dongjoon Hyun (contributor

Re: python tests: any reason for a huge tests.py?

2018-09-13 Thread Bryan Cutler
Hi Imran, I agree it would be good to split up the tests, but there might be a couple of things to discuss first. Right now we have a single "tests.py" for each subpackage. I think it makes sense to roughly have a test file for most modules, e.g. "test_rdd.py", but it might not always be clear cut

Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-04-02 Thread Bryan Cutler
st saying the next release. >>> >>> In any case I think in the next release it will be great to get more >>> Python 3.x release test coverage. >>> >>> >>> >>> -- >>> *From:* shane knapp >>>

Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-26 Thread Bryan Cutler
ch 25, 2019 6:48 PM >> *To:* Hyukjin Kwon >> *Cc:* dev; Bryan Cutler; Takuya UESHIN; shane knapp >> *Subject:* Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276] >> >> I don't know a lot about Arrow here, but seems reasonable. Is this for >> Spark 3.0 or

Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-29 Thread Bryan Cutler
:58 PM Felix Cheung wrote: > 3.4 is end of life but 3.5 is not. From your link > > we expect to release Python 3.5.8 around September 2019. > > > > -- > *From:* shane knapp > *Sent:* Thursday, March 28, 2019 7:54 PM > *To:* Hyukjin Kwon

Re: [pyspark] dataframe map_partition

2019-03-08 Thread Bryan Cutler
Hi Peng, I just added support for scalar Pandas UDF to return a StructType as a Pandas DataFrame in https://issues.apache.org/jira/browse/SPARK-23836. Is that the functionality you are looking for? Bryan On Thu, Mar 7, 2019 at 1:13 PM peng yu wrote: > right now, i'm using the colums-at-a-time

Re: Welcome Jose Torres as a Spark committer

2019-01-30 Thread Bryan Cutler
Congrats Jose! On Tue, Jan 29, 2019, 10:48 AM Shixiong Zhu Hi all, > > The Apache Spark PMC recently added Jose Torres as a committer on the > project. Jose has been a major contributor to Structured Streaming. Please > join me in welcoming him! > > Best Regards, > > Shixiong Zhu > >

Re: Thoughts on dataframe cogroup?

2019-04-08 Thread Bryan Cutler
Chris, an SPIP sounds good to me. I agree with Li that it wouldn't be too difficult to extend the current functionality to transfer multiple DataFrames. For the SPIP, I would keep it more high-level. I don't think it's necessary to include details of the Python worker; we can hash that out
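A cogroup pairs up, for each key, the group of rows from each of two inputs. The sketch below does that with plain Python lists of tuples. `cogroup` is a hypothetical helper. It says nothing about how Spark would send the two groups to the Python worker, which is the detail the email suggests leaving out of the SPIP.

```python
from collections import defaultdict

def cogroup(left, right, key):
    """Return {k: (left_rows_with_k, right_rows_with_k)} over all keys."""
    groups = defaultdict(lambda: ([], []))
    for row in left:
        groups[key(row)][0].append(row)
    for row in right:
        groups[key(row)][1].append(row)
    return dict(groups)

left = [("a", 1), ("b", 2), ("a", 3)]
right = [("a", 10), ("c", 30)]
for k, (lrows, rrows) in sorted(cogroup(left, right, key=lambda r: r[0]).items()):
    print(k, lrows, rrows)
# a [('a', 1), ('a', 3)] [('a', 10)]
# b [('b', 2)] []
# c [] [('c', 30)]
```

Keys present in only one input still appear, with an empty group on the other side. A per-key function receiving both groups can then decide how to combine them.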

Re: Should python-2 be supported in Spark 3.0?

2019-05-31 Thread Bryan Cutler
+1 and the draft sounds good On Thu, May 30, 2019, 11:32 AM Xiangrui Meng wrote: > Here is the draft announcement: > > === > Plan for dropping Python 2 support > > As many of you already knew, Python core development team and many > utilized Python packages like Pandas and NumPy will drop

Re: [DISCUSS] Increasing minimum supported version of Pandas

2019-06-14 Thread Bryan Cutler
shane knapp wrote: > ah, ok... should we downgrade the testing env on jenkins then? any > specific version? > > shane, who is loathe (and i mean LOATHE) to touch python envs ;) > > On Fri, Jun 14, 2019 at 10:08 AM Bryan Cutler wrote: > >> I should have stated this

Re: [DISCUSS] Increasing minimum supported version of Pandas

2019-06-14 Thread Bryan Cutler
y the last easy >>> chance we’ll have to bump version numbers easily I’d suggest 0.24.2 >>> >>> >>> On Fri, Jun 14, 2019 at 4:38 AM Hyukjin Kwon >>> wrote: >>> >>>> I am +1 to go for 0.23.2 - it brings some overhead to test PyArrow and >>>

Re: [DISCUSS] Increasing minimum supported version of Pandas

2019-06-14 Thread Bryan Cutler
- > *From:* Holden Karau > *Sent:* Friday, June 14, 2019 11:06:15 AM > *To:* Felix Cheung > *Cc:* Bryan Cutler; Dongjoon Hyun; Hyukjin Kwon; dev; shane knapp > *Subject:* Re: [DISCUSS] Increasing minimum supported version of Pandas > > Are there other Python dependencie

[DISCUSS] Increasing minimum supported version of Pandas

2019-06-13 Thread Bryan Cutler
Hi All, We would like to discuss increasing the minimum supported version of Pandas in Spark, which is currently 0.19.2. Pandas 0.19.2 was released nearly 3 years ago and there are some workarounds in PySpark that could be removed if such an old version is not required. This will help to keep
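Raising a minimum supported version usually comes down to a check at import time. Here is a minimal sketch of such a gate. `parse_version` and `require_minimum_version` are hypothetical helpers, not PySpark's actual check, and the parser only handles simple dotted versions.

```python
def parse_version(version):
    """Turn '0.19.2' into (0, 19, 2); a suffix like 'rc1' in '0.23.0rc1' is dropped."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)

def require_minimum_version(package, installed, minimum):
    """Raise ImportError if `installed` is older than `minimum`."""
    have, need = parse_version(installed), parse_version(minimum)
    # Pad with zeros so that '1.0' and '1.0.0' compare as equal.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    if have < need:
        raise ImportError(
            f"{package} >= {minimum} must be installed; found {installed}"
        )

require_minimum_version("pandas", "0.24.2", "0.19.2")  # passes silently
```

A production check would more likely rely on a dedicated version-parsing library than on this hand-rolled parser.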

Re: [SPARK-25079] moving from python 3.4 to python 3.6.8, impacts all active branches

2019-04-18 Thread Bryan Cutler
Great work, thanks Shane! On Thu, Apr 18, 2019 at 2:46 PM shane knapp wrote: > alrighty folks, the future is here and we'll be moving to python 3.6 > monday! > > all three PRs are green! > master PR: https://github.com/apache/spark/pull/24266 > 2.4 PR:

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-19 Thread Bryan Cutler
+1 (non-binding) On Thu, Apr 18, 2019 at 11:41 AM Jason Lowe wrote: > +1 (non-binding). Looking forward to seeing better support for processing > columnar data. > > Jason > > On Tue, Apr 16, 2019 at 10:38 AM Tom Graves > wrote: > >> Hi everyone, >> >> I'd like to call for a vote on

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-05-02 Thread Bryan Cutler
ce with > > something that expects the data in arrow format will already have to know > > what version of the format it was programmed against and in the worst > case > > if the layout does change we can support the new layout if needed. > > > > > > On Sun, Apr 21,

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-05-08 Thread Bryan Cutler
+1 (non-binding) On Tue, May 7, 2019 at 12:04 PM Bobby Evans wrote: > I am +! > > On Tue, May 7, 2019 at 1:37 PM Thomas graves wrote: > >> Hi everyone, >> >> I'd like to call for another vote on SPARK-27396 - SPIP: Public APIs >> for extended Columnar Processing Support. The proposal is to

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-20 Thread Bryan Cutler
ormat are mostly in > Python. > > > 3. Simple operations, though benefits vectorization, might not be > worth the data exchange overhead. > > > > > > So would an improved Pandas UDF API would be good enough? For example, > SPARK-26412 (UDF that takes an iterator of

Re: Thoughts on dataframe cogroup?

2019-04-23 Thread Bryan Cutler
this available since most of the work would be done? >>>>>> >>>>>> On Mon, Apr 15, 2019 at 7:50 AM Li Jin wrote: >>>>>> >>>>>>> Thank you Chris, this looks great. >>>>>>> >>>>

Re: [VOTE] [SPARK-27495] SPIP: Support Stage level resource configuration and scheduling

2019-09-11 Thread Bryan Cutler
+1 (non-binding), looks good! On Wed, Sep 11, 2019 at 10:05 AM Ryan Blue wrote: > +1 > > This is going to be really useful. Thanks for working on it! > > On Wed, Sep 11, 2019 at 9:38 AM Felix Cheung > wrote: > >> +1 >> >> -- >> *From:* Thomas graves >> *Sent:*

Re: Welcoming some new committers and PMC members

2019-09-17 Thread Bryan Cutler
Congratulations, all well deserved! On Thu, Sep 12, 2019, 3:32 AM Jacek Laskowski wrote: > Hi, > > What a great news! Congrats to all awarded and the community for voting > them in! > > p.s. I think it should go to the user mailing list too. > > Pozdrawiam, > Jacek Laskowski > >

Re: [DISCUSS] New sections in Github Pull Request description template

2019-07-26 Thread Bryan Cutler
The k8s template is pretty good. Under the behavior change section, it would be good to add instructions to also describe previous and new behavior as Hyukjin proposed. On Tue, Jul 23, 2019 at 10:07 PM Reynold Xin wrote: > I like the spirit, but not sure about the exact proposal. Take a look at

[DISCUSS] Remove sorting of fields in PySpark SQL Row construction

2019-11-04 Thread Bryan Cutler
Currently, when a PySpark Row is created with keyword arguments, the fields are sorted alphabetically. This has created a lot of confusion with users because it is not obvious (although it is stated in the pydocs) that they will be sorted alphabetically. Then later when applying a schema and the
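The confusion described above can be reproduced without Spark. The sketch below compares fields sorted alphabetically with fields kept in call order, which Python preserves for `**kwargs` since 3.6 (PEP 468). Both helpers are hypothetical and do not use `pyspark.sql.Row`. The last step assumes that values are later matched to a schema by position.

```python
def fields_sorted(**kwargs):
    """Field names sorted alphabetically, as described for keyword Rows."""
    names = sorted(kwargs)
    return names, [kwargs[n] for n in names]

def fields_in_call_order(**kwargs):
    """Field names in the order the caller wrote them."""
    names = list(kwargs)
    return names, [kwargs[n] for n in names]

schema = ["name", "age"]  # hypothetical schema, matched by position

_, sorted_values = fields_sorted(name="Alice", age=11)
print(dict(zip(schema, sorted_values)))   # {'name': 11, 'age': 'Alice'}

_, ordered_values = fields_in_call_order(name="Alice", age=11)
print(dict(zip(schema, ordered_values)))  # {'name': 'Alice', 'age': 11}
```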

Re: [DISCUSS] Deprecate Python < 3.6 in Spark 3.0

2019-10-31 Thread Bryan Cutler
+1 for deprecating On Wed, Oct 30, 2019 at 2:46 PM Shane Knapp wrote: > sure. that shouldn't be too hard, but we've historically given very > little support to it. > > On Wed, Oct 30, 2019 at 2:31 PM Maciej Szymkiewicz > wrote: > >> Could we upgrade to PyPy3.6 v7.2.0? >> On 10/30/19 9:45 PM,

Re: [build system] Upgrading pyarrow, builds might be temporarily broken

2019-11-14 Thread Bryan Cutler
Update: #26133 <https://github.com/apache/spark/pull/26133> has been merged and builds should be passing now, thanks all! On Thu, Nov 14, 2019 at 4:12 PM Bryan Cutler wrote: > We are in the process of upgrading pyarrow in the testing environment, > which might cause pyspark test fa

[build system] Upgrading pyarrow, builds might be temporarily broken

2019-11-14 Thread Bryan Cutler
We are in the process of upgrading pyarrow in the testing environment, which might cause pyspark test failures until https://github.com/apache/spark/pull/26133 is merged. Apologies for the lack of notice beforehand, but I jumped the gun a little and forgot this would affect other builds too. The

Re: Slower than usual on PRs

2019-12-16 Thread Bryan Cutler
Sorry to hear this, Holden! Hope you get well soon and take it easy!! On Tue, Dec 3, 2019 at 6:21 PM Hyukjin Kwon wrote: > Yeah, please take care of your health first! > > On Tue, Dec 3, 2019 at 1:32 PM, Wenchen Fan wrote: > >> Sorry to hear that. Hope you get better soon! >> >> On Tue, Dec 3, 2019 at

Re: Revisiting Python / pandas UDF (continues)

2019-12-16 Thread Bryan Cutler
Thanks for taking this on Hyukjin! I'm looking forward to the PRs and happy to help out where I can. Bryan On Wed, Dec 4, 2019 at 9:13 PM Hyukjin Kwon wrote: > Hi all, > > I would like to finish redesigning Pandas UDF ones in Spark 3.0. > If you guys don't have a minor concern in general about

Re: Welcoming some new Apache Spark committers

2020-07-14 Thread Bryan Cutler
Congratulations and welcome! On Tue, Jul 14, 2020 at 12:36 PM Xingbo Jiang wrote: > Welcome, Huaxin, Jungtaek, and Dilip! > > Congratulations! > > On Tue, Jul 14, 2020 at 10:37 AM Matei Zaharia > wrote: > >> Hi all, >> >> The Spark PMC recently voted to add several new committers. Please join

Re: [vote] Apache Spark 3.0 RC3

2020-06-08 Thread Bryan Cutler
+1 (non-binding) On Mon, Jun 8, 2020, 1:49 PM Tom Graves wrote: > +1 > > Tom > > On Saturday, June 6, 2020, 03:09:09 PM CDT, Reynold Xin < > r...@databricks.com> wrote: > > > Please vote on releasing the following candidate as Apache Spark version > 3.0.0. > > The vote is open until [DUE DAY]

Re: [VOTE] SPIP: Support pandas API layer on PySpark

2021-03-26 Thread Bryan Cutler
+1 (non-binding) On Fri, Mar 26, 2021 at 9:49 AM Maciej wrote: > +1 (nonbinding) > > On 3/26/21 3:52 PM, Hyukjin Kwon wrote: > > Hi all, > > I’d like to start a vote for SPIP: Support pandas API layer on PySpark. > > The proposal is to embrace Koalas in PySpark to have the pandas API layer > on

Re: Welcoming six new Apache Spark committers

2021-03-29 Thread Bryan Cutler
Congratulations everyone! On Sun, Mar 28, 2021 at 11:00 PM ML Books wrote: > Congrats all > > On Sat, Mar 27, 2021, 1:58 AM Matei Zaharia > wrote: > >> Hi all, >> >> The Spark PMC recently voted to add several new committers. Please join >> me in welcoming them to their new role! Our new

Re: [DISCUSS] Support pandas API layer on PySpark

2021-03-16 Thread Bryan Cutler
+1 the proposal sounds good to me. Having a familiar API built-in will really help new users get into using Spark that might only have Pandas experience. It sounds like maintenance costs should be manageable, once the hurdle with setting up tests is done. Just out of curiosity, does Koalas pretty

Re: Introducing "Pandas API on Spark" component in JIRA, and use "PS" PR title component

2022-05-19 Thread Bryan Cutler
+1, sounds good On Wed, May 18, 2022 at 9:16 PM Dongjoon Hyun wrote: > +1 > > Thank you for the suggestion, Hyukjin. > > Dongjoon. > > On Wed, May 18, 2022 at 11:08 AM Bjørn Jørgensen > wrote: > >> +1 >> But can will have PR Title and PR label the same, PS >> >> On Wed, May 18, 2022 at 18:57