[GitHub] spark pull request #20436: [MINOR] Fix typos in dev/* scripts.

2018-01-30 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/20436#discussion_r164757437 --- Diff: dev/lint-python --- @@ -60,9 +60,9 @@ export "PYLINT_HOME=$PYTHONPATH" export "PATH=$PYTHONPATH:$PATH"

[GitHub] spark issue #18926: [SPARK-21712] [PySpark] Clarify type error for Column.su...

2017-08-16 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/18926 Agreed with @HyukjinKwon. This PR has a very narrow goal -- improving the error messages -- which I think it accomplished. I think @gatorsmile was expecting a more significant set of improvements

[GitHub] spark issue #18926: [SPARK-21712] [PySpark] Clarify type error for Column.su...

2017-08-15 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/18926 It's cleaner but less specific. Unless we branch on whether `startPos` and `length` are the same type, we will give the same error message for mixed types and for unsupported types. That seems
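For illustration, here is a rough Python sketch of the branching being discussed; it is my own approximation of the check, not the exact code from the PR (the real method also accepts `Column` arguments):

```python
def check_substr_args(startPos, length):
    # Sketch only: branch on whether the two arguments share a type.
    if type(startPos) != type(length):
        # Mixed types get a specific, actionable message...
        raise TypeError(
            "startPos and length must be the same type. "
            "Got {} and {}, respectively.".format(type(startPos), type(length)))
    if not isinstance(startPos, int):
        # ...while matching-but-unsupported types get a generic one.
        raise TypeError("Unexpected type: {}".format(type(startPos)))

check_substr_args(1, 3)    # OK
check_substr_args(1, 3.5)  # TypeError: startPos and length must be the same type...
```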

[GitHub] spark pull request #18926: [SPARK-21712] [PySpark] Clarify type error for Co...

2017-08-15 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/18926#discussion_r133186642 --- Diff: python/pyspark/sql/tests.py --- @@ -1220,6 +1220,18 @@ def test_rand_functions(self): rndn2 = df.select('key', functions.randn(0

[GitHub] spark pull request #18926: [SPARK-21712] [PySpark] Clarify type error for Co...

2017-08-15 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/18926#discussion_r133180053 --- Diff: python/pyspark/sql/tests.py --- @@ -1220,6 +1220,13 @@ def test_rand_functions(self): rndn2 = df.select('key', functions.randn(0

[GitHub] spark issue #18926: [SPARK-21712] [PySpark] Clarify type error for Column.su...

2017-08-15 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/18926 @gatorsmile

> Even if we plan to drop `long` in this PR

We are not dropping `long` in this PR. It was [never supported](https://github.com/apache/spark/pull/18

[GitHub] spark issue #18926: [SPARK-21712] [PySpark] Clarify type error for Column.su...

2017-08-14 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/18926 I think my latest commits address the concerns raised here. Let me know if I missed or misunderstood anything.

[GitHub] spark pull request #18926: [SPARK-21712] [PySpark] Clarify type error for Co...

2017-08-14 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/18926#discussion_r133029498 --- Diff: python/pyspark/sql/column.py --- @@ -406,8 +406,14 @@ def substr(self, startPos, length): [Row(col=u'Ali'), Row(col=u'Bob

[GitHub] spark issue #18926: [SPARK-21712] [PySpark] Clarify type error for Column.su...

2017-08-14 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/18926 To summarize the feedback from @HyukjinKwon and @gatorsmile, I think what I need to do is:

* Add a test for the mixed type case.
* Explicitly check for `long` in Python 2 and throw
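A minimal sketch of what the explicit Python 2 `long` check could look like; the helper name is hypothetical, not the PR's code:

```python
import sys

def check_no_long(startPos):
    # `long` only exists in Python 2; the short-circuit keeps Python 3 from
    # ever evaluating the name at runtime.
    if sys.version_info[0] < 3 and isinstance(startPos, long):  # noqa: F821
        raise TypeError("startPos of type long is not supported; use int instead")
```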

[GitHub] spark issue #18926: [SPARK-21712] [PySpark] Clarify type error for Column.su...

2017-08-11 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/18926 Oh, like a docstring test for the type error?

[GitHub] spark issue #18926: [SPARK-21712] [PySpark] Clarify type error for Column.su...

2017-08-11 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/18926 Pinging freshly minted committer @HyukjinKwon for a review on this tiny PR.

[GitHub] spark pull request #18926: [SPARK-21712] [PySpark] Clarify type error for Co...

2017-08-11 Thread nchammas
GitHub user nchammas opened a pull request: https://github.com/apache/spark/pull/18926 [SPARK-21712] [PySpark] Clarify type error for Column.substr()

Proposed changes:
* Clarify the type error that `Column.substr()` gives.

Test plan:
* Tested this manually

[GitHub] spark pull request #18818: [SPARK-21110][SQL] Structs, arrays, and other ord...

2017-08-07 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/18818#discussion_r131640333 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala --- @@ -79,18 +79,6 @@ private[sql] class TypeCollection(private

[GitHub] spark issue #18820: [SPARK-14932][SQL] Allow DataFrame.replace() to replace ...

2017-08-03 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/18820

> I don't think we should allow user to change field nullability while doing replace.

Why not? As long as we correctly update the schema from non-nullable to nullable, it seems OK to
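To make the nullability point concrete, here is a hypothetical PySpark snippet; allowing `None` as a replacement value is exactly the capability this PR was debating, so treat it as illustrative rather than settled API:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "letter"])

# If 'letter' were declared non-nullable, replacing "a" with None would
# require flipping its schema to nullable for the result to stay truthful.
df2 = df.replace("a", None, subset=["letter"])
```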

[GitHub] spark issue #18820: [SPARK-14932][SQL] Allow DataFrame.replace() to replace ...

2017-08-03 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/18820 Jenkins test this please. (Let's see if I still have the magic power.)

[GitHub] spark pull request #18820: [SPARK-14932][SQL] Allow DataFrame.replace() to r...

2017-08-03 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/18820#discussion_r131208895 --- Diff: python/pyspark/sql/dataframe.py --- @@ -1423,8 +1434,9 @@ def all_of_(xs): subset = [subset] # Verify we were

[GitHub] spark issue #3029: [SPARK-4017] show progress bar in console

2017-07-11 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/3029 `spark.ui.showConsoleProgress=false` works for me. I pass it via `--conf` to `spark-submit`. Try that if you haven't already.
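The same setting can also be applied programmatically; a sketch, assuming the config is read when the context starts (which is the case for console progress):

```python
from pyspark import SparkConf, SparkContext

# Equivalent to `spark-submit --conf spark.ui.showConsoleProgress=false`.
conf = SparkConf().set("spark.ui.showConsoleProgress", "false")
sc = SparkContext(conf=conf)
```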

[GitHub] spark pull request #17922: [SPARK-20601][PYTHON][ML] Python API Changes for ...

2017-05-09 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/17922#discussion_r115497704 --- Diff: python/pyspark/ml/tests.py --- @@ -71,6 +71,34 @@ ser = PickleSerializer() +def generate_multinomial_logistic_input

[GitHub] spark pull request #17922: [SPARK-20601][PYTHON][ML] Python API Changes for ...

2017-05-09 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/17922#discussion_r115497473 --- Diff: python/pyspark/ml/classification.py --- @@ -374,6 +415,48 @@ def getFamily(self): """ return se

[GitHub] spark issue #13257: [SPARK-15474][SQL]ORC data source fails to write and rea...

2017-03-01 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/13257 The discussion on [ORC-152](https://issues.apache.org/jira/browse/ORC-152) suggests that this is an issue with Spark's DataFrame writer for ORC, not with ORC itself. If you have evidence

[GitHub] spark pull request #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace impr...

2017-02-12 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/16793#discussion_r100701818 --- Diff: python/pyspark/sql/tests.py --- @@ -1591,6 +1591,67 @@ def test_replace(self): self.assertEqual(row.age, 10

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2017-01-30 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/12004

> the AWS SDK you get will be in sync with hadoop-aws; you have to keep them in sync.

Did you mean here, "you _don't_ have to keep them in sync"?

> Depen

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2017-01-21 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/12004

> This won't be enabled in a default build of Spark.

Okie doke. I don't want to derail the PR review here, but I'll ask since it's on-topic: Is there a way for projects l

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2017-01-20 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/12004 Thanks for elaborating on where this work will help @steveloughran. Again, just speaking from my own point of view as a Spark user and [Flintrock](https://github.com/nchammas/flintrock) maintainer

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2017-01-19 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/12004

> Does a build of Spark + Hadoop 2.7 right now have no ability at all to read from S3 out of the box, or just not full / ideal support?

No ability at all, as far as I can tell. Peo
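For context, the usual workaround is to pull the S3A connector in at launch time. A sketch; the artifact version shown is illustrative and must match the Hadoop version Spark was built against:

```python
from pyspark.sql import SparkSession

# Hypothetical example: fetch hadoop-aws (and its AWS SDK dependency) at
# startup so that s3a:// paths resolve.
spark = (SparkSession.builder
         .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:2.7.3")
         .getOrCreate())

df = spark.read.text("s3a://some-bucket/some-file.txt")  # hypothetical bucket
```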

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2017-01-18 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/12004 As a dumb end-user, and as the maintainer of [Flintrock](https://github.com/nchammas/flintrock), my interest in this PR stems from the hope that we will be able to get builds of Spark against

[GitHub] spark issue #16151: [SPARK-18719] Add spark.ui.showConsoleProgress to config...

2016-12-05 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/16151 @davies - Should this also be cherry-picked into 2.0 and 2.1? I think this config has been there for a while, just without documentation. 😊

[GitHub] spark issue #16151: [SPARK-18719] Add spark.ui.showConsoleProgress to config...

2016-12-05 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/16151 @srowen - OK, I elaborated a bit based on the snippet you posted. Feel free to nitpick on the wording. Would be happy to tweak further.

[GitHub] spark issue #16151: [SPARK-18719] Add spark.ui.showConsoleProgress to config...

2016-12-05 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/16151 @srowen - Good call. Will elaborate a bit based on what you posted.

[GitHub] spark issue #16151: [SPARK-18719] Add spark.ui.showConsoleProgress to config...

2016-12-05 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/16151 cc @davies

[GitHub] spark pull request #16151: [SPARK-18719] Add spark.ui.showConsoleProgress to...

2016-12-05 Thread nchammas
GitHub user nchammas opened a pull request: https://github.com/apache/spark/pull/16151 [SPARK-18719] Add spark.ui.showConsoleProgress to configuration docs This PR adds `spark.ui.showConsoleProgress` to the configuration docs. I tested this PR by building the docs locally

[GitHub] spark issue #16130: Update location of Spark YARN shuffle jar

2016-12-03 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/16130 cc @vanzin?

[GitHub] spark pull request #16130: Update location of Spark YARN shuffle jar

2016-12-03 Thread nchammas
GitHub user nchammas opened a pull request: https://github.com/apache/spark/pull/16130 Update location of Spark YARN shuffle jar Looking at the distributions provided on spark.apache.org, I see that the Spark YARN shuffle jar is under `yarn/` and not `lib/`.

[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...

2016-11-07 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/15659 LGTM as a first cut. The workflow that I will use during development and that I think should be supported, i.e.

```sh
./dev/make-distribution.sh --pip
pip install -e ./python
```

[GitHub] spark pull request #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip...

2016-11-06 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/15659#discussion_r86699002 --- Diff: python/pyspark/find_spark_home.py --- @@ -0,0 +1,73 @@ +#!/usr/bin/python + +# +# Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip...

2016-11-06 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/15659#discussion_r86698782 --- Diff: python/pyspark/find_spark_home.py --- @@ -0,0 +1,73 @@ +#!/usr/bin/python + +# +# Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip...

2016-11-06 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/15659#discussion_r86699184 --- Diff: python/pyspark/find_spark_home.py --- @@ -0,0 +1,73 @@ +#!/usr/bin/python + +# +# Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip...

2016-11-06 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/15659#discussion_r86698987 --- Diff: python/pyspark/find_spark_home.py --- @@ -0,0 +1,73 @@ +#!/usr/bin/python + +# +# Licensed to the Apache Software Foundation (ASF

[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...

2016-11-06 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/15659 Dunno why the tests are failing, but it's not related to packaging. Anyway, the install recipe I [posted earlier](https://github.com/apache/spark/pull/15659#issuecomment-258693543

[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...

2016-11-06 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/15659 Jenkins, retest this please.

[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...

2016-11-06 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/15659 I'll try out your install recipe, but I believe

```sh
./dev/make-distribution.sh --pip
pip install -e ./python/
```

should be a valid way of installing a development

[GitHub] spark pull request #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip...

2016-11-06 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/15659#discussion_r86692033 --- Diff: python/pyspark/find_spark_home.py --- @@ -0,0 +1,66 @@ +#!/usr/bin/python + +# +# Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip...

2016-11-06 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/15659#discussion_r86690907 --- Diff: python/pyspark/find_spark_home.py --- @@ -0,0 +1,66 @@ +#!/usr/bin/python + +# +# Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip...

2016-11-06 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/15659#discussion_r86690854 --- Diff: python/pyspark/find_spark_home.py --- @@ -0,0 +1,66 @@ +#!/usr/bin/python + +# +# Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip...

2016-11-06 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/15659#discussion_r86691246 --- Diff: python/pyspark/find_spark_home.py --- @@ -0,0 +1,66 @@ +#!/usr/bin/python + +# +# Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip...

2016-11-06 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/15659#discussion_r86690957 --- Diff: python/pyspark/find_spark_home.py --- @@ -0,0 +1,66 @@ +#!/usr/bin/python + +# +# Licensed to the Apache Software Foundation (ASF

[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...

2016-11-06 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/15659 I tested this out with Python 3 on my system with the following commands:

```
# Inside ./spark/.
python3 -m venv venv
source venv/bin/activate
./dev/make-distribution.sh
```

[GitHub] spark pull request #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip...

2016-11-05 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/15659#discussion_r86668198 --- Diff: docs/building-spark.md --- @@ -259,6 +259,14 @@ or Java 8 tests are automatically enabled when a Java 8 JDK is detected. If you have JDK

[GitHub] spark pull request #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip...

2016-11-05 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/15659#discussion_r86668059 --- Diff: python/setup.py --- @@ -0,0 +1,180 @@ +#!/usr/bin/env python + +# +# Licensed to the Apache Software Foundation (ASF) under one

[GitHub] spark pull request #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip...

2016-11-05 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/15659#discussion_r86667967 --- Diff: python/setup.py --- @@ -0,0 +1,180 @@ +#!/usr/bin/env python + +# +# Licensed to the Apache Software Foundation (ASF) under one

[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...

2016-11-04 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/15659 @rxin - Not yet, but I will test it this weekend. Yes, PyPI does have a limit, but we can request an exemption. I can help coordinate that with the PyPI admins when we get

[GitHub] spark pull request #15733: [SPARK-18138][DOCS] Document that Java 7, Python ...

2016-11-02 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/15733#discussion_r86158332 --- Diff: docs/index.md --- @@ -28,8 +28,9 @@ Spark runs on Java 7+, Python 2.6+/3.4+ and R 3.1+. For the Scala API, Spark {{s uses Scala

[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...

2016-11-02 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/15659 Later today (or later this week) I will try actually using this branch to install Spark via pip and report back.

```
pip install git+https://github.com/holdenk/spark@SPARK-1267-pip
```

[GitHub] spark pull request #15733: [SPARK-18138][DOCS] Document that Java 7, Python ...

2016-11-02 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/15733#discussion_r86141486 --- Diff: docs/building-spark.md --- @@ -13,6 +13,7 @@ redirect_from: "building-with-maven.html" The Maven-based build is the build of

[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...

2016-10-31 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/15659 We have an AppVeyor build now?

[GitHub] spark pull request #15659: [WIP][SPARK-1267][SPARK-18129] Allow PySpark to b...

2016-10-28 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/15659#discussion_r85531031 --- Diff: python/setup.py --- @@ -0,0 +1,170 @@ +#!/usr/bin/env python + +# +# Licensed to the Apache Software Foundation (ASF) under one

[GitHub] spark issue #15659: [WIP][SPARK-1267][SPARK-18129] Allow PySpark to be pip i...

2016-10-27 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/15659 From the PR description:

> figure out who owns the pyspark package name on prod PyPI (is it someone within the project or should we ask PyPI or should we choose a different n

[GitHub] spark issue #15659: [WIP][SPARK-1267][SPARK-18129] Allow PySpark to be pip i...

2016-10-27 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/15659 Thanks for the additional context @holdenk and @rgbkrk. It's important to lay it out somewhere clearly so that the non-Python developers among us (and the forgetful Python developers like me) can

[GitHub] spark pull request #15659: [WIP][SPARK-1267][SPARK-18129] Allow PySpark to b...

2016-10-27 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/15659#discussion_r85377223 --- Diff: pom.xml --- @@ -26,6 +26,7 @@ org.apache.spark spark-parent_2.11 + --- End diff -- Not a sticking point

[GitHub] spark pull request #15659: [WIP][SPARK-1267][SPARK-18129] Allow PySpark to b...

2016-10-27 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/15659#discussion_r85364365 --- Diff: pom.xml --- @@ -26,6 +26,7 @@ org.apache.spark spark-parent_2.11 + --- End diff -- Something along

[GitHub] spark pull request #15659: [WIP][SPARK-1267][SPARK-18129] Allow PySpark to b...

2016-10-27 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/15659#discussion_r85365186 --- Diff: python/README.md --- @@ -0,0 +1,32 @@ +# Apache Spark + +Spark is a fast and general cluster computing system for Big Data

[GitHub] spark pull request #15659: [WIP][SPARK-1267][SPARK-18129] Allow PySpark to b...

2016-10-27 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/15659#discussion_r85364778 --- Diff: python/README.md --- @@ -0,0 +1,32 @@ +# Apache Spark + +Spark is a fast and general cluster computing system for Big Data

[GitHub] spark pull request #15659: [WIP][SPARK-1267][SPARK-18129] Allow PySpark to b...

2016-10-27 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/15659#discussion_r85355701 --- Diff: python/setup.py --- @@ -0,0 +1,169 @@ +#!/usr/bin/env python + +# +# Licensed to the Apache Software Foundation (ASF) under one

[GitHub] spark pull request #15659: [WIP][SPARK-1267][SPARK-18129] Allow PySpark to b...

2016-10-27 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/15659#discussion_r85352748 --- Diff: pom.xml --- @@ -26,6 +26,7 @@ org.apache.spark spark-parent_2.11 + --- End diff -- Would it be overkill

[GitHub] spark pull request #15659: [WIP][SPARK-1267][SPARK-18129] Allow PySpark to b...

2016-10-27 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/15659#discussion_r85354868 --- Diff: python/README.md --- @@ -0,0 +1,32 @@ +# Apache Spark + +Spark is a fast and general cluster computing system for Big Data

[GitHub] spark pull request #15659: [WIP][SPARK-1267][SPARK-18129] Allow PySpark to b...

2016-10-27 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/15659#discussion_r85355211 --- Diff: python/pyspark/find_spark_home.py --- @@ -0,0 +1,65 @@ +#!/usr/bin/python + +# +# Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15659: [WIP][SPARK-1267][SPARK-18129] Allow PySpark to b...

2016-10-27 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/15659#discussion_r85355847 --- Diff: python/setup.cfg --- @@ -0,0 +1,22 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] spark pull request #15659: [WIP][SPARK-1267][SPARK-18129] Allow PySpark to b...

2016-10-27 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/15659#discussion_r85350993 --- Diff: bin/spark-class --- @@ -36,7 +36,7 @@ else fi # Find Spark jars. -if [ -f "${SPARK_HOME}/RELEASE" ]; then

[GitHub] spark pull request #15659: [WIP][SPARK-1267][SPARK-18129] Allow PySpark to b...

2016-10-27 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/15659#discussion_r85353057 --- Diff: python/MANIFEST.in --- @@ -0,0 +1,23 @@ +#!/usr/bin/env python + +# +# Licensed to the Apache Software Foundation (ASF) under one

[GitHub] spark pull request #15659: [WIP][SPARK-1267][SPARK-18129] Allow PySpark to b...

2016-10-27 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/15659#discussion_r85353699 --- Diff: python/README.md --- @@ -0,0 +1,32 @@ +# Apache Spark + +Spark is a fast and general cluster computing system for Big Data

[GitHub] spark pull request #15659: [WIP][SPARK-1267][SPARK-18129] Allow PySpark to b...

2016-10-27 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/15659#discussion_r85351820 --- Diff: dev/create-release/release-build.sh --- @@ -162,14 +162,35 @@ if [[ "$1" == "package" ]]; then export ZINC_PORT=$ZIN

[GitHub] spark issue #15567: [SPARK-14393][SQL] values generated by non-deterministic...

2016-10-21 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/15567 @mengxr - I think this PR will also address [SPARK-14241](https://issues.apache.org/jira/browse/SPARK-14241).

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2016-10-17 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/12004 @steveloughran - Is this message in the most recent build log critical?

```
Spark's published dependencies DO NOT MATCH the manifest file (dev/spark-deps). To update the manifest
```

[GitHub] spark pull request #15338: [SPARK-11653][Deploy] Allow spark-daemon.sh to ru...

2016-10-13 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/15338#discussion_r83349121 --- Diff: sbin/spark-daemon.sh --- @@ -146,13 +176,11 @@ run_command() { case "$mode" in (class) - noh

[GitHub] spark pull request #15338: [SPARK-11653][Deploy] Allow spark-daemon.sh to ru...

2016-10-13 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/15338#discussion_r83349154 --- Diff: sbin/spark-daemon.sh --- @@ -122,6 +123,35 @@ if [ "$SPARK_NICENESS" = "" ]; then export SPAR

[GitHub] spark pull request #15338: [SPARK-11653][Deploy] Allow spark-daemon.sh to ru...

2016-10-13 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/15338#discussion_r83349053 --- Diff: sbin/spark-daemon.sh --- @@ -146,13 +176,11 @@ run_command() { case "$mode" in (class) - noh

[GitHub] spark issue #14579: [SPARK-16921][PYSPARK] RDD/DataFrame persist()/cache() s...

2016-08-25 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/14579 Looks good to me. 👍

[GitHub] spark issue #14579: [SPARK-16921][PYSPARK] RDD/DataFrame persist()/cache() s...

2016-08-11 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/14579 Thanks for the quick overview. That's pretty straightforward, actually! I'll take a look at `PipelinedRDD` for the details. 👍

[GitHub] spark issue #14579: [SPARK-16921][PYSPARK] RDD/DataFrame persist()/cache() s...

2016-08-11 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/14579 Hmm, OK I see. (Apologies, I don't understand what pipelined RDDs are for, so the examples are going a bit over my head. 😅)

[GitHub] spark issue #14579: [SPARK-16921][PYSPARK] RDD/DataFrame persist()/cache() s...

2016-08-11 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/14579

> So there is no chaining requirement, and it will only work in a with statement.

@MLnick - Couldn't we also create a scenario (like @holdenk did earlier) where a user does something l

[GitHub] spark issue #14579: [SPARK-16921][PYSPARK] RDD/DataFrame persist()/cache() s...

2016-08-10 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/14579 Ah, I see. I don't fully understand how `PipelinedRDD` works or how it is used so I'll have to defer to y'all on this. Does the `cached()` utility method have this same problem? >

[GitHub] spark pull request #14579: [SPARK-16921][PYSPARK] RDD/DataFrame persist()/ca...

2016-08-10 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/14579#discussion_r74307747 --- Diff: python/pyspark/rdd.py --- @@ -221,6 +227,21 @@ def context(self): def cache(self): """ P

[GitHub] spark issue #14579: [SPARK-16921][PYSPARK] RDD/DataFrame persist()/cache() s...

2016-08-10 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/14579 Sorry, you're right, `__exit__()`'s return value is not going to be consumed anywhere. What I meant is that `unpersist()` would return the base RDD or DataFrame object. But I'm not seeing

[GitHub] spark issue #14579: [SPARK-16921][PYSPARK] RDD/DataFrame persist()/cache() s...

2016-08-10 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/14579

> the subclassing of RDD approach could cause us to miss out on pipelining if the RDD was used again after it was unpersisted

How so? Wouldn't `__exit__()` simply return the parent

[GitHub] spark issue #14579: [SPARK-16921][PYSPARK] RDD/DataFrame persist()/cache() s...

2016-08-10 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/14579 None of our options seems great, but if I had to rank them I would say:

1. Add new `Persisted...` classes.
2. Make no changes.
3. Add separate `persisted()` or `cached()` utility
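A rough sketch of what option 1 could look like, written as a wrapper rather than a subclass to keep the example self-contained; the class name is my own invention:

```python
class Persisted(object):
    """Hypothetical wrapper that lets an RDD or DataFrame be managed by a
    `with` block, unpersisting on exit."""

    def __init__(self, data):
        self.data = data.persist()  # persist() returns the object itself

    def __enter__(self):
        return self.data

    def __exit__(self, exc_type, exc_value, traceback):
        self.data.unpersist()
        # This return value only tells Python whether to suppress an
        # exception; it is never handed back to the caller.
        return False

# Usage:
#     with Persisted(df) as cached_df:
#         cached_df.count()
```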

[GitHub] spark issue #14579: [SPARK-16921][PYSPARK] RDD/DataFrame persist()/cache() s...

2016-08-10 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/14579 Ah, you're right. So if we want to avoid needing magic methods in the main RDD/DataFrame classes and avoid needing a separate utility method like `cache()`, I think one option available

[GitHub] spark issue #14579: [SPARK-16921][PYSPARK] RDD/DataFrame persist()/cache() s...

2016-08-10 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/14579 Thanks @MLnick for taking this on and for breaking down what you've found so far. I took a look through [`contextlib`](https://docs.python.org/3/library/contextlib.html) for inspiration
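Along those `contextlib` lines, a minimal sketch of a `persisted()` helper; the name and behavior are assumptions, not anything merged into Spark:

```python
from contextlib import contextmanager

@contextmanager
def persisted(data, storage_level=None):
    # Persist on entry, unpersist on exit, even if the block raises.
    if storage_level is None:
        data.persist()
    else:
        data.persist(storage_level)
    try:
        yield data
    finally:
        data.unpersist()

# Usage:
#     with persisted(df) as cached_df:
#         cached_df.count()
```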

[GitHub] spark issue #14496: [SPARK-16772] [Python] [Docs] Fix API doc references to ...

2016-08-05 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/14496 Thanks @srowen. 👍

[GitHub] spark issue #14496: [SPARK-16772] [Python] [Docs] Fix API doc references to ...

2016-08-04 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/14496 cc @rxin - Follow-on to #14393.

[GitHub] spark pull request #14496: [SPARK-16772] [Python] [Docs] Fix API doc referen...

2016-08-04 Thread nchammas
GitHub user nchammas opened a pull request: https://github.com/apache/spark/pull/14496 [SPARK-16772] [Python] [Docs] Fix API doc references to UDFRegistration + Update "important classes"

## Proposed Changes
* Update the list of "important classes"

[GitHub] spark issue #14408: [SPARK-16772] Restore "datatype string" to Python API do...

2016-07-29 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/14408 cc @rxin

[GitHub] spark pull request #14393: [SPARK-16772] Correct API doc references to PySpa...

2016-07-29 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/14393#discussion_r72853914 --- Diff: python/pyspark/sql/context.py --- @@ -226,28 +226,34 @@ def createDataFrame(self, data, schema=None, samplingRatio=None): from

[GitHub] spark pull request #14408: [SPARK-16772] Restore "datatype string" to Python...

2016-07-29 Thread nchammas
GitHub user nchammas opened a pull request: https://github.com/apache/spark/pull/14408 [SPARK-16772] Restore "datatype string" to Python API docstrings

## What changes were proposed in this pull request?

This PR corrects [an error made in an earlier PR](https://

[GitHub] spark pull request #14393: [SPARK-16772] Correct API doc references to PySpa...

2016-07-29 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/14393#discussion_r72843069 --- Diff: python/pyspark/sql/context.py --- @@ -226,28 +226,34 @@ def createDataFrame(self, data, schema=None, samplingRatio=None): from

[GitHub] spark issue #14393: [SPARK-16772] Correct API doc references to PySpark clas...

2016-07-28 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/14393 Yes, I built the docs and reviewed several (but not all) of the changes locally in my browser and confirmed that the corrections I wanted took place as expected. (Apologies about

[GitHub] spark issue #14393: [SPARK-16772] Correct API doc references to PySpark clas...

2016-07-28 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/14393 Apologies for making a fairly "noisy" PR, with changes in several scattered places. However, as a PySpark user it's important to me that the API docs be properly formatted and that docst

[GitHub] spark pull request #14393: [SPARK-16772] Correct references to DataType + ot...

2016-07-28 Thread nchammas
GitHub user nchammas opened a pull request: https://github.com/apache/spark/pull/14393 [SPARK-16772] Correct references to DataType + other minor tweaks

[GitHub] spark issue #13114: Branch 1.4

2016-06-20 Thread nchammas
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/13114 @srowen @vanzin - Shouldn't some automated process be picking up your comments ("close this PR") and closing this PR? I thought we had something like that.

[GitHub] spark pull request: [SPARK-15542][SparkR] Make error message clear...

2016-05-26 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/13308#discussion_r64774474 --- Diff: R/install-dev.sh --- @@ -38,7 +38,12 @@ pushd $FWDIR > /dev/null if [ ! -z "$R_HOME" ] then R_SCRIPT_PATH

[GitHub] spark pull request: [SPARK-15072][SQL][PYSPARK][HOT-FIX] Remove Sp...

2016-05-16 Thread nchammas
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/13069#issuecomment-219517952 Okie doke, thanks for the explanation!
