[GitHub] spark pull request: SPARK-1972: Added support for tracking custom ...
GitHub user kalpit opened a pull request: https://github.com/apache/spark/pull/918

SPARK-1972: Added support for tracking custom task-related metrics

Any piece of Spark machinery that cares to track custom task-related metrics can now use the setCustomMetric(name, value) method on TaskMetrics to do so. The StagePage in the UI now shows Custom Metrics for every task.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kalpit/spark topic/customTaskMetrics

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/918.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #918

commit 0d96d95a08fc87592827e4e05222ec89f6c832fb
Author: Kalpit Shah shahkalpi...@gmail.com
Date: 2014-05-30T09:04:49Z

SPARK-1972: Added support for tracking custom task-related metrics

Any piece of Spark machinery that cares to track custom task-related metrics can now use the setCustomMetric(name, value) method on TaskMetrics to do so. The StagePage in the UI now shows Custom Metrics for every task.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
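For illustration, the proposed API boils down to a per-task map of metric names to values. Below is a minimal Python sketch of that idea; the actual change lives in Spark's Scala TaskMetrics, and the class and method names here only mirror the PR's description:

```python
class TaskMetrics:
    """Toy stand-in for Spark's TaskMetrics, sketching the proposed API."""

    def __init__(self):
        self._custom_metrics = {}  # metric name -> value, tracked per task

    def set_custom_metric(self, name, value):
        # Mirrors the proposed setCustomMetric(name, value) call.
        self._custom_metrics[name] = value

    def custom_metrics(self):
        # The StagePage UI would render this map for every task.
        return dict(self._custom_metrics)


metrics = TaskMetrics()
metrics.set_custom_metric("recordsFiltered", 42)
metrics.set_custom_metric("cacheHits", 7)
```

A dictionary keyed by metric name keeps the API open-ended: any component can register a metric without TaskMetrics needing a dedicated field for it.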
[GitHub] spark pull request: sbt 0.13.X should be using sbt-assembly 0.11.X
GitHub user kalpit commented on the pull request: https://github.com/apache/spark/pull/555#issuecomment-41823002

@pwendell @srowen Can you reply to my previous comment when you get a chance? Thanks.
[GitHub] spark pull request: sbt 0.13.X should be using sbt-assembly 0.11.X
GitHub user kalpit commented on the pull request: https://github.com/apache/spark/pull/555#issuecomment-41827145

OK, I won't monitor the PR then. You can merge it when the time is right. Thanks.
[GitHub] spark pull request: SPARK-1630: Make PythonRDD handle NULL element...
GitHub user kalpit commented on the pull request: https://github.com/apache/spark/pull/554#issuecomment-41595891

I see your point. I don't have a Python-only use case that can trigger the NPE. My custom RDD implementation had a corner case in which the RDD's compute() method returned a null in the iterator stream. I have fixed my custom RDD implementation to not do that, so I no longer run into this NPE. However, anyone else who implements a custom RDD of a similar nature (one that yields nulls for some elements in a partition's iterator stream) and accesses it from PySpark would hit the same NPE, so I thought it would be nicer if we handled nulls in the stream gracefully.
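To make the corner case concrete, here is a small Python sketch of the failure mode described above: a serializer that assumes every element is non-null blows up on the first null, while a tolerant variant encodes nulls with a sentinel. The function names and the -1 sentinel are hypothetical stand-ins, not Spark's actual code:

```python
def write_naive(elements, out):
    # Assumes every element is non-null; fails on None,
    # analogous to the NPE when a partition iterator yields null.
    for e in elements:
        out.append(len(e))  # raises TypeError when e is None
        out.append(e)


def write_graceful(elements, out, null_marker=-1):
    # Encodes None as a sentinel length instead of crashing.
    for e in elements:
        if e is None:
            out.append(null_marker)
        else:
            out.append(len(e))
            out.append(e)


frames = []
write_graceful(["a", None, "bc"], frames)
```

The graceful variant degrades a hard crash into a representable value, which is the behavior the PR argues for.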
[GitHub] spark pull request: sbt 0.13.X should be using sbt-assembly 0.11.X
GitHub user kalpit commented on the pull request: https://github.com/apache/spark/pull/555#issuecomment-41605429

My understanding was that a successful run of the automated Jenkins tests would remove the Travis CI build error from the PR. Looks like that is not the case. How is the Travis CI build different from the automated Jenkins test run? How do I go about fixing or getting rid of the Travis CI build error on the PR so that it can be merged if/when we decide to merge it?
[GitHub] spark pull request: SPARK-1630: Make PythonRDD handle NULL element...
GitHub user kalpit commented on the pull request: https://github.com/apache/spark/pull/554#issuecomment-41487520

I suspect that the NPE will happen for any PySpark user who has an RDD that returns null for some input x based on the lambda/transform. Check out the test case I added to PythonRDDSuite.scala to reproduce the NPE. I considered the idea of using a negative length (-4) to pass None to Python (PythonRDD.SpecialLengths -1 to -3 are taken). The tricky part, however, is that the read() method returns an array of bytes based on the length, and the existing code treats an empty array as the end of the data/stream. So I am not sure how we would communicate None to Python. Thoughts?
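The framing question above can be sketched end to end. This is a simplified Python model of a length-prefixed stream, not Spark's actual implementation: the -4 null marker is only the value floated in the comment, and the reader here tells end-of-stream (a short header read) apart from both a null frame and an empty frame:

```python
import io
import struct

NULL_MARKER = -4  # hypothetical marker from the comment; -1 to -3 are taken


def write_frame(out, payload):
    # Each element: 4-byte big-endian length, then the payload bytes.
    # None gets the special negative length and no payload at all.
    if payload is None:
        out.write(struct.pack(">i", NULL_MARKER))
    else:
        out.write(struct.pack(">i", len(payload)))
        out.write(payload)


def read_frame(inp):
    header = inp.read(4)
    if len(header) < 4:
        return "EOS"  # genuine end of stream, not an empty element
    (length,) = struct.unpack(">i", header)
    if length == NULL_MARKER:
        return None
    return inp.read(length)


stream = io.BytesIO()
write_frame(stream, b"abc")
write_frame(stream, None)
write_frame(stream, b"")
stream.seek(0)
```

Because this reader checks how many header bytes it got rather than inspecting the payload, an empty element (b"") stays distinguishable from both None and end-of-stream, which is exactly the ambiguity the comment worries about.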
[GitHub] spark pull request: sbt 0.13.X should be using sbt-assembly 0.11.X
GitHub user kalpit commented on the pull request: https://github.com/apache/spark/pull/555#issuecomment-41487572

I don't see how the build failure is related to this commit. How can I get the build re-triggered?
[GitHub] spark pull request: SPARK-1630: Make PythonRDD handle NULL element...
GitHub user kalpit opened a pull request: https://github.com/apache/spark/pull/554

SPARK-1630: Make PythonRDD handle NULL elements and strings gracefully

I have added a unit test that validates the fix. We no longer NPE.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kalpit/spark pyspark/handleNullData

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/554.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #554

commit ff036d31c6adbc2cd5f2c9347c267073b673167b
Author: Kalpit Shah shahkalpi...@gmail.com
Date: 2014-04-25T17:44:30Z

SPARK-1630: Make PythonRDD handle Null elements and strings gracefully
[GitHub] spark pull request: sbt 0.13.X should be using sbt-assembly 0.11.X
GitHub user kalpit opened a pull request: https://github.com/apache/spark/pull/555

sbt 0.13.X should be using sbt-assembly 0.11.X

https://github.com/sbt/sbt-assembly/blob/master/README.md

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kalpit/spark upgrade/sbtassembly

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/555.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #555

commit 1fa732468addfae771717a0bbb6084b5550d4c3c
Author: Kalpit Shah shahkalpi...@gmail.com
Date: 2014-04-25T18:53:15Z

sbt 0.13.X should be using sbt-assembly 0.11.X
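For context, the upgrade amounts to bumping the plugin declaration in project/plugins.sbt to the 0.11.x line. A rough sketch, with the exact version to be taken from the sbt-assembly README linked above (0.11.2 here is only an example):

```scala
// project/plugins.sbt -- sbt 0.13.x expects the sbt-assembly 0.11.x line
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")
```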
[GitHub] spark pull request: SPARK-1630: Make PythonRDD handle NULL element...
GitHub user kalpit commented on a diff in the pull request: https://github.com/apache/spark/pull/554#discussion_r12010821

--- Diff: core/src/test/scala/org/apache/spark/api/python/PythonRDDSuite.scala ---
@@ -29,5 +29,10 @@ class PythonRDDSuite extends FunSuite {
     PythonRDD.writeIteratorToStream(input.iterator, buffer)
   }

+  test("Handle nulls gracefully") {
+    val input: List[String] = List("a", null)
--- End diff --

I used a 4-space indent since another test in this suite had that. I will re-indent to 2 spaces.