[GitHub] spark pull request: SPARK-1972: Added support for tracking custom ...

2014-05-30 Thread kalpit
GitHub user kalpit opened a pull request:

https://github.com/apache/spark/pull/918

SPARK-1972: Added support for tracking custom task-related metrics

Any piece of Spark machinery that wants to track custom task-related 
metrics can now use the setCustomMetric(name, value) method on TaskMetrics to do 
so.
The StagePage in the UI now shows custom metrics for every task.
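
The pattern the PR describes is a simple name-to-value map carried on each task's metrics object. Spark's real TaskMetrics is a Scala class; the sketch below is a hypothetical Python analogue that only mirrors the setCustomMetric(name, value) API named above, so a UI layer (like the StagePage) could render one row per metric.

```python
class TaskMetrics:
    """Minimal sketch of per-task custom metrics (hypothetical, not Spark's API)."""

    def __init__(self):
        self._custom_metrics = {}  # metric name -> numeric value

    def set_custom_metric(self, name, value):
        # Record (or overwrite) a custom task-level metric.
        self._custom_metrics[name] = value

    @property
    def custom_metrics(self):
        # A plain dict copy, ready for a UI layer to render one row per metric.
        return dict(self._custom_metrics)


metrics = TaskMetrics()
metrics.set_custom_metric("recordsFiltered", 128)
metrics.set_custom_metric("cacheHits", 42)
print(metrics.custom_metrics)
```

Calling set_custom_metric twice with the same name overwrites the earlier value, matching the "set" (rather than "increment") naming in the PR description.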

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kalpit/spark topic/customTaskMetrics

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/918.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #918


commit 0d96d95a08fc87592827e4e05222ec89f6c832fb
Author: Kalpit Shah shahkalpi...@gmail.com
Date:   2014-05-30T09:04:49Z

SPARK-1972: Added support for tracking custom task-related metrics

Any piece of Spark machinery that wants to track custom task-related 
metrics can now use the setCustomMetric(name, value) method on TaskMetrics to do 
so.
The StagePage in the UI now shows custom metrics for every task.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: sbt 0.13.X should be using sbt-assembly 0.11.X

2014-04-30 Thread kalpit
Github user kalpit commented on the pull request:

https://github.com/apache/spark/pull/555#issuecomment-41823002
  
@pwendell @srowen Can you reply to my previous comment when you get a 
chance? Thanks.




[GitHub] spark pull request: sbt 0.13.X should be using sbt-assembly 0.11.X

2014-04-30 Thread kalpit
Github user kalpit commented on the pull request:

https://github.com/apache/spark/pull/555#issuecomment-41827145
  
Ok. I won't monitor the PR then. You can merge it when the time is right. 
Thanks.




[GitHub] spark pull request: SPARK-1630: Make PythonRDD handle NULL element...

2014-04-28 Thread kalpit
Github user kalpit commented on the pull request:

https://github.com/apache/spark/pull/554#issuecomment-41595891
  
I see your point. I don't have a Python-only use-case that can trigger the 
NPE.

My custom RDD implementation had a corner case in which the RDD's compute() 
method returned a null in the iterator stream. I have fixed my custom RDD 
implementation to not do that, so I no longer run into this NPE. However, 
should anyone else implement a custom RDD of a similar nature (one with 
nulls for some elements in a partition's iterator stream) and try accessing 
such an RDD from PySpark, they would hit the NPE, so I thought it would 
be nicer if we handled nulls in the stream gracefully.
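
The failure mode above can be sketched in a few lines. This is a hypothetical Python analogue, not Spark's actual serializer: a partition iterator yields a None element, a writer that assumes every element is non-null blows up on it, while a null-aware writer degrades gracefully (here it emits an empty placeholder, purely for illustration).

```python
def compute_partition():
    # Stand-in for a custom RDD's compute(): note the None element.
    yield "a"
    yield None
    yield "b"

def write_strict(elem):
    # Assumes non-null input; raises AttributeError when elem is None,
    # analogous to the NPE described above.
    return elem.encode("utf-8")

def write_null_aware(elem):
    # Handles None gracefully instead of crashing (empty placeholder
    # chosen only for this sketch).
    return b"" if elem is None else elem.encode("utf-8")

frames = [write_null_aware(e) for e in compute_partition()]
print(frames)  # [b'a', b'', b'b']
```

Note the placeholder choice matters: as the later discussion on this thread points out, an empty byte array can collide with an end-of-stream marker, which is why a distinct sentinel ends up being needed in practice.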




[GitHub] spark pull request: sbt 0.13.X should be using sbt-assembly 0.11.X

2014-04-28 Thread kalpit
Github user kalpit commented on the pull request:

https://github.com/apache/spark/pull/555#issuecomment-41605429
  
My understanding was that a successful run of the automated Jenkins tests would 
remove the Travis CI build error from the PR. Looks like that is not the 
case. How is the Travis CI build different from the automated Jenkins test 
run? How do I go about fixing or getting rid of the Travis CI build error on 
the PR so that it can be merged if/when we decide to merge it?




[GitHub] spark pull request: SPARK-1630: Make PythonRDD handle NULL element...

2014-04-26 Thread kalpit
Github user kalpit commented on the pull request:

https://github.com/apache/spark/pull/554#issuecomment-41487520
  
I suspect that the NPEs will happen for any PySpark user who has an RDD 
that returns null for some input x, depending on the lambda/transform. Check out 
the test case I added to PythonRDDSuite.scala to reproduce the NPE.

I considered the idea of using a negative length (-4) to pass None to 
Python (PythonRDD.SpecialLengths -1 to -3 are taken). The tricky part, however, 
is that the read() method returns an array of bytes based on the length. 
The existing code treats an empty array as end of data/stream, so I am not sure 
how we would communicate None to Python. Thoughts?
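
The idea floated above can be sketched independently of Spark. This is a minimal, hypothetical model of a length-prefixed binary stream (not PythonRDD's actual wire format): a reserved negative length (-4, as suggested) marks a None element with no payload, so the reader can distinguish None from a genuinely empty string and from end-of-stream.

```python
import io
import struct

NULL_MARKER = -4     # hypothetical sentinel for None, as floated above
END_OF_STREAM = -1   # stand-in for an end-of-data marker

def write_elements(out, elems):
    # Each frame: 4-byte big-endian signed length, then that many payload bytes.
    for e in elems:
        if e is None:
            out.write(struct.pack(">i", NULL_MARKER))  # no payload follows
        else:
            data = e.encode("utf-8")
            out.write(struct.pack(">i", len(data)))
            out.write(data)
    out.write(struct.pack(">i", END_OF_STREAM))

def read_elements(inp):
    elems = []
    while True:
        (length,) = struct.unpack(">i", inp.read(4))
        if length == END_OF_STREAM:
            return elems
        if length == NULL_MARKER:
            elems.append(None)  # None survives the round trip
        else:
            # length == 0 is a legitimate empty string, not end-of-stream
            elems.append(inp.read(length).decode("utf-8"))

buf = io.BytesIO()
write_elements(buf, ["a", None, ""])
buf.seek(0)
print(read_elements(buf))  # ['a', None, '']
```

The key point is that the sentinel lives in the length field, so the reader never has to interpret an empty payload ambiguously; this sidesteps the "empty array means end of stream" collision described above.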




[GitHub] spark pull request: sbt 0.13.X should be using sbt-assembly 0.11.X

2014-04-26 Thread kalpit
Github user kalpit commented on the pull request:

https://github.com/apache/spark/pull/555#issuecomment-41487572
  
I don't see how the build failure is related to the commit. How can I get 
the build re-triggered?




[GitHub] spark pull request: SPARK-1630: Make PythonRDD handle NULL element...

2014-04-25 Thread kalpit
GitHub user kalpit opened a pull request:

https://github.com/apache/spark/pull/554

SPARK-1630: Make PythonRDD handle NULL elements and strings gracefully

I have added a unit test that validates the fix; we no longer hit the NPE.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kalpit/spark pyspark/handleNullData

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/554.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #554


commit ff036d31c6adbc2cd5f2c9347c267073b673167b
Author: Kalpit Shah shahkalpi...@gmail.com
Date:   2014-04-25T17:44:30Z

SPARK-1630: Make PythonRDD handle Null elements and strings gracefully






[GitHub] spark pull request: sbt 0.13.X should be using sbt-assembly 0.11.X

2014-04-25 Thread kalpit
GitHub user kalpit opened a pull request:

https://github.com/apache/spark/pull/555

sbt 0.13.X should be using sbt-assembly 0.11.X

https://github.com/sbt/sbt-assembly/blob/master/README.md

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kalpit/spark upgrade/sbtassembly

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/555.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #555


commit 1fa732468addfae771717a0bbb6084b5550d4c3c
Author: Kalpit Shah shahkalpi...@gmail.com
Date:   2014-04-25T18:53:15Z

sbt 0.13.X should be using sbt-assembly 0.11.X






[GitHub] spark pull request: SPARK-1630: Make PythonRDD handle NULL element...

2014-04-25 Thread kalpit
Github user kalpit commented on a diff in the pull request:

https://github.com/apache/spark/pull/554#discussion_r12010821
  
--- Diff: 
core/src/test/scala/org/apache/spark/api/python/PythonRDDSuite.scala ---
@@ -29,5 +29,10 @@ class PythonRDDSuite extends FunSuite {
 PythonRDD.writeIteratorToStream(input.iterator, buffer)
 }
 
+test("Handle nulls gracefully") {
+val input: List[String] = List("a", null)
--- End diff --

I used 4-space indentation since the other test in this suite used that. I will 
re-indent to 2 spaces.

