[GitHub] spark pull request: [Minor] [Doc] [ML] ml.clustering scala & pytho...

2016-05-25 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13291#discussion_r64699243
  
--- Diff: python/pyspark/ml/clustering.py ---
@@ -64,6 +64,21 @@ class GaussianMixture(JavaEstimator, HasFeaturesCol, HasPredictionCol, HasMaxIte
 .. note:: Experimental
 
 GaussianMixture clustering.
+This class performs expectation maximization for multivariate Gaussian
+Mixture Models (GMMs).  A GMM represents a composite distribution of
+independent Gaussian distributions with associated "mixing" weights
+specifying each's contribution to the composite.
+
+Given a set of sample points, this class will maximize the log-likelihood
+for a mixture of k Gaussians, iterating until the log-likelihood changes by
+less than convergenceTol, or until it has reached the max number of iterations.
+While this process is generally guaranteed to converge, it is not guaranteed
+to find a global optimum.
+
+Note: For high-dimensional data (with many features), this algorithm may perform poorly.
--- End diff --

super minor: This formats oddly in Sphinx. To match the Scala doc formatting, 
I think you could drop the indentation for the sentences under the note, or if 
you wanted a PyDoc note call-out you could use the `.. Note::` syntax.
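The docstring under review describes expectation maximization for Gaussian mixtures that stops once the log-likelihood changes by less than convergenceTol, or after the maximum number of iterations. Purely as an illustration of that loop, here is a minimal one-dimensional sketch in plain Python. It is not Spark's implementation (which fits multivariate Gaussians on distributed data); the function names, the variance floor and the default tolerance are assumptions of this sketch.

```python
import math
from typing import List, Sequence, Tuple


def normal_pdf(x: float, mean: float, var: float) -> float:
    """Density of a univariate Gaussian N(mean, var) at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def gmm_em_1d(
    xs: Sequence[float],
    init_means: Sequence[float],
    convergence_tol: float = 0.01,  # assumed default for this sketch
    max_iter: int = 100,
) -> Tuple[List[float], List[float], List[float], float]:
    """Fit a 1-D Gaussian mixture with EM; returns (weights, means, variances, log-likelihood)."""
    k = len(init_means)
    n = len(xs)
    weights = [1.0 / k] * k
    means = list(init_means)
    variances = [1.0] * k
    prev_ll = -math.inf
    ll = -math.inf
    for _ in range(max_iter):
        # E-step: responsibility of each component for each point, plus the log-likelihood.
        resp = []
        ll = 0.0
        for x in xs:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            ll += math.log(total)
            resp.append([d / total for d in dens])
        # Stop once the log-likelihood changes by less than the tolerance.
        if abs(ll - prev_ll) < convergence_tol:
            break
        prev_ll = ll
        # M-step: re-estimate mixing weights, means and variances from the responsibilities.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)  # guard against an empty component
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, 1e-6)  # floor keeps a component from collapsing
    return weights, means, variances, ll
```

As the docstring says, EM only finds a local optimum, so the result depends on `init_means`; on two well-separated clumps of points it settles on the two clump means after a couple of iterations.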


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [Minor] [Doc] [ML] ml.clustering scala & pytho...

2016-05-25 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13291#discussion_r64699018
  
--- Diff: python/pyspark/ml/clustering.py ---
@@ -227,15 +242,15 @@ class KMeans(JavaEstimator, HasFeaturesCol, HasPredictionCol, HasMaxIter, HasTol
 .. versionadded:: 1.5.0
 """
 
-k = Param(Params._dummy(), "k", "number of clusters to create",
+k = Param(Params._dummy(), "k", "The number of clusters to create. Must be > 1.",
   typeConverter=TypeConverters.toInt)
 initMode = Param(Params._dummy(), "initMode",
- "the initialization algorithm. This can be either \"random\" to " +
+ "The initialization algorithm. This can be either \"random\" to " +
  "choose random points as initial cluster centers, or \"k-means||\" " +
  "to use a parallel variant of k-means++",
  typeConverter=TypeConverters.toString)
-initSteps = Param(Params._dummy(), "initSteps", "steps for k-means initialization mode",
-  typeConverter=TypeConverters.toInt)
+initSteps = Param(Params._dummy(), "initSteps", "The number of steps for k-means|| " +
--- End diff --

Since we're copying this over, might as well also include "This is an advanced 
setting -- the default of 5 is almost always enough." from the Scala side?





[GitHub] spark pull request: [SPARK-15543][SQL] Rename DefaultSources to ma...

2016-05-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13311





[GitHub] spark pull request: [SPARK-15543][SQL] Rename DefaultSources to ma...

2016-05-25 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/13311#issuecomment-221792095
  
Merging in master/2.0.






[GitHub] spark pull request: [MINOR] Fix Typos

2016-05-25 Thread zhengruifeng
Github user zhengruifeng commented on the pull request:

https://github.com/apache/spark/pull/13317#issuecomment-221792040
  
@holdenk Thanks. I think you are right. I will revert `an one-xxx` to `a 
one-xxx`.





[GitHub] spark pull request: [SPARK-15391] [SQL] manage the temporary memor...

2016-05-25 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/13318#issuecomment-221791831
  
FYI "This PR also change the loadFactor of BytesToBytesMap to 0.5 (it was 
0.75)" this is a pretty low load factor.
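To put rough numbers on this: a lower load factor means more slots per stored key. The sketch below assumes, as is typical for open-addressing hash tables, a power-of-two capacity that must satisfy capacity * loadFactor >= numKeys. It counts slots only and does not model BytesToBytesMap's actual memory layout or growth policy; the helper name is hypothetical.

```python
def min_pow2_capacity(num_keys: int, load_factor: float) -> int:
    """Smallest power-of-two slot count that holds num_keys without exceeding load_factor."""
    capacity = 1
    while capacity * load_factor < num_keys:
        capacity *= 2  # grow by doubling, as power-of-two tables do
    return capacity


for keys in (1_000_000, 1_200_000):
    for lf in (0.75, 0.5):
        print(keys, lf, min_pow2_capacity(keys, lf))
```

For 1,200,000 keys this gives 2^21 slots at 0.75 but 2^22 at 0.5, while for 1,000,000 keys both give 2^21. So depending on where the key count falls relative to a power-of-two boundary, dropping to 0.5 either doubles the slot array or costs nothing extra.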






[GitHub] spark pull request: [SPARK-15457][MLLIB][ML] Eliminate some warnin...

2016-05-25 Thread holdenk
Github user holdenk commented on the pull request:

https://github.com/apache/spark/pull/13314#issuecomment-221791580
  
@srowen willing to help with that too btw :)





[GitHub] spark pull request: [SPARK-15391] [SQL] manage the temporary memor...

2016-05-25 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/13318#issuecomment-221791386
  
Can we add a unit test for this behavior?






[GitHub] spark pull request: [WIP] [SPARK-8426] Enhance Blacklist mechanism...

2016-05-25 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/13234#discussion_r64698507
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -249,10 +249,16 @@ private[spark] class TaskSchedulerImpl(
   availableCpus: Array[Int],
   tasks: Seq[ArrayBuffer[TaskDescription]]) : Boolean = {
 var launchedTask = false
+// TODO unit test, and also add executor-stage filtering as well
+// This is an optimization -- the taskSet might contain a very long list of pending tasks.
+// Rather than wasting time checking the offer against each task, and then realizing the
+// executor is blacklisted, just filter out the bad executor immediately.
+val nodeBlacklist = taskSet.blacklistTracker.map{_.nodeBlacklistForStage(taskSet.stageId)}
+  .getOrElse(Set())
--- End diff --

Before this change, there is an `O(n^2)` (where `n` is the number of 
pending tasks) cost when you've got one bad executor.  The tasks assigned to 
the bad executor fail, but then we get another resource offer for the bad 
executor again.  So we find another task for the bad executor, it fails, and we 
continue the process, going through all of the pending tasks.  Each time we 
respond to the resource offer, we need to (a) iterate through the list of tasks 
to find one that is *not* blacklisted and (b) then remove it from the task 
list.  Those are both `O(1)` operations when there isn't any blacklisting -- we 
just pop the last task off the stack.  But as our bad executor makes its way 
through the tasks, it has to go deeper into the list each time, and both 
searching the list and then removing an element from it become expensive.

After we've gone through *all* of the tasks for bad executor once, then we 
will wait for there to be resource offers from good executors.  However, even 
though we then start scheduling on the good executor, scheduling as a whole is 
still much slower, because we still have an `O(n)` cost at each call to 
resourceOffer.  The offer still includes the (now idle) bad executor, and we 
have to iterate through the entire list of pending tasks to decide that nope, 
there aren't any tasks we can schedule on that node.

In my performance tests with a 3k-task job, this leads to about a 10x 
slowdown, but obviously this depends a lot on the number of tasks.  But that is 
the really scary thing -- it's not a function of how many bad nodes you have, 
but how many tasks you are trying to run.  So on a large cluster, where a bad 
node is more likely, and lots of tasks are more likely, the slowdown will be 
much worse.

Note that as implemented in this version of the patch, this slowdown is 
only avoided when we blacklist the entire node.  But we should add blacklisting 
for an executor as well, to avoid the slowdown in that case also.
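The cost argument above can be reproduced with a toy model in plain Python. This is a sketch under simplifying assumptions, not the TaskSchedulerImpl logic: tasks are integers, every task the bad host tries fails and stays pending, the per-host task blacklist is a set, and the node-level check is modelled as a single hypothetical flag. It counts only the tasks examined while searching, not the cost of removing a task from the middle of the list.

```python
from typing import List, Optional, Set, Tuple


def scan_for_task(pending: List[int], blacklisted: Set[int]) -> Tuple[Optional[int], int]:
    """Scan pending tasks from the end (like popping a stack) for one not blacklisted.

    Returns (task or None, number of tasks examined)."""
    examined = 0
    for task in reversed(pending):
        examined += 1
        if task not in blacklisted:
            return task, examined
    return None, examined


def bad_host_scan_cost(num_tasks: int, idle_rounds: int, node_blacklisted: bool) -> int:
    """Total tasks examined for resource offers coming from one bad host.

    Phase 1: the host keeps receiving offers; each task it takes fails and is then
    blacklisted for that host (the task itself stays pending).
    Phase 2: idle_rounds further offers arrive while the host sits idle."""
    if node_blacklisted:
        return 0  # node-level check rejects the offer before touching the task list
    pending = list(range(num_tasks))
    failed_on_host: Set[int] = set()
    total = 0
    while True:
        task, examined = scan_for_task(pending, failed_on_host)
        total += examined
        if task is None:
            break  # the host has now failed every pending task
        failed_on_host.add(task)
    for _ in range(idle_rounds):
        _, examined = scan_for_task(pending, failed_on_host)
        total += examined
    return total
```

With n pending tasks and no node-level check, phase 1 examines 1 + 2 + ... + n tasks plus one final empty scan of n, i.e. n(n+1)/2 + n, and every idle round adds another n; with the check, both are zero. For example, `bad_host_scan_cost(4, 0, False)` is 14.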





[GitHub] spark pull request: [MINOR] Fix Typos

2016-05-25 Thread holdenk
Github user holdenk commented on the pull request:

https://github.com/apache/spark/pull/13317#issuecomment-221791122
  
Also, your change seems to have made a few odd edits, such as "an one way", 
which sounds odd. Generally "a one way" is considered to sound "better" (I'm a 
bit fuzzy on the exact rule, but if you look you'll see people say "a one way 
ticket" instead of "an one way ticket", and some other similar things).





[GitHub] spark pull request: [SPARK-15543][SQL] Rename DefaultSources to ma...

2016-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13311#issuecomment-221791056
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [MINOR] Fix Typos

2016-05-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13317#issuecomment-221791117
  
**[Test build #59352 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59352/consoleFull)** for PR 13317 at commit [`230c801`](https://github.com/apache/spark/commit/230c80148cdcd29242fa8fb828ca12ec8c402221).





[GitHub] spark pull request: [SPARK-15543][SQL] Rename DefaultSources to ma...

2016-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13311#issuecomment-221791057
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59343/
Test PASSed.





[GitHub] spark pull request: [SPARK-15100][DOC] Modified user guide and exa...

2016-05-25 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/13176#discussion_r64698276
  
--- Diff: docs/ml-features.md ---
@@ -145,9 +148,11 @@ for more details on the API.
  passed to other algorithms like LDA.
 
  During the fitting process, `CountVectorizer` will select the top `vocabSize` words ordered by
- term frequency across the corpus. An optional parameter "minDF" also affects the fitting process
+ term frequency across the corpus. An optional parameter `minDF` also affects the fitting process
  by specifying the minimum number (or fraction if < 1.0) of documents a term must appear in to be
- included in the vocabulary.
+ included in the vocabulary. Another optional binary toggle parameter controls the output vector.
--- End diff --

You haven't addressed my previous comment for this part, both here and in 
`HashingTF`:

Let's make this consistent with the doc for HashingTF above.

I'd prefer both to read:

"... optional parameter binary controls the output term frequencies. When 
set to true, all nonzero term frequencies are set to 1. This is especially 
useful for discrete probabilistic models that model binary, rather than 
integer, counts."
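To make the suggested wording concrete, here is a plain-Python sketch of what the `binary` toggle does to the output term frequencies. It is not the `CountVectorizer` or `HashingTF` implementation; the `term_frequencies` helper and its vocabulary dict are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def term_frequencies(tokens: List[str], vocab: Dict[str, int], binary: bool = False) -> List[float]:
    """Dense term-frequency vector over `vocab` (term -> column index)."""
    vec = [0.0] * len(vocab)
    for term, count in Counter(tokens).items():
        if term in vocab:  # out-of-vocabulary terms are dropped
            # binary=True: any nonzero frequency becomes 1; otherwise keep the raw count
            vec[vocab[term]] = 1.0 if binary else float(count)
    return vec
```

With `vocab = {"a": 0, "b": 1, "c": 2}` and tokens `["a", "b", "a", "a"]`, the default gives `[3.0, 1.0, 0.0]` and `binary=True` gives `[1.0, 1.0, 0.0]`, the presence/absence input that binary (for example Bernoulli-style) models expect.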





[GitHub] spark pull request: [SPARK-15543][SQL] Rename DefaultSources to ma...

2016-05-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13311#issuecomment-221790919
  
**[Test build #59343 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59343/consoleFull)** for PR 13311 at commit [`94d6e7b`](https://github.com/apache/spark/commit/94d6e7b218e0a969b41f32bd61878cf890c3ba99).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [MINOR] Fix Typos

2016-05-25 Thread zhengruifeng
Github user zhengruifeng commented on the pull request:

https://github.com/apache/spark/pull/13317#issuecomment-221790963
  
@holdenk Thanks. I have fixed this and ran `lint-java` to check the Java files.





[GitHub] spark pull request: [SPARK-15552][SQL] Remove unnecessary private[...

2016-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13319#issuecomment-221790817
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59349/
Test FAILed.





[GitHub] spark pull request: [SPARK-15552][SQL] Remove unnecessary private[...

2016-05-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13319#issuecomment-221790798
  
**[Test build #59349 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59349/consoleFull)** for PR 13319 at commit [`8d1958e`](https://github.com/apache/spark/commit/8d1958e6e0ded35fa29282aa35da548a059f15fe).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-15100][DOC] Modified user guide and exa...

2016-05-25 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/13176#discussion_r64698136
  
--- Diff: docs/ml-features.md ---
@@ -53,7 +53,10 @@ collisions, where different raw features may become the same term after hashing.
 chance of collision, we can increase the target feature dimension, i.e. the number of buckets 
 of the hash table. Since a simple modulo is used to transform the hash function to a column index, 
--- End diff --

I think we can add it, but we can simply say "The hash function used is 
MurmurHash 3".
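For readers unfamiliar with the hashing trick described in this doc section, here is a minimal sketch of hashing a term and taking the result modulo the feature dimension to get a column index. Per the comment above, Spark's HashingTF uses MurmurHash 3; this sketch swaps in `zlib.crc32` from the Python standard library only because it is deterministic and built in, so the actual indices will differ from Spark's. Both function names are hypothetical.

```python
import zlib
from typing import List


def term_index(term: str, num_features: int) -> int:
    """Map a term to a column index: hash it, then take the hash modulo num_features."""
    h = zlib.crc32(term.encode("utf-8"))  # stand-in for MurmurHash 3; non-negative in Python 3
    return h % num_features


def hashing_tf(tokens: List[str], num_features: int) -> List[float]:
    """Term-frequency vector of length num_features; colliding terms share a bucket."""
    vec = [0.0] * num_features
    for token in tokens:
        vec[term_index(token, num_features)] += 1.0
    return vec
```

Because there is no vocabulary, two different terms can land in the same bucket; raising `num_features` (the number of buckets) lowers the chance of that, which is the trade-off the surrounding doc text describes.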





[GitHub] spark pull request: [SPARK-15552][SQL] Remove unnecessary private[...

2016-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13319#issuecomment-221790816
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [MINOR] Fix Typos

2016-05-25 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13317#discussion_r64698017
  
--- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala ---
@@ -105,7 +105,7 @@ private[spark] abstract class MapOutputTracker(conf: SparkConf) extends Logging
 }
   }
 
-  /** Send a one-way message to the trackerEndpoint, to which we expect it to reply with true. */
+  /** Send an one-way message to the trackerEndpoint, to which we expect it to reply with true. */
--- End diff --

I don't think this change is correct.





[GitHub] spark pull request: [MINOR] Fix Typos

2016-05-25 Thread holdenk
Github user holdenk commented on the pull request:

https://github.com/apache/spark/pull/13317#issuecomment-221790462
  
So it seems that in a few places adding the extra character has pushed the 
line over the 100-character limit. You should probably run the linter explicitly 
(`./dev/lint-scala`) if you have it disabled by default.
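As a quick local pre-check for the 100-character limit mentioned above, something like the following flags overlong lines. It is a hypothetical helper, not part of Spark's tooling, and no substitute for `./dev/lint-scala`, which applies the full style rules.

```python
from typing import Iterable, List, Tuple

MAX_LINE_LENGTH = 100  # the limit referred to in the review comment


def overlong_lines(lines: Iterable[str], limit: int = MAX_LINE_LENGTH) -> List[Tuple[int, int]]:
    """Return (1-based line number, length) for every line longer than `limit` characters."""
    result = []
    for number, line in enumerate(lines, start=1):
        length = len(line.rstrip("\r\n"))  # ignore the trailing newline itself
        if length > limit:
            result.append((number, length))
    return result
```

Since an open text file iterates line by line, passing a file handle such as `open(path)` (where `path` is whichever source file you edited) to `overlong_lines` works directly.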





[GitHub] spark pull request: [SPARK-15551][MINOR][DOCS][SQL] Replace groupB...

2016-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13316#issuecomment-221790073
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59348/
Test FAILed.





[GitHub] spark pull request: [SPARK-15551][MINOR][DOCS][SQL] Replace groupB...

2016-05-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13316#issuecomment-221790065
  
**[Test build #59348 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59348/consoleFull)** for PR 13316 at commit [`325a2ea`](https://github.com/apache/spark/commit/325a2ea5fb9de05f866aa4eab56dea5563223712).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-15551][MINOR][DOCS][SQL] Replace groupB...

2016-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13316#issuecomment-221790072
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [MINOR] Fix Typos

2016-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13317#issuecomment-221789832
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [MINOR] Fix Typos

2016-05-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13317#issuecomment-221789830
  
**[Test build #59351 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59351/consoleFull)** for PR 13317 at commit [`cff3aa8`](https://github.com/apache/spark/commit/cff3aa81f2417ff5bc0d1e7bf205ed2ff5a8eb7f).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-15552][SQL] Remove unnecessary private[...

2016-05-25 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/13319#issuecomment-221789707
  
cc @cloud-fan @andrewor14 





[GitHub] spark pull request: [MINOR] Fix Typos

2016-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13317#issuecomment-221789833
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59351/
Test FAILed.





[GitHub] spark pull request: [SPARK-15552][SQL] Remove unnecessary private[...

2016-05-25 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13319#discussion_r64697510
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -178,12 +180,14 @@ class SparkSession private(
   def udf: UDFRegistration = sessionState.udf
 
   /**
+   * :: Experimental ::
* Returns a [[ContinuousQueryManager]] that allows managing all the
 * [[org.apache.spark.sql.ContinuousQuery ContinuousQueries]] active on `this`.
*
* @group basic
* @since 2.0.0
*/
+  @Experimental
--- End diff --

this is a "bug" fix





[GitHub] spark pull request: [SPARK-15391] [SQL] manage the temporary memor...

2016-05-25 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/13318#issuecomment-221789570
  
cc @ericl 





[GitHub] spark pull request: [SPARK-15391] [SQL] manage the temporary memor...

2016-05-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13318#issuecomment-221789635
  
**[Test build #59350 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59350/consoleFull)** for PR 13318 at commit [`6d074f6`](https://github.com/apache/spark/commit/6d074f6e3ad41f427e6dcb9f5a72674798a40b5e).


[GitHub] spark pull request: [SPARK-15552][SQL] Remove unnecessary private[...

2016-05-25 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/13319

[SPARK-15552][SQL] Remove unnecessary private[sql] methods in SparkSession

## What changes were proposed in this pull request?
SparkSession has a number of unnecessary private[sql] methods. These methods
cause trouble because private[sql] is not enforced in Java, where they show up
as public. Where they are easy to remove, we can simply remove them; this patch
does that.

## How was this patch tested?
Updated test cases to reflect the changes.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark SPARK-15552

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13319.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13319


commit 8d1958e6e0ded35fa29282aa35da548a059f15fe
Author: Reynold Xin 
Date:   2016-05-26T06:36:05Z

[SPARK-15552][SQL] Remove unnecessary private[sql] methods in SparkSession




[GitHub] spark pull request: [SPARK-15552][SQL] Remove unnecessary private[...

2016-05-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13319#issuecomment-221789649
  
**[Test build #59349 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59349/consoleFull)** for PR 13319 at commit [`8d1958e`](https://github.com/apache/spark/commit/8d1958e6e0ded35fa29282aa35da548a059f15fe).


[GitHub] spark pull request: [MINOR] Fix Typos

2016-05-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13317#issuecomment-221789634
  
**[Test build #59351 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59351/consoleFull)** for PR 13317 at commit [`cff3aa8`](https://github.com/apache/spark/commit/cff3aa81f2417ff5bc0d1e7bf205ed2ff5a8eb7f).


[GitHub] spark pull request: [SPARK-15391] [SQL] manage the temporary memor...

2016-05-25 Thread davies
GitHub user davies opened a pull request:

https://github.com/apache/spark/pull/13318

[SPARK-15391] [SQL] manage the temporary memory of timsort

## What changes were proposed in this pull request?

Currently, the temporary buffer used by TimSort is always allocated on-heap
without memory bookkeeping, which can cause OOM in both on-heap and off-heap
mode.

This PR manages that memory by preallocating the buffer together with the
pointer array, as is already done for RadixSort. This works in both on-heap
and off-heap mode.

This PR also changes the loadFactor of BytesToBytesMap from 0.75 to 0.5, which
makes it possible to use radix sort and also ensures there is enough memory for
TimSort.
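To make the idea concrete, here is a minimal Python sketch of sorting with a scratch region that was reserved up front instead of allocated during the sort. It assumes a hypothetical layout in which one array holds `n` records in its first half and keeps the second half free as the merge buffer, and it uses a plain bottom-up merge sort as a stand-in for TimSort; it is not Spark's implementation.

```python
def sort_with_reserved_buffer(array, n):
    """Sort array[0:n] in place, using array[n:2n] as the only scratch space.

    Hypothetical layout for illustration: the caller reserved n spare slots
    after the data up front, so the sort itself never allocates a buffer.
    A bottom-up merge sort stands in for TimSort here.
    """
    if len(array) < 2 * n:
        raise ValueError("array must reserve n scratch slots after the data")
    width = 1
    while width < n:
        # Merge adjacent sorted runs of length `width` into the scratch half.
        for lo in range(0, n, 2 * width):
            mid = min(lo + width, n)
            hi = min(lo + 2 * width, n)
            i, j, k = lo, mid, n + lo
            while i < mid and j < hi:
                if array[i] <= array[j]:  # <= keeps the merge stable
                    array[k] = array[i]
                    i += 1
                else:
                    array[k] = array[j]
                    j += 1
                k += 1
            while i < mid:
                array[k] = array[i]
                i += 1
                k += 1
            while j < hi:
                array[k] = array[j]
                j += 1
                k += 1
        # Copy the merged runs back element by element (no temporary list).
        for idx in range(n):
            array[idx] = array[n + idx]
        width *= 2
    return array
```

For example, `sort_with_reserved_buffer([5, 3, 1, 4, 2, 0, 0, 0, 0, 0], 5)` leaves `[1, 2, 3, 4, 5]` in the first five slots. Because the scratch space is part of the same allocation as the data, it can be accounted for once, up front.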

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/davies/spark fix_timsort

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13318.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13318


commit 6d074f6e3ad41f427e6dcb9f5a72674798a40b5e
Author: Davies Liu 
Date:   2016-05-26T06:29:09Z

manage the temporary memory of timsort




[GitHub] spark pull request: [MINOR] Fix Typos

2016-05-25 Thread zhengruifeng
GitHub user zhengruifeng opened a pull request:

https://github.com/apache/spark/pull/13317

[MINOR] Fix Typos

## What changes were proposed in this pull request?

`a` -> `an`


## How was this patch tested?

local build





You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zhengruifeng/spark a_an

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13317.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13317


commit cff3aa81f2417ff5bc0d1e7bf205ed2ff5a8eb7f
Author: Zheng RuiFeng 
Date:   2016-05-26T06:29:10Z

create pr




[GitHub] spark pull request: [SPARK-15434][SQL] improve EmbedSerializerInFi...

2016-05-25 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13216#discussion_r64697298
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/TypedFilterOptimizationSuite.scala ---
@@ -34,40 +35,47 @@ class TypedFilterOptimizationSuite extends PlanTest {
   Batch("EliminateSerialization", FixedPoint(50),
 EliminateSerialization) ::
   Batch("EmbedSerializerInFilter", FixedPoint(50),
-EmbedSerializerInFilter) :: Nil
+EmbedSerializerInFilter,
+RemoveAliasOnlyProject,
+CombineFilters) :: Nil
   }
 
   implicit private def productEncoder[T <: Product : TypeTag] = ExpressionEncoder[T]()
 
-  test("back to back filter") {
+  test("embed deserializer in filter condition if there is only one filter") {
 val input = LocalRelation('_1.int, '_2.int)
-val f1 = (i: (Int, Int)) => i._1 > 0
-val f2 = (i: (Int, Int)) => i._2 > 0
+val f = (i: (Int, Int)) => i._1 > 0
 
-val query = input.filter(f1).filter(f2).analyze
+val query = input.filter(f).analyze
 
 val optimized = Optimize.execute(query)
 
-val expected = input.deserialize[(Int, Int)]
-  .where(callFunction(f1, BooleanType, 'obj))
-  .select('obj.as("obj"))
-  .where(callFunction(f2, BooleanType, 'obj))
-  .serialize[(Int, Int)].analyze
+val deserializer = input.deserialize[(Int, Int)].analyze
+  .asInstanceOf[DeserializeToObject].deserializer
+val boundReference = BoundReference(0, deserializer.dataType, nullable = false)
+val callFunc = callFunction(f, BooleanType, boundReference)
+val condition = ReferenceToExpressions(callFunc, deserializer :: Nil)
+val expected = input.where(condition).analyze
 
 comparePlans(optimized, expected)
   }
 
-  test("embed deserializer in filter condition if there is only one filter") {
+  test("embed deserializer in filter condition if there are two filters") {
--- End diff --

Shall we add a new test case instead of replacing the original one?
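The added `CombineFilters` rule rests on a simple equivalence: two back-to-back filters keep the same rows as one filter on the conjunction of both predicates. The following toy Python model shows that equivalence with plain tuples in place of logical plans, reusing the two predicates from the original test; it is an illustration, not Catalyst code.

```python
def combine_filters(*predicates):
    """Fold several filter predicates into one conjunctive predicate."""
    def combined(row):
        # all() short-circuits, so a later predicate only sees rows that
        # passed the earlier ones, just as with chained filters.
        return all(p(row) for p in predicates)
    return combined


def f1(i):
    return i[0] > 0


def f2(i):
    return i[1] > 0


rows = [(1, 1), (1, -1), (-1, 1), (2, 3)]
after_f1 = [r for r in rows if f1(r)]
chained = [r for r in after_f1 if f2(r)]
combined = [r for r in rows if combine_filters(f1, f2)(r)]
```

Checking both `combine_filters(f1)` and `combine_filters(f1, f2)` against the plain filters mirrors keeping the single-filter case as its own test and adding the two-filter case alongside it.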


[GitHub] spark pull request: [SPARK-15551][MINOR][DOCS][SQL] Replace groupB...

2016-05-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13316#issuecomment-221788985
  
**[Test build #59348 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59348/consoleFull)** for PR 13316 at commit [`325a2ea`](https://github.com/apache/spark/commit/325a2ea5fb9de05f866aa4eab56dea5563223712).


[GitHub] spark pull request: [SPARK-15551][MINOR][DOCS][SQL] Replace groupB...

2016-05-25 Thread holdenk
GitHub user holdenk opened a pull request:

https://github.com/apache/spark/pull/13316

[SPARK-15551][MINOR][DOCS][SQL] Replace groupBy with groupByKey in 
KeyValueGroupedDataset Scaladoc

## What changes were proposed in this pull request?

Replace groupBy with groupByKey in the KeyValueGroupedDataset Scaladoc, and
update the Scaladoc for Dataset.groupByKey to mention that it replaces the old
groupBy + keyAs combination.


## How was this patch tested?

Verified against the Spark 2.0 preview that groupByKey behaves the way
groupBy + keyAs used to, and built unidoc locally.
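As a rough mental model of `groupByKey` (group records by the result of a user-supplied key function, then work on each group), here is a plain-Python sketch. `group_by_key` and `map_groups` are hypothetical helpers named after the Dataset operations; they are not Spark APIs and ignore partitioning entirely.

```python
from collections import defaultdict


def group_by_key(records, key_func):
    """Group records by key_func(record), keeping encounter order per group."""
    groups = defaultdict(list)
    for record in records:
        groups[key_func(record)].append(record)
    return dict(groups)


def map_groups(groups, func):
    """Apply func(key, values) to every group and collect the results."""
    return [func(key, values) for key, values in groups.items()]


by_len = group_by_key(["a", "bb", "cc", "ddd"], len)
counts = map_groups(by_len, lambda key, values: (key, len(values)))
```

Here the key function alone decides the grouping, which is the role the separate `keyAs` step played alongside `groupBy` in Spark 1.6.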

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/holdenk/spark minor-scaladoc-KeyValueGroupedDataset

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13316.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13316


commit 325a2ea5fb9de05f866aa4eab56dea5563223712
Author: Holden Karau 
Date:   2016-05-26T06:19:18Z

Minor: replace groupBy with groupByKey in KeyValueGroupedDataset and 
mention groupByKey replaces groupBy combined with keyAs from Spark 1.6




[GitHub] spark pull request: [SPARK-15236][SQL][SPARK SHELL] Add spark-defa...

2016-05-25 Thread xwu0226
Github user xwu0226 commented on the pull request:

https://github.com/apache/spark/pull/13088#issuecomment-221788192
  
@rxin @andrewor14 @cloud-fan Please help review! Thanks!


[GitHub] spark pull request: [SPARK-15439][SparkR]:Failed to run unit test ...

2016-05-25 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request:

https://github.com/apache/spark/pull/13284#issuecomment-221786544
  
@shivaram I will create a JIRA soon. I will be traveling to NYC on Thursday
and Friday, so I will do it on Saturday.


[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...

2016-05-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12836#issuecomment-221786353
  
**[Test build #59347 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59347/consoleFull)** for PR 12836 at commit [`9cacd4d`](https://github.com/apache/spark/commit/9cacd4dbfa0e20d2a855e23f2962a258abbba553).


[GitHub] spark pull request: [SPARK-15515] [SQL] Error Handling in Running ...

2016-05-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13283#issuecomment-221786340
  
**[Test build #59346 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59346/consoleFull)** for PR 13283 at commit [`b9e12f8`](https://github.com/apache/spark/commit/b9e12f8742e76984445f9d498248704b1c9e9973).


[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...

2016-05-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12836#issuecomment-221785096
  
**[Test build #59345 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59345/consoleFull)** for PR 12836 at commit [`0928740`](https://github.com/apache/spark/commit/09287408137f7d6fbe8f899b12810ab16cbb5c3e).


[GitHub] spark pull request: [SPARK-15463][SQL] support creating dataframe ...

2016-05-25 Thread xwu0226
Github user xwu0226 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13300#discussion_r64694941
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVRelation.scala ---
@@ -142,6 +145,75 @@ object CSVRelation extends Logging {
   if (nonEmptyLines.hasNext) nonEmptyLines.drop(1)
 }
   }
+
+  def baseRdd(
+  sparkSession: SparkSession,
+  options: CSVOptions,
+  inputPaths: Seq[String]): RDD[String] = {
+readText(sparkSession, options, inputPaths.mkString(","))
+  }
+
+  def tokenRdd(
+  options: CSVOptions,
+  header: Array[String],
+  rdd: RDD[String]): RDD[Array[String]] = {
+val firstLine = if (options.headerFlag) findFirstLine(options, rdd) else null
+univocityTokenizer(rdd, header, firstLine, options)
+  }
+
+  /**
+   * Returns the first line of the first non-empty file in path
+   */
+  def findFirstLine(options: CSVOptions, rdd: RDD[String]): String = {
+if (options.isCommentSet) {
+  val comment = options.comment.toString
+  rdd.filter { line =>
+line.trim.nonEmpty && !line.startsWith(comment)
+  }.first()
+} else {
+  rdd.filter { line =>
+line.trim.nonEmpty
+  }.first()
+}
+  }
+
+  def readText(
+  sparkSession: SparkSession,
+  options: CSVOptions,
+  location: String): RDD[String] = {
+if (Charset.forName(options.charset) == StandardCharsets.UTF_8) {
+  sparkSession.sparkContext.textFile(location)
+} else {
+  val charset = options.charset
+  sparkSession.sparkContext
+.hadoopFile[LongWritable, Text, TextInputFormat](location)
+.mapPartitions(_.map(pair => new String(pair._2.getBytes, 0, pair._2.getLength, charset)))
+}
+  }
+
+  def verifySchema(schema: StructType): Unit = {
+schema.foreach { field =>
+  field.dataType match {
+case _: ArrayType | _: MapType | _: StructType =>
+  throw new UnsupportedOperationException(
+s"CSV data source does not support ${field.dataType.simpleString} data type.")
+case _ =>
+  }
+}
+  }
+
+  def getHeader(rdd: RDD[String], csvOptions: CSVOptions): Array[String] = {
--- End diff --

This is also used in a few places to get the header from csv records. 
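The first-line lookup and the charset handling in `readText` can be sketched in plain Python as below. `find_first_line` approximates the RDD filter in the diff on a local list (Scala's `trim` and Python's `strip` differ slightly in which characters they drop), and raising `ValueError` when no usable line exists is an assumption standing in for `first()` failing on an empty RDD. `decode_record` shows why the diff passes `0` and `getLength` explicitly: Hadoop's `Text.getBytes` returns the backing array, which can be longer than the valid data.

```python
def find_first_line(lines, comment=None):
    """Return the first non-blank line that does not start with `comment`."""
    for line in lines:
        if not line.strip():
            continue  # skip empty or whitespace-only lines
        if comment is not None and line.startswith(comment):
            continue  # skip comment lines only when a comment prefix is set
        return line
    raise ValueError("no non-empty line found")


def decode_record(buffer, length, charset):
    """Decode only the first `length` valid bytes of a (possibly reused) buffer."""
    return bytes(buffer[:length]).decode(charset)
```

Checking the comment prefix only when one is configured corresponds to the two branches on `options.isCommentSet` in the diff.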


[GitHub] spark pull request: [SPARK-8603][SPARKR] Use shell() instead of sy...

2016-05-25 Thread sun-rui
Github user sun-rui commented on a diff in the pull request:

https://github.com/apache/spark/pull/13165#discussion_r64694857
  
--- Diff: R/pkg/inst/tests/testthat/test_includeJAR.R ---
@@ -21,10 +21,13 @@ runScript <- function() {
   sparkTestJarPath <- "R/lib/SparkR/test_support/sparktestjar_2.10-1.0.jar"
   jarPath <- paste("--jars", shQuote(file.path(sparkHome, sparkTestJarPath)))
   scriptPath <- file.path(sparkHome, "R/lib/SparkR/tests/testthat/jarTest.R")
-  submitPath <- file.path(sparkHome, "bin/spark-submit")
-  res <- system2(command = submitPath,
- args = c(jarPath, scriptPath),
- stdout = TRUE)
+  if (.Platform$OS.type == "windows") {
--- End diff --

you can call determineSparkSubmitBin() here
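The reviewer's suggestion amounts to keeping the platform check in one helper instead of repeating it in each test. A minimal Python sketch of that shape follows; `determine_spark_submit_bin` and `build_submit_command` are hypothetical names, and the `spark-submit.cmd` launcher name on Windows is an assumption, not taken from the R helper. The `os_type` argument takes the same values as R's `.Platform$OS.type` (`"windows"` or `"unix"`).

```python
import os


def determine_spark_submit_bin(os_type):
    """Return the launcher name for a platform (hypothetical helper).

    Assumption: Windows uses a .cmd launcher; every other platform uses the
    shell script.
    """
    return "spark-submit.cmd" if os_type == "windows" else "spark-submit"


def build_submit_command(spark_home, os_type, jar_path, script_path):
    """Build the argument list for submitting an R script with an extra jar."""
    submit = os.path.join(spark_home, "bin", determine_spark_submit_bin(os_type))
    # A list of arguments (e.g. for subprocess.run) sidesteps shell quoting.
    return [submit, "--jars", jar_path, script_path]
```

Passing the resulting list to `subprocess.run(cmd, capture_output=True, text=True)` would roughly play the role of `system2(..., stdout = TRUE)` in the test.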


[GitHub] spark pull request: [SPARK-15463][SQL] support creating dataframe ...

2016-05-25 Thread xwu0226
Github user xwu0226 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13300#discussion_r64694834
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
@@ -42,16 +42,23 @@ private[csv] object CSVInferSchema {
   tokenRdd: RDD[Array[String]],
   header: Array[String],
   options: CSVOptions): StructType = {
-val startType: Array[DataType] = Array.fill[DataType](header.length)(NullType)
-val rootTypes: Array[DataType] =
-  tokenRdd.aggregate(startType)(inferRowType(options), mergeRowTypes)
+val structFields = if (options.inferSchemaFlag) {
--- End diff --

This method is used in both `csv.DefaultSource` and
`DataFrameReader.csv(ds: Dataset[String])`, so I refactored it here to handle
both the default schema type and the `inferSchemaFlag=true` cases.
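The two cases the refactored method covers can be sketched as follows. This is a simplified Python model, not Spark's `CSVInferSchema`: it uses a four-step type lattice (`null` < `int` < `double` < `string`) instead of Spark's full set of types, folds rows sequentially instead of with `aggregate`, and falls back to `string` for columns that never saw a value.

```python
# Simplified type lattice for illustration only: null < int < double < string.
_ORDER = ["null", "int", "double", "string"]


def infer_field_type(token):
    """Infer the narrowest lattice type that can hold a single token."""
    if token is None or token == "":
        return "null"
    try:
        int(token)
        return "int"
    except ValueError:
        pass
    try:
        float(token)
        return "double"
    except ValueError:
        return "string"


def merge_types(a, b):
    """Widen to whichever type sits further along the lattice."""
    return max(a, b, key=_ORDER.index)


def infer_schema(rows, header, infer_schema_flag):
    if not infer_schema_flag:
        # Without inference, every column defaults to string.
        return [(name, "string") for name in header]
    types = ["null"] * len(header)  # like Array.fill(header.length)(NullType)
    for row in rows:
        for i, token in enumerate(row[: len(types)]):
            types[i] = merge_types(types[i], infer_field_type(token))
    # Columns that only ever held empty values fall back to string.
    return [(name, "string" if t == "null" else t) for name, t in zip(header, types)]
```

For instance, a column holding `"1"` and `"2.5"` widens from `int` to `double`, while any non-numeric token pushes its column to `string`.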


[GitHub] spark pull request: [SPARK-10372] [CORE] basic test framework for ...

2016-05-25 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/8559#issuecomment-221784338
  
On a related note, @squito, in the future can you leave a message indicating
which branch a PR was merged into once you merge it? There have been cases that
led to race conditions in merging, as well as mistakes in branches that we
needed to go back and audit.


[GitHub] spark pull request: [SPARK-10372] [CORE] basic test framework for ...

2016-05-25 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/8559#issuecomment-221784058
  
This is pretty cool!



[GitHub] spark pull request: [WIP] [SPARK-8426] Enhance Blacklist mechanism...

2016-05-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13234#issuecomment-221783307
  
**[Test build #59344 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59344/consoleFull)** for PR 13234 at commit [`8f2534b`](https://github.com/apache/spark/commit/8f2534b1d4d90f1ed42c695a77f5a2fa588d3428).


[GitHub] spark pull request: [SPARK-10372] [CORE] basic test framework for ...

2016-05-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/8559


[GitHub] spark pull request: [SPARK-15532] [SQL] Add SQLConf.ALLOW_MULTIPLE...

2016-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13310#issuecomment-221780558
  
Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-15532] [SQL] Add SQLConf.ALLOW_MULTIPLE...

2016-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13310#issuecomment-221780560
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59333/


[GitHub] spark pull request: [SPARK-15532] [SQL] Add SQLConf.ALLOW_MULTIPLE...

2016-05-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13310#issuecomment-221780470
  
**[Test build #59333 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59333/consoleFull)** for PR 13310 at commit [`f40a898`](https://github.com/apache/spark/commit/f40a89873ba92eaf5821dce4728d2aab84e1289e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark pull request: [SQL] Prevent illegal NULL propagation when fi...

2016-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13290#issuecomment-221777604
  
Merged build finished. Test PASSed.


[GitHub] spark pull request: [SQL] Prevent illegal NULL propagation when fi...

2016-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13290#issuecomment-221777607
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59334/


[GitHub] spark pull request: [SQL] Prevent illegal NULL propagation when fi...

2016-05-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13290#issuecomment-221777507
  
**[Test build #59334 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59334/consoleFull)** for PR 13290 at commit [`127024d`](https://github.com/apache/spark/commit/127024da7e1058cd39b71e85c6dcd08b5e3e2b53).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-15543][SQL] Rename DefaultSources to ma...

2016-05-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13311#issuecomment-221777001
  
**[Test build #59343 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59343/consoleFull)** for PR 13311 at commit [`94d6e7b`](https://github.com/apache/spark/commit/94d6e7b218e0a969b41f32bd61878cf890c3ba99).


[GitHub] spark pull request: [SPARK-15533][SQL]Deprecate Dataset.explode

2016-05-25 Thread WeichenXu123
Github user WeichenXu123 closed the pull request at:

https://github.com/apache/spark/pull/13313


[GitHub] spark pull request: [SPARK-15327] [SQL] fix split expression in wh...

2016-05-25 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/13235#issuecomment-221776141
  
It looks like #12351 is the same issue about whole stage codegen with 
`splitExpressions`.





[GitHub] spark pull request: [SPARK-15515] [SQL] Error Handling in Running ...

2016-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13283#issuecomment-221775767
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59332/
Test FAILed.





[GitHub] spark pull request: [SPARK-15515] [SQL] Error Handling in Running ...

2016-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13283#issuecomment-221775766
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-15515] [SQL] Error Handling in Running ...

2016-05-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13283#issuecomment-221775683
  
**[Test build #59332 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59332/consoleFull)**
 for PR 13283 at commit 
[`76f4f80`](https://github.com/apache/spark/commit/76f4f80f962e0271a2073a4cb8de0d513013cf87).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-15542][SparkR] Make error message clear...

2016-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13308#issuecomment-221775528
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-15542][SparkR] Make error message clear...

2016-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13308#issuecomment-221775529
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59342/
Test PASSed.





[GitHub] spark pull request: [SPARK-15542][SparkR] Make error message clear...

2016-05-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13308#issuecomment-221775481
  
**[Test build #59342 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59342/consoleFull)**
 for PR 13308 at commit 
[`cbd5163`](https://github.com/apache/spark/commit/cbd5163d73fa56a58e18598ece64aaa60e06cc1d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-10903] [SPARKR] R - Simplify SQLContext...

2016-05-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9192#issuecomment-221774354
  
**[Test build #59341 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59341/consoleFull)**
 for PR 9192 at commit 
[`f67095e`](https://github.com/apache/spark/commit/f67095ef72540140aa2348b5262ffdf91685846a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-10903] [SPARKR] R - Simplify SQLContext...

2016-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9192#issuecomment-221774407
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-10903] [SPARKR] R - Simplify SQLContext...

2016-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9192#issuecomment-221774409
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59341/
Test PASSed.





[GitHub] spark pull request: [SPARK-15542][SparkR] Make error message clear...

2016-05-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13308#issuecomment-221774053
  
**[Test build #59342 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59342/consoleFull)**
 for PR 13308 at commit 
[`cbd5163`](https://github.com/apache/spark/commit/cbd5163d73fa56a58e18598ece64aaa60e06cc1d).





[GitHub] spark pull request: [YARN][Doc][Minor] Remove several obsolete env...

2016-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13296#issuecomment-221773158
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59329/
Test PASSed.





[GitHub] spark pull request: [YARN][Doc][Minor] Remove several obsolete env...

2016-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13296#issuecomment-221773157
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [YARN][Doc][Minor] Remove several obsolete env...

2016-05-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13296#issuecomment-221773071
  
**[Test build #59329 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59329/consoleFull)**
 for PR 13296 at commit 
[`367e3b8`](https://github.com/apache/spark/commit/367e3b8de0633c100bc1a9bf4742f6af80ecfa68).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-15542][SparkR] Make error message clear...

2016-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13308#issuecomment-221773031
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-15542][SparkR] Make error message clear...

2016-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13308#issuecomment-221773032
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59340/
Test PASSed.





[GitHub] spark pull request: [SPARK-15542][SparkR] Make error message clear...

2016-05-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13308#issuecomment-221772977
  
**[Test build #59340 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59340/consoleFull)**
 for PR 13308 at commit 
[`88319c0`](https://github.com/apache/spark/commit/88319c022b8eb55f59f8080d488e30726f475580).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-8603][SPARKR] Use shell() instead of sy...

2016-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13165#issuecomment-221772828
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-10903] [SPARKR] R - Simplify SQLContext...

2016-05-25 Thread shivaram
Github user shivaram commented on the pull request:

https://github.com/apache/spark/pull/9192#issuecomment-221772896
  
Thanks for the update. LGTM. Will merge after Jenkins passes. 





[GitHub] spark pull request: [SPARK-8603][SPARKR] Use shell() instead of sy...

2016-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13165#issuecomment-221772829
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59339/
Test PASSed.





[GitHub] spark pull request: [SPARK-8603][SPARKR] Use shell() instead of sy...

2016-05-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13165#issuecomment-221772778
  
**[Test build #59339 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59339/consoleFull)**
 for PR 13165 at commit 
[`0482ebb`](https://github.com/apache/spark/commit/0482ebbc43ff1bef8e7a6a16376c6ec36840a366).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-15439][SparkR]:Failed to run unit test ...

2016-05-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13284





[GitHub] spark pull request: [SPARK-15439][SparkR]:Failed to run unit test ...

2016-05-25 Thread shivaram
Github user shivaram commented on the pull request:

https://github.com/apache/spark/pull/13284#issuecomment-221772558
  
Yeah, that's a good idea. @wangmiao1981, can you open a JIRA to not mask 
`startsWith` and `endsWith` by updating our generics?

LGTM - Merging this to master and branch-2.0. 





[GitHub] spark pull request: [SPARK-10903] [SPARKR] R - Simplify SQLContext...

2016-05-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9192#issuecomment-221772591
  
**[Test build #59341 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59341/consoleFull)**
 for PR 9192 at commit 
[`f67095e`](https://github.com/apache/spark/commit/f67095ef72540140aa2348b5262ffdf91685846a).





[GitHub] spark pull request: [SPARK-15542][SparkR] Make error message clear...

2016-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13308#issuecomment-221772521
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59338/
Test PASSed.





[GitHub] spark pull request: [SPARK-15542][SparkR] Make error message clear...

2016-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13308#issuecomment-221772519
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-15542][SparkR] Make error message clear...

2016-05-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13308#issuecomment-221772475
  
**[Test build #59338 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59338/consoleFull)**
 for PR 13308 at commit 
[`07806de`](https://github.com/apache/spark/commit/07806de09f4be0dd9501fe81684c07a45ad68672).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-15439][SparkR]:Failed to run unit test ...

2016-05-25 Thread felixcheung
Github user felixcheung commented on the pull request:

https://github.com/apache/spark/pull/13284#issuecomment-221772080
  
Looks fine. I think we should really try to make `startsWith` and `endsWith` 
work, though that could be a follow-up.





[GitHub] spark pull request: [SPARK-15542][SparkR] Make error message clear...

2016-05-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13308#issuecomment-221771654
  
**[Test build #59340 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59340/consoleFull)**
 for PR 13308 at commit 
[`88319c0`](https://github.com/apache/spark/commit/88319c022b8eb55f59f8080d488e30726f475580).





[GitHub] spark pull request: [SPARK-15542][SparkR] Make error message clear...

2016-05-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13308#issuecomment-221771180
  
**[Test build #59338 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59338/consoleFull)**
 for PR 13308 at commit 
[`07806de`](https://github.com/apache/spark/commit/07806de09f4be0dd9501fe81684c07a45ad68672).





[GitHub] spark pull request: [SPARK-8603][SPARKR] Incorrect file separator ...

2016-05-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13165#issuecomment-221771183
  
**[Test build #59339 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59339/consoleFull)**
 for PR 13165 at commit 
[`0482ebb`](https://github.com/apache/spark/commit/0482ebbc43ff1bef8e7a6a16376c6ec36840a366).





[GitHub] spark pull request: [SPARK-15515] [SQL] Error Handling in Running ...

2016-05-25 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/13283#issuecomment-221770911
  
**Update**: The latest code changes contain:
- For the JDBC format, we added an extra check in the `ResolveRelations` 
rule of `Analyzer`. Without the PR, Spark returns an error message like 
`Option 'url' not specified`. Now, it reports `Unsupported data source type 
for direct query on files: jdbc`.
- Made data source format names case-insensitive so that error handling 
behaves consistently with the normal cases.
- Added test cases for all the supported formats.





[GitHub] spark pull request: [SPARK-8603][SPARKR] Incorrect file separator ...

2016-05-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/13165#issuecomment-221770814
  
retest this please





[GitHub] spark pull request: [SPARK-8603][SPARKR] Incorrect file separator ...

2016-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13165#issuecomment-221770536
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-8603][SPARKR] Incorrect file separator ...

2016-05-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13165#issuecomment-221770524
  
**[Test build #59336 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59336/consoleFull)**
 for PR 13165 at commit 
[`0482ebb`](https://github.com/apache/spark/commit/0482ebbc43ff1bef8e7a6a16376c6ec36840a366).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-8603][SPARKR] Incorrect file separator ...

2016-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13165#issuecomment-221770538
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59336/
Test FAILed.





[GitHub] spark pull request: [SPARK-10903] [SPARKR] R - Simplify SQLContext...

2016-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9192#issuecomment-221770384
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59337/
Test FAILed.





[GitHub] spark pull request: [SPARK-10903] [SPARKR] R - Simplify SQLContext...

2016-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9192#issuecomment-221770383
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-10903] [SPARKR] R - Simplify SQLContext...

2016-05-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9192#issuecomment-221770380
  
**[Test build #59337 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59337/consoleFull)** for PR 9192 at commit [`90641a7`](https://github.com/apache/spark/commit/90641a71ff1860ddfe1a8e0bcb64cc0f0d2a56c6).
 * This patch **fails R style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SQL] Prevent illegal NULL propagation when fi...

2016-05-25 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/13290#discussion_r64688437
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1448,6 +1450,37 @@ class Analyzer(
   }
 
   /**
+   * Fixes nullability of Attributes in a resolved LogicalPlan by using the nullability of
+   * corresponding Attributes of its children output Attributes. This step is needed because
+   * users can use a resolved AttributeReference in the Dataset API and outer joins
+   * can change the nullability of an AttribtueReference. Without the fix, a nullable column's
+   * nullable field can be actually set as non-nullable, which cause illegal optimization
+   * (e.g., NULL propagation) and wrong answers.
+   * See SPARK-13484 and SPARK-13801 for the concrete queries of this case.
+   */
+  object FixNullability extends Rule[LogicalPlan] {
+
+    def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+      case q: LogicalPlan if q.resolved =>
+        val childrenOutput = q.children.flatMap(c => c.output).groupBy(_.exprId).flatMap {
--- End diff --

yes, I got your point.
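For readers following the thread, here is a minimal sketch of the idea described in the quoted doc comment: walk the plan bottom-up and copy each child output attribute's nullability onto matching references (same `exprId`) in the parent. It uses hypothetical simplified types `Attr` and `Node` and a hypothetical `FixNullabilitySketch.fix`, not Catalyst's real `AttributeReference`, `LogicalPlan`, or `transformUp`. It also treats a node's output as the set of attributes it references. How duplicates with the same `exprId` are merged (nullable if any copy is nullable) is an assumption, since the quoted diff stops at `groupBy(_.exprId).flatMap {`.

```scala
// Hypothetical stand-in for Catalyst's AttributeReference; only the fields this sketch needs.
final case class Attr(name: String, exprId: Long, nullable: Boolean)

// Hypothetical plan node: the attributes it outputs/references, plus its children.
final case class Node(output: Seq[Attr], children: Seq[Node])

object FixNullabilitySketch {
  // Rewrites the tree bottom-up (in the spirit of transformUp): children are fixed first.
  def fix(node: Node): Node = {
    val fixedChildren = node.children.map(fix)
    // Group child output by exprId; if duplicates disagree, treat the attribute as nullable
    // (an assumption for this sketch).
    val childNullability: Map[Long, Boolean] =
      fixedChildren.flatMap(_.output).groupBy(_.exprId).map {
        case (id, attrs) => id -> attrs.exists(_.nullable)
      }
    // Copy the children's nullability onto matching attributes; leave unmatched ones untouched.
    val fixedOutput = node.output.map { a =>
      childNullability.get(a.exprId).fold(a)(n => a.copy(nullable = n))
    }
    Node(fixedOutput, fixedChildren)
  }
}
```

For example, if an outer join below a projection produces `x` as nullable but the projection still holds a stale non-nullable reference to `x`, `fix` marks that reference nullable. Attributes with no matching child output keep their original nullability.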

