[GitHub] spark pull request: upgrade joda-time: 2.9 -> 2.9.2
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11847#issuecomment-198853307 Can one of the admins verify this patch? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: upgrade joda-time: 2.9 -> 2.9.2
GitHub user sullis opened a pull request:

    https://github.com/apache/spark/pull/11847

    upgrade joda-time: 2.9 -> 2.9.2

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sullis/spark joda-time-2.9.2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11847.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #11847

commit 50a67cbfbcdd59dd97551d50ce930e6f4f32550c
Author: Sean Sullivan
Date:   2016-03-20T05:32:24Z

    upgrade joda-time: 2.9 -> 2.9.2
[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...
Github user flyjy commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-198852800 Thanks. I have checked that the problem still exists with only the adaptive learning rate change, so I will fix this bug without changing the existing interface. I think the score should be between 0 and 1, based on the definition of cosine similarity.
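The cosine-similarity definition the comment relies on can be sketched outside Spark as follows (a minimal standalone Scala sketch, not Spark's Word2Vec code; all names are illustrative):

```scala
object CosineSimilaritySketch {
  // Cosine similarity: dot(a, b) / (||a|| * ||b||), defined for non-zero vectors.
  def cosineSimilarity(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same dimension")
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    val normA = math.sqrt(a.map(x => x * x).sum)
    val normB = math.sqrt(b.map(x => x * x).sum)
    dot / (normA * normB)
  }
}
```

By Cauchy-Schwarz this value always lies in [-1, 1]; it lies in [0, 1] when both vectors have only non-negative components, which is the range the comment expects for the similarity score.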
[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/11756#issuecomment-197629010 LGTM, cc @davies for another look.
[GitHub] spark pull request: [SPARK-12469][CORE][WIP/RFC] Consistent accumu...
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11105#discussion_r56426605

--- Diff: core/src/main/scala/org/apache/spark/Accumulable.scala ---
@@ -146,6 +212,32 @@ class Accumulable[R, T] private (
   def merge(term: R) { value_ = param.addInPlace(value_, term) }

   /**
+   * Merge in pending updates for a consistent accumulator, or merge accumulated values for
+   * regular accumulators. This is only called on the driver when merging task results together.
+   */
+  private[spark] def internalMerge(term: Any) {
+    if (!consistent) {
+      merge(term.asInstanceOf[R])
+    } else {
+      mergePending(term.asInstanceOf[mutable.HashMap[(Int, Int, Int), R]])
+    }
+  }
+
+  /**
+   * Merge another Accumulable's pending updates, checking that each pending update has
+   * not already been processed before updating.
+   */
+  private[spark] def mergePending(term: mutable.HashMap[(Int, Int, Int), R]) = {
+    term.foreach { case ((rddId, shuffleId, splitId), v) =>
+      val splits = processed.getOrElseUpdate((rddId, shuffleId), new mutable.BitSet())
+      if (!splits.contains(splitId)) {
+        splits += splitId
+        value_ = param.addInPlace(value_, v)
+      }
--- End diff --

Sure, we could do that. I'd kept the separate `processed` since I thought the space efficiency of a bitset might be worth it, and it seemed like it might be more confusing to have one val with two different meanings between driver and worker.
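The dedup-on-merge pattern discussed in this diff can be sketched outside Spark as follows (a simplified standalone Scala sketch assuming long-sum accumulators; `ConsistentSum` is an illustrative name, and `processed` plays the same role as in the diff above):

```scala
import scala.collection.mutable

// Simplified model of a consistent accumulator: pending updates are keyed by
// (rddId, shuffleId, splitId), and a BitSet per (rddId, shuffleId) records
// which splits have already been applied, so retried tasks do not double-count.
class ConsistentSum {
  private var value: Long = 0L
  private val processed = mutable.HashMap.empty[(Int, Int), mutable.BitSet]

  def current: Long = value

  def mergePending(pending: mutable.HashMap[(Int, Int, Int), Long]): Unit = {
    pending.foreach { case ((rddId, shuffleId, splitId), v) =>
      val splits = processed.getOrElseUpdate((rddId, shuffleId), new mutable.BitSet())
      if (!splits.contains(splitId)) { // skip splits that were already counted
        splits += splitId
        value += v
      }
    }
  }
}
```

Merging the same pending map twice (as a task retry would) leaves the total unchanged, which is the property the BitSet buys; the BitSet also stays compact because split IDs are small dense integers.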
[GitHub] spark pull request: [SPARK-13988][Core] Make replaying event logs ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11800#issuecomment-198112103 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-198082685 **[Test build #53468 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53468/consoleFull)** for PR 11636 at commit [`5efadf3`](https://github.com/apache/spark/commit/5efadf3f159b40a04621832facbccb99cb4b2c5c).
[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10231#discussion_r56441090

--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
@@ -842,60 +842,59 @@ private[ml] object RandomForest extends Logging {
         1.0
       }
       logDebug("fraction of data used for calculating quantiles = " + fraction)
-      input.sample(withReplacement = false, fraction, new XORShiftRandom(seed).nextInt()).collect()
+      input.sample(withReplacement = false, fraction, new XORShiftRandom(seed).nextInt())
     } else {
-      new Array[LabeledPoint](0)
+      input.sparkContext.emptyRDD[LabeledPoint]
     }
-    val splits = new Array[Array[Split]](numFeatures)
-
-    // Find all splits.
-    // Iterate over all features.
-    var featureIndex = 0
-    while (featureIndex < numFeatures) {
-      if (metadata.isContinuous(featureIndex)) {
-        val featureSamples = sampledInput.map(_.features(featureIndex))
-        val featureSplits = findSplitsForContinuousFeature(featureSamples, metadata, featureIndex)
+    findSplitsBinsBySorting(sampledInput, metadata, continuousFeatures)
+  }

-        val numSplits = featureSplits.length
-        logDebug(s"featureIndex = $featureIndex, numSplits = $numSplits")
-        splits(featureIndex) = new Array[Split](numSplits)
+  private def findSplitsBinsBySorting(
+      input: RDD[LabeledPoint],
+      metadata: DecisionTreeMetadata,
+      continuousFeatures: IndexedSeq[Int]): Array[Array[Split]] = {
+
+    val continuousSplits = {
+      // reduce the parallelism for split computations when there are less
+      // continuous features than input partitions. this prevents tasks from
+      // being spun up that will definitely do no work.
+      val numPartitions = math.min(continuousFeatures.length, input.partitions.length)
+
+      input
+        .flatMap(point => continuousFeatures.map(idx => (idx, point.features(idx))))
+        .groupByKey(numPartitions)
+        .map { case (idx, samples) =>
+          val thresholds = findSplitsForContinuousFeature(samples.toArray, metadata, idx)
+          val splits: Array[Split] = thresholds.map(thresh => new ContinuousSplit(idx, thresh))
+          logDebug(s"featureIndex = $idx, numSplits = ${splits.length}")
+          (idx, splits)
+        }.collectAsMap()
+    }

-        var splitIndex = 0
-        while (splitIndex < numSplits) {
-          val threshold = featureSplits(splitIndex)
-          splits(featureIndex)(splitIndex) = new ContinuousSplit(featureIndex, threshold)
-          splitIndex += 1
-        }
-      } else {
-        // Categorical feature
-        if (metadata.isUnordered(featureIndex)) {
-          val numSplits = metadata.numSplits(featureIndex)
-          val featureArity = metadata.featureArity(featureIndex)
-          // TODO: Use an implicit representation mapping each category to a subset of indices.
-          // I.e., track indices such that we can calculate the set of bins for which
-          // feature value x splits to the left.
-          // Unordered features
-          // 2^(maxFeatureValue - 1) - 1 combinations
-          splits(featureIndex) = new Array[Split](numSplits)
-          var splitIndex = 0
-          while (splitIndex < numSplits) {
-            val categories: List[Double] =
-              extractMultiClassCategories(splitIndex + 1, featureArity)
-            splits(featureIndex)(splitIndex) =
-              new CategoricalSplit(featureIndex, categories.toArray, featureArity)
-            splitIndex += 1
-          }
-        } else {
-          // Ordered features
-          // Bins correspond to feature values, so we do not need to compute splits or bins
-          // beforehand. Splits are constructed as needed during training.
-          splits(featureIndex) = new Array[Split](0)
+    val numFeatures = metadata.numFeatures
+    val splits = Array.tabulate(numFeatures) {
+      case i if metadata.isContinuous(i) =>
+        val split = continuousSplits(i)
+        metadata.setNumSplits(i, split.length)
+        split
+
+      case i if metadata.isCategorical(i) && metadata.isUnordered(i) =>
+        // Unordered features
+        // 2^(maxFeatureValue - 1) - 1 combinations
+        val featureArity = metadata.featureArity(i)
+        Array.tabulate[Split](metadata.numSplits(i)) { splitIndex =>
+          val categories = extractMultiClassCategories(splitIndex + 1, featureArity)
+          new CategoricalSplit(i, categories.toArray, featureArity)
+        }
-      }
-
[GitHub] spark pull request: [SPARK-13958]Executor OOM due to unbounded gro...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11794#issuecomment-198535076 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53550/ Test PASSed.
[GitHub] spark pull request: [SPARK-13761] [ML] Deprecate validateParams
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11620#discussion_r56513463

--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -549,7 +548,9 @@ trait Params extends Identifiable with Serializable {
    * Parameter value checks which do not depend on other parameters are handled by
    * [[Param.validate()]]. This method does not handle input/output column parameters;
    * those are checked during schema validation.
+   * @deprecated Will be removed in 2.1.0. All the checks should be merged into transformSchema
    */
+  @deprecated("Will be removed in 2.1.0. Checks should be merged into transformSchema.", "2.0.0")
--- End diff --

It looks like this now causes a number of deprecation warnings in the Spark code, which we're trying to get rid of. Can most of the remaining usages be transformed to not use this method?
[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11756#issuecomment-198214354 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13974][SQL] sub-query names do not need...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11783#issuecomment-197913617 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53430/ Test FAILed.
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-198084216 **[Test build #53468 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53468/consoleFull)** for PR 11636 at commit [`5efadf3`](https://github.com/apache/spark/commit/5efadf3f159b40a04621832facbccb99cb4b2c5c).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
 * `case class InputReference(ordinal: Int, dataType: DataType, nullable: Boolean, isColumn: Boolean)`
[GitHub] spark pull request: [SPARK-13852][YARN]handle the InterruptedExcep...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/11692#issuecomment-197342124 So inside of Hadoop, in the getApplicationReport call, it was RetryInvocationHandler that was doing a sleep and got an InterruptedException. That ended up throwing a java.lang.reflect.UndeclaredThrowableException up to monitorApplication, which is why it was handled by the NonFatal catch. I need to look at it a bit closer.
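The wrapping described above, where a reflective proxy hides an InterruptedException inside an UndeclaredThrowableException, can be unwrapped by walking the cause chain. This is an illustrative Scala sketch of the general pattern, not the fix adopted in the PR:

```scala
import java.lang.reflect.UndeclaredThrowableException

object InterruptUnwrapSketch {
  // Walk the cause chain and report whether an InterruptedException is hiding
  // inside an UndeclaredThrowableException (or any other wrapper exception).
  def isInterrupted(t: Throwable): Boolean = t match {
    case null                              => false
    case _: InterruptedException           => true
    case ute: UndeclaredThrowableException => isInterrupted(ute.getCause)
    case other                             => isInterrupted(other.getCause)
  }
}
```

A caller that wants to treat interruption specially (e.g. to stop monitoring cleanly instead of logging a NonFatal error) can test the caught throwable with this helper before falling back to generic handling.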
[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...
Github user keypointt commented on the pull request: https://github.com/apache/spark/pull/11142#issuecomment-197375803 cc @mengxr
[GitHub] spark pull request: [SPARK-13982][SparkR] Fixed features column he...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11793#issuecomment-198066744 **[Test build #53464 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53464/consoleFull)** for PR 11793 at commit [`48061de`](https://github.com/apache/spark/commit/48061de21addf5b021f874fa93cddb2882959042).
[GitHub] spark pull request: [SPARK-13808][test-maven] Don't build assembly...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/11701#issuecomment-197535055 Since all of this code is going to be changed heavily / removed after your final patch, I'm going to go ahead and just leave the Maven test path unchanged so that we can get this merged.
[GitHub] spark pull request: [SPARK-13997][SQL] Use Hadoop 2.0 default valu...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/11806#issuecomment-198207204 The efficiency of compression algorithms usually goes down as the frame (block) size goes down.
[GitHub] spark pull request: SPARK-13034[ML]:PySpark ml.classification supp...
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/11582#issuecomment-198064456 Closing this one, as it has been merged with #11707.
[GitHub] spark pull request: [SPARK-13904][Scheduler]Add support for plugga...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11723#issuecomment-198256400 **[Test build #53525 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53525/consoleFull)** for PR 11723 at commit [`ae808d7`](https://github.com/apache/spark/commit/ae808d73e022077dba6ad999627589eed4730270).
[GitHub] spark pull request: [SPARK-13950] [SQL] generate code for sort mer...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11771#issuecomment-197592210 **[Test build #53371 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53371/consoleFull)** for PR 11771 at commit [`99df29a`](https://github.com/apache/spark/commit/99df29a8c9b0bc7df7aef1a37b7c5c7bff7a1ff1).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14007] [SQL] Manage the memory used by ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11826#issuecomment-198481175 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53547/ Test FAILed.
[GitHub] spark pull request: [SPARK-13981][SQL] Defer evaluating variables ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11792#issuecomment-198090295 **[Test build #53463 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53463/consoleFull)** for PR 11792 at commit [`29e408d`](https://github.com/apache/spark/commit/29e408d61f3557b1c5df343d039ef74d1c6e9ab3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13826][SQL] Revises Dataset ScalaDoc
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11769#issuecomment-197553484 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53340/ Test PASSed.
[GitHub] spark pull request: [SPARK-13957] [SQL] Support Group By Ordinal i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11846#issuecomment-198846521 **[Test build #53624 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53624/consoleFull)** for PR 11846 at commit [`79a537a`](https://github.com/apache/spark/commit/79a537aecdd788a80948aa22f61cca4901e8d0ee).
[GitHub] spark pull request: [SPARK-13914] [Scheduler] Add functionality to...
Github user paragpc closed the pull request at: https://github.com/apache/spark/pull/11736
[GitHub] spark pull request: [SPARK-13922][SQL] Filter rows with null attri...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11749#issuecomment-197556324 **[Test build #53354 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53354/consoleFull)** for PR 11749 at commit [`0688cf8`](https://github.com/apache/spark/commit/0688cf84958552132aaa8ada960b9c4880b437e6).
[GitHub] spark pull request: [SPARK-13976][SQL] do not remove sub-queries a...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11786#issuecomment-197955452 **[Test build #53435 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53435/consoleFull)** for PR 11786 at commit [`ee5a437`](https://github.com/apache/spark/commit/ee5a43739b895708c054558c62263a6a59aa4b1e).
[GitHub] spark pull request: [Minor][DOC] Fix nits in JavaStreamingTestExam...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11821#issuecomment-198317555 **[Test build #53532 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53532/consoleFull)** for PR 11821 at commit [`09ad928`](https://github.com/apache/spark/commit/09ad928e3f5efde847eae324b648bfed227c0f34).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: SPARK-13991 - Extend the enforcer plugin Maven...
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11803#discussion_r56646795

--- Diff: pom.xml ---
@@ -1733,7 +1733,7 @@
-            ${maven.version}
+            [3.3,)
--- End diff --

Actually, that's what the existing specification already means: https://maven.apache.org/enforcer/enforcer-rules/versionRanges.html
[GitHub] spark pull request: [SPARK-13923] [SQL] Implement SessionCatalog
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11750#discussion_r56422492

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
@@ -211,8 +214,7 @@ case class CatalogTablePartition(
  * future once we have a better understanding of how we want to handle skewed columns.
  */
 case class CatalogTable(
-    specifiedDatabase: Option[String],
-    name: String,
+    name: TableIdentifier,
--- End diff --

Maybe we can rename this `name` later.
[GitHub] spark pull request: [SPARK-13808][test-maven] Don't build assembly...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11701#issuecomment-197467017 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13908][SQL] Add a LocalLimit for Collec...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11817#issuecomment-198414800 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53539/ Test PASSed.
[GitHub] spark pull request: [SPARK-13826][SQL] Addendum: update documentat...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11814#issuecomment-198239537 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53509/ Test PASSed.
[GitHub] spark pull request: [SPARK-13937][PySpark][ML] Change JavaWrapper ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11767#issuecomment-197490633 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53334/ Test PASSed.
[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11621#issuecomment-197492294 **[Test build #53335 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53335/consoleFull)** for PR 11621 at commit [`460881c`](https://github.com/apache/spark/commit/460881cffcb9b6bce35b822e4a325352074d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13629] [ML] Add binary toggle Param to ...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/11536#issuecomment-198505962 @hhbyyh Thanks for the PR! LGTM, I agree with the use of minTF
[GitHub] spark pull request: [SPARK-13926] Automatically use Kryo serialize...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11755#issuecomment-197560970 **[Test build #53353 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53353/consoleFull)** for PR 11755 at commit [`45b0c0b`](https://github.com/apache/spark/commit/45b0c0be3791e518f0a8783951ad9b9e53196a1e). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` class DecisionTreeClassificationModelWriter(instance: DecisionTreeClassificationModel)` * ` class DecisionTreeRegressionModelWriter(instance: DecisionTreeRegressionModel)` * ` case class SplitData(` * ` case class NodeData(` * `class Estimator(Params):` * `class Transformer(Params):` * `class Model(Transformer):` * `class LogisticRegressionModel(JavaModel, MLWritable, MLReadable):` * `class NaiveBayesModel(JavaModel, MLWritable, MLReadable):` * `class PipelineMLWriter(JavaMLWriter, JavaWrapper):` * `class PipelineMLReader(JavaMLReader):` * `class PipelineModelMLWriter(JavaMLWriter, JavaWrapper):` * `class PipelineModelMLReader(JavaMLReader):` * ` case class SQLTable(`
[GitHub] spark pull request: [SPARK-13449] Naive Bayes wrapper in SparkR
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11486#issuecomment-198845476 **[Test build #53622 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53622/consoleFull)** for PR 11486 at commit [`b4ee1aa`](https://github.com/apache/spark/commit/b4ee1aab70008919ba17cf02c8470f1a75c23ef8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13449] Naive Bayes wrapper in SparkR
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11486#issuecomment-198845488 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53622/ Test PASSed.
[GitHub] spark pull request: [SPARK-13449] Naive Bayes wrapper in SparkR
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11486#issuecomment-198845487 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13826][SQL] Addendum: update documentat...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/11814
[GitHub] spark pull request: [SPARK-13993][PySpark] Add pyspark Rformula/Rf...
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/11807#issuecomment-198147707 @jkbradley This is a follow-up for https://github.com/apache/spark/pull/9884
[GitHub] spark pull request: [SPARK-13742][Core] Add non-iterator interface...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/11578#discussion_r56611345 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -155,6 +171,28 @@ class BernoulliSampler[T: ClassTag](fraction: Double) extends RandomSampler[T, T override def setSeed(seed: Long): Unit = rng.setSeed(seed) + private val gapSampling: GapSampling = if (fraction > 0.0 && fraction < 1.0) { --- End diff -- Would this (and the subsequent one) maybe be simpler as a lazy val, so we can get rid of the if/else/null thing? It seems like right now we sometimes construct this even when it isn't used.
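The lazy-val suggestion above can be sketched in isolation. This is a hypothetical simplification, not the actual `RandomSampler` code: `GapSampling` is reduced to a stub, and a construction counter is added purely to make the difference in initialization timing observable.

```scala
object LazyInitDemo {
  var constructed = 0

  // Stand-in stub for Spark's GapSampling; counts constructions so the
  // eager-vs-lazy difference can be observed.
  class GapSampling(val fraction: Double) { constructed += 1 }

  // Current (eager) shape: built at construction time even when never used,
  // with null as the out-of-range sentinel.
  class EagerSampler(fraction: Double) {
    val gapSampling: GapSampling =
      if (fraction > 0.0 && fraction < 1.0) new GapSampling(fraction) else null
  }

  // Suggested (lazy) shape: built only on first access, so no null is needed.
  class LazySampler(fraction: Double) {
    lazy val gapSampling: GapSampling = new GapSampling(fraction)
  }
}
```

With the lazy variant, a sampler whose gap-sampling path is never exercised pays no construction cost, and callers never see a null.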
[GitHub] spark pull request: [SPARK-13839][SQL] Defer input evaluation and ...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11676#discussion_r56455098 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala --- @@ -80,12 +81,21 @@ case class Filter(condition: Expression, child: SparkPlan) // Split out all the IsNotNulls from condition. private val (notNullPreds, otherPreds) = splitConjunctivePredicates(condition).partition { case IsNotNull(a) if child.output.contains(a) => true +case IsNotNull(a) => + a match { +case Casts(a) if child.output.contains(a) => true --- End diff -- We should not add these corner cases here, they should be handled by constraints.
[GitHub] spark pull request: [SPARK-13320] [SQL] Support Star in CreateStru...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11208#issuecomment-197606557 **[Test build #53376 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53376/consoleFull)** for PR 11208 at commit [`e060dea`](https://github.com/apache/spark/commit/e060deaaf09d122966f090bf3b86895636418664).
[GitHub] spark pull request: [SPARK-13839][SQL] Defer input evaluation and ...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/11676#issuecomment-197698865 cc @davies Can you please review this?
[GitHub] spark pull request: [SPARK-13889][YARN][Branch-1.6]Fix the calcula...
GitHub user carsonwang opened a pull request: https://github.com/apache/spark/pull/11813 [SPARK-13889][YARN][Branch-1.6]Fix the calculation of the max number of executor failure ## What changes were proposed in this pull request? Backport #11713 to 1.6. The max number of executor failures before failing the application defaults to twice the maximum number of executors if dynamic allocation is enabled. The default value for "spark.dynamicAllocation.maxExecutors" is Int.MaxValue. The calculated default max number of executor failures should therefore be Int.MaxValue instead of only 3. ## How was this patch tested? It tests whether the value is greater than Int.MaxValue / 2 to avoid the overflow when it is multiplied by 2. You can merge this pull request into a Git repository by running: $ git pull https://github.com/carsonwang/spark branch-1.6-ExecutorFailNum Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11813.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #11813 commit 17d8bc1f13c3b29e22ecbec6a9f08491e5970368 Author: Carson Wang Date: 2016-03-18T05:15:40Z Fix the calculation of the max number of executor failure
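The overflow guard described in the PR summary can be sketched as follows. The method name and the floor of 3 follow the description above, but this is an illustrative stand-in, not the actual Spark code:

```scala
object MaxFailures {
  // Illustrative sketch of the guard described in SPARK-13889.
  def maxExecutorFailures(maxExecutors: Int): Int = {
    // 2 * Int.MaxValue overflows to a negative Int, so cap before doubling.
    val doubled =
      if (maxExecutors > Int.MaxValue / 2) Int.MaxValue else maxExecutors * 2
    // Keep the old floor of 3 allowed failures for small executor counts.
    math.max(doubled, 3)
  }
}
```

With `spark.dynamicAllocation.maxExecutors` at its `Int.MaxValue` default, the guarded version yields `Int.MaxValue` rather than a value corrupted by signed overflow.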
[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/11788#discussion_r56617782 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoin.scala --- @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.joins + +import org.apache.spark.rdd.RDD +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.expressions.{Expression, JoinedRow} +import org.apache.spark.sql.catalyst.plans._ +import org.apache.spark.sql.catalyst.plans.physical._ +import org.apache.spark.sql.execution.{BinaryNode, SparkPlan} +import org.apache.spark.sql.execution.metric.SQLMetrics + +/** + * Performs an inner hash join of two child relations by first shuffling the data using the join --- End diff -- inner?
[GitHub] spark pull request: [SPARK-14004][SQL] NamedExpressions should hav...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/11822#issuecomment-198318534 Personally, I had once been quite confused by the fact that `NamedExpression.qualifiers` is a `Seq[String]` and thought that attributes can be qualified with multiple qualifiers like `db.table.column`. That's why the current version of `AttributeReference.sql` joins all qualifiers using `.` rather than picking the first one. I believe it's safe and good to enforce the at-most-one-qualifier constraint at type level unless there do exist valid cases where using multiple qualifiers makes sense but haven't been implemented in Spark SQL yet. cc @marmbrus @rxin @yhuai
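The type-level fix being proposed can be sketched with a hypothetical, heavily simplified attribute class (not Catalyst's real `AttributeReference`): replacing `Seq[String]` with `Option[String]` makes more than one qualifier unrepresentable by construction.

```scala
// Hypothetical sketch: with Option[String], an attribute can carry at most
// one qualifier, so the "join all qualifiers with ." ambiguity disappears.
case class AttributeRef(name: String, qualifier: Option[String]) {
  // SQL rendering uses the single qualifier when present: `t.c1` vs `c1`.
  def sql: String = qualifier.map(q => s"$q.$name").getOrElse(name)
}
```

A caller can no longer produce a `db.table.column`-style multi-qualifier value by accident; the compiler enforces the constraint.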
[GitHub] spark pull request: [SPARK-13989][SQL] Remove non-vectorized/unsaf...
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/11799#issuecomment-198152113 Thanks, all comments addressed!
[GitHub] spark pull request: [SPARK-13997][SQL] Use Hadoop 2.0 default valu...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/11806#issuecomment-198314857 It does mean each record is compressed separately. Maybe that makes sense for huge records, or somehow facilitates processing pieces of a block (since the whole block has to be uncompressed to use any of it). However Tom's book says block compression should be preferred. I don't know why it's not the default. Also summoning @steveloughran
[GitHub] spark pull request: [SPARK-13928] Move org.apache.spark.Logging in...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/11764#issuecomment-197641888 The log says: `java.lang.RuntimeException: spark-core: Binary compatibility check failed!`, but no reason is provided... cc @JoshRosen
[GitHub] spark pull request: [SPARK-13320] [SQL] Support Star in CreateStru...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11208#issuecomment-197634039 **[Test build #53376 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53376/consoleFull)** for PR 11208 at commit [`e060dea`](https://github.com/apache/spark/commit/e060deaaf09d122966f090bf3b86895636418664). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13826][SQL] Revises Dataset ScalaDoc
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11769#issuecomment-197763593 **[Test build #53407 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53407/consoleFull)** for PR 11769 at commit [`6062f49`](https://github.com/apache/spark/commit/6062f49ba6d123e731d6103beeaa2b0441257253).
[GitHub] spark pull request: [SPARK-12789] [SQL] Support Order By Ordinal i...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/11815#discussion_r56620898 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala --- @@ -202,3 +203,14 @@ object Unions { } } } + +/** + * Extractor for retrieving Int value. + */ +object IntegerIndex { + def unapply(a: Any): Option[Int] = a match { +case Literal(a: Int, IntegerType) => Some(a) +case UnaryMinus(IntegerLiteral(v)) => Some(-v) --- End diff -- ah ic so this is used to detect errors. i'd add some comment here explaining why we are having this.
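The extractor in the diff can be illustrated standalone. The expression nodes below are simplified stand-ins for Catalyst's `Literal` and `UnaryMinus`, so this is a sketch of the pattern, not the real code:

```scala
// Simplified stand-ins for Catalyst expression nodes.
sealed trait Expr
case class IntLit(v: Int) extends Expr
case class Neg(child: Expr) extends Expr

object IntegerIndex {
  // Matches both a plain integer literal and a negated one, so that a
  // negative ordinal like "order by -1" can be recognized and reported.
  def unapply(e: Expr): Option[Int] = e match {
    case IntLit(v)      => Some(v)
    case Neg(IntLit(v)) => Some(-v)
    case _              => None
  }
}
```

This is why the `UnaryMinus` case exists: without it, `-1` would silently fail to match and could not be flagged as an invalid ordinal, which is the error-detection point raised in the review comment.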
[GitHub] spark pull request: [SPARK-13957] [SQL] Support Group By Ordinal i...
Github user gatorsmile closed the pull request at: https://github.com/apache/spark/pull/11846
[GitHub] spark pull request: [SPARK-13957] [SQL] Support Group By Ordinal i...
GitHub user gatorsmile reopened a pull request: https://github.com/apache/spark/pull/11846 [SPARK-13957] [SQL] Support Group By Ordinal in SQL What changes were proposed in this pull request? This PR is to support group by position in SQL. For example, when users input the following query ```SQL select c1 as a, c2, c3, sum(*) from tbl group by 1, 3, c4 ``` The ordinals are recognized as the positions in the select list. Thus, `Analyzer` converts it to ```SQL select c1, c2, c3, sum(*) from tbl group by c1, c3, c4 ``` This is controlled by the config option `spark.sql.groupByOrdinal`. - When true, the ordinal numbers in group by clauses are treated as the position in the select list. - When false, the ordinal numbers are ignored. - Only convert integer literals (not foldable expressions). If found foldable expressions, ignore them. - When the positions specified in the group by clauses correspond to the aggregate functions in select list, output an exception message. Note: This PR is taken from https://github.com/apache/spark/pull/10731. When merging this PR, please give the credit to @zhichao-li Also cc all the people who are involved in the previous discussion: @rxin @cloud-fan @marmbrus @yhuai @hvanhovell @adrian-wang @chenghao-intel @tejasapatil How was this patch tested? Added a few test cases for both positive and negative test cases. 
You can merge this pull request into a Git repository by running: $ git pull https://github.com/gatorsmile/spark groupByOrdinal Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11846.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #11846 commit 95f25a6eb688a2cf3e3efa6ec7b7715884b1fa7b Author: gatorsmile Date: 2016-03-20T04:00:32Z group by ordinals commit a9273761d4dfc3c7a95d570884bfbcc420a119e9 Author: gatorsmile Date: 2016-03-20T04:08:37Z Merge remote-tracking branch 'upstream/master' into groupByOrdinal commit b10d076a71d863255a901861f5ca571816d8fca7 Author: gatorsmile Date: 2016-03-20T04:11:34Z fix messages.
[GitHub] spark pull request: [SPARK-13957] [SQL] Support Group By Ordinal i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11846#issuecomment-198845243 **[Test build #53623 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53623/consoleFull)** for PR 11846 at commit [`b10d076`](https://github.com/apache/spark/commit/b10d076a71d863255a901861f5ca571816d8fca7).
[GitHub] spark pull request: [SPARK-14012][SQL] Extract VectorizedColumnRea...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11834#issuecomment-198571979 **[Test build #53578 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53578/consoleFull)** for PR 11834 at commit [`3685480`](https://github.com/apache/spark/commit/3685480cdf018d80adc4289e34d2eba458ef7cb9).
[GitHub] spark pull request: [SPARK-13923] [SQL] Implement SessionCatalog
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11750#issuecomment-197620375 Merged build finished. Test PASSed.
[GitHub] spark pull request: [Minor][DOC] Add JavaStreamingTestExample
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11776#issuecomment-197732446 **[Test build #53399 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53399/consoleFull)** for PR 11776 at commit [`ff56ff5`](https://github.com/apache/spark/commit/ff56ff56d46db9ee64924c44fb18c03c0ff91e4d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12379][ML][MLLIB] Copy GBT implementati...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/10607#issuecomment-198033902 Thanks for doing this migration. I checked the PR and it LGTM. Your tests look good to me. The tests all seem fairly close, except for a couple of outliers, but even those seem within a standard deviation or so (the 2nd value in spark-perf results). Thanks for running them! Also @MLnick > As part of those tickets, I think we can clean up this ML impl and interfaces if required (e.g. we could look at removing the `private[ml]` train method in favour of one in MLlib that converts RDDs to DataFrame and calls ML, we can make more stuff private where possible, etc). But I think it'll be a lot easier to clean things up once everything is in ML. If the ML implementation uses RDDs underneath, it will be nice to call directly into that implementation from spark.mllib in order to avoid serialization overhead.
[GitHub] spark pull request: [SPARK-12469][CORE][WIP/RFC] Consistent accumu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11105#issuecomment-197716738 **[Test build #53389 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53389/consoleFull)** for PR 11105 at commit [`8ddaf7c`](https://github.com/apache/spark/commit/8ddaf7c7c96e5bbb1cd2f11844db847bc52fe77f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12789] [SQL] Support Order By Ordinal i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11815#issuecomment-198245857 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13602][CORE] Add shutdown hook to Drive...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11746#issuecomment-198142673 **[Test build #53475 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53475/consoleFull)** for PR 11746 at commit [`86eb800`](https://github.com/apache/spark/commit/86eb800f2ee6a6bfa0671c1c17192bd4ab934ff0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12719][HOTFIX] Fix compilation against ...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/11787#issuecomment-197977494 Thanks. I am merging this to master.
[GitHub] spark pull request: [MINOR][SQL][BUILD] Remove duplicated lines
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/11773#discussion_r56431109
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala ---
@@ -49,7 +49,6 @@ class JoinSuite extends QueryTest with SharedSQLContext {
       case j: BroadcastHashJoin => j
       case j: CartesianProduct => j
       case j: BroadcastNestedLoopJoin => j
-      case j: BroadcastHashJoin => j
--- End diff --
See line 49.
[GitHub] spark pull request: [SPARK-13957] [SQL] Support Group By Ordinal i...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/11846 [SPARK-13957] [SQL] Support Group By Ordinal in SQL

What changes were proposed in this pull request?

This PR is to support group by position in SQL. For example, when users input the following query

```SQL
select c1 as a, c2, c3, sum(*) from tbl group by 1, 3, c4
```

the ordinals are recognized as positions in the select list. Thus, `Analyzer` converts it to

```SQL
select c1, c2, c3, sum(*) from tbl group by c1, c3, c4
```

This is controlled by the config option `spark.sql.groupByOrdinal`.
- When true, the ordinal numbers in group by clauses are treated as positions in the select list.
- When false, the ordinal numbers are ignored.
- Only integer literals are converted, not foldable expressions; foldable expressions are ignored.
- When a position specified in the group by clause corresponds to an aggregate function in the select list, an exception message is output.

Note: This PR is taken from https://github.com/apache/spark/pull/10731. When merging this PR, please give the credit to @zhichao-li

Also cc all the people who are involved in the previous discussion: @rxin @cloud-fan @marmbrus @yhuai @hvanhovell @adrian-wang @chenghao-intel @tejasapatil

How was this patch tested?

Added a few test cases, both positive and negative.
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark groupByOrdinal

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11846.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #11846

commit 95f25a6eb688a2cf3e3efa6ec7b7715884b1fa7b
Author: gatorsmile
Date: 2016-03-20T04:00:32Z

    group by ordinals

commit a9273761d4dfc3c7a95d570884bfbcc420a119e9
Author: gatorsmile
Date: 2016-03-20T04:08:37Z

    Merge remote-tracking branch 'upstream/master' into groupByOrdinal

commit b10d076a71d863255a901861f5ca571816d8fca7
Author: gatorsmile
Date: 2016-03-20T04:11:34Z

    fix messages.
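The rewrite this PR describes — replacing integer ordinals in a GROUP BY list with the matching select-list expressions — can be sketched outside Spark. The following standalone Python illustration is hypothetical (function name and string-based expressions are assumptions, not Spark's Analyzer code), but it shows the substitution rule, including passing non-literal expressions through unchanged:

```python
def resolve_group_by_ordinals(select_list, group_by_exprs, group_by_ordinal=True):
    """Replace integer ordinals in a GROUP BY list with the corresponding
    select-list expressions, mimicking the Analyzer rewrite this PR adds
    (gated in Spark by spark.sql.groupByOrdinal)."""
    if not group_by_ordinal:
        # When the option is off, ordinal numbers are left as-is.
        return list(group_by_exprs)
    resolved = []
    for expr in group_by_exprs:
        # Only bare integer literals are treated as ordinals; column names
        # and other expressions pass through unchanged.
        if isinstance(expr, int):
            if not 1 <= expr <= len(select_list):
                raise ValueError("GROUP BY position %d is not in select list" % expr)
            resolved.append(select_list[expr - 1])
        else:
            resolved.append(expr)
    return resolved

# 'select c1 as a, c2, c3 ... group by 1, 3, c4' becomes 'group by c1, c3, c4'
print(resolve_group_by_ordinals(["c1", "c2", "c3"], [1, 3, "c4"]))  # ['c1', 'c3', 'c4']
```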
[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-198504543 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53553/ Test FAILed.
[GitHub] spark pull request: [SPARK-13928] Move org.apache.spark.Logging in...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11764#issuecomment-197700493 **[Test build #53392 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53392/consoleFull)** for PR 11764 at commit [`e875d82`](https://github.com/apache/spark/commit/e875d823d24139235e88031775354d28a6061997).
[GitHub] spark pull request: [SPARK-13958]Executor OOM due to unbounded gro...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11794#discussion_r56697868
--- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
@@ -320,7 +320,15 @@ private void growPointerArrayIfNecessary() throws IOException {
     assert(inMemSorter != null);
     if (!inMemSorter.hasSpaceForAnotherRecord()) {
       long used = inMemSorter.getMemoryUsage();
-      LongArray array = allocateArray(used / 8 * 2);
+      LongArray array;
+      try {
+        // could trigger spilling
+        array = allocateArray(used / 8 * 2);
+      } catch (OutOfMemoryError e) {
+        // should have trigger spilling
+        assert(inMemSorter.hasSpaceForAnotherRecord());
--- End diff --
I see, then use an `if`?
[GitHub] spark pull request: [Spark-13034] PySpark ml.classification suppor...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11707#issuecomment-197553310 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/11663#discussion_r56406664
--- Diff: python/pyspark/ml/param/__init__.py ---
@@ -275,23 +382,9 @@ def _set(self, **kwargs):
         """
         for param, value in kwargs.items():
             p = getattr(self, param)
-            if p.expectedType is None or type(value) == p.expectedType or value is None:
-                self._paramMap[getattr(self, param)] = value
-            else:
-                try:
-                    # Try and do "safe" conversions that don't lose information
-                    if p.expectedType == float:
-                        self._paramMap[getattr(self, param)] = float(value)
-                    # Python 3 unified long & int
-                    elif p.expectedType == int and type(value).__name__ == 'long':
-                        self._paramMap[getattr(self, param)] = value
-                    else:
-                        raise Exception(
-                            "Provided type {0} incompatible with type {1} for param {2}"
-                            .format(type(value), p.expectedType, p))
-                except ValueError:
-                    raise Exception(("Failed to convert {0} to type {1} for param {2}"
-                                     .format(type(value), p.expectedType, p)))
+            if value is not None:
+                value = p.typeConverter(value)
+            self._paramMap[getattr(self, param)] = value
--- End diff --
Reuse the value `p`.
[GitHub] spark pull request: [MINOR][SQL][BUILD] Remove duplicated lines
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/11773#discussion_r56431084
--- Diff: project/MimaExcludes.scala ---
@@ -299,13 +299,11 @@ object MimaExcludes {
       // [SPARK-13244][SQL] Migrates DataFrame to Dataset
       ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.sql.DataFrameHolder.apply"),
       ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.sql.DataFrameHolder.toDF"),
-      ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.sql.DataFrameHolder.toDF"),
       ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.sql.DataFrameHolder.copy"),
       ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.sql.DataFrameHolder.copy$default$1"),
       ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.sql.DataFrameHolder.df$1"),
       ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.sql.DataFrameHolder.this"),
       ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.sql.SQLContext.tables"),
-      ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.sql.SQLContext.tables"),
--- End diff --
See line 307.
[GitHub] spark pull request: [SPARK-13928] Move org.apache.spark.Logging in...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11764#issuecomment-197407360 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53322/ Test FAILed.
[GitHub] spark pull request: [WIP][SPARK-13809][SQL] State store for stream...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/11645#discussion_r56386741
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala ---
@@ -0,0 +1,200 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming.state
+
+import java.util.{Timer, TimerTask}
+
+import scala.collection.mutable
+import scala.util.control.NonFatal
+
+import org.apache.spark.{Logging, SparkEnv}
+import org.apache.spark.sql.catalyst.InternalRow
+
+/** Unique identifier for a [[StateStore]] */
+case class StateStoreId(operatorId: Long, partitionId: Int)
+
+/**
+ * Base trait for a versioned key-value store used for streaming aggregations
+ */
+trait StateStore {
+
+  /** Unique identifier of the store */
+  def id: StateStoreId
+
+  /** Version of the data in this store before committing updates. */
+  def version: Long
+
+  /**
+   * Update the value of a key using the value generated by the update function.
+   * This can be called only after prepareForUpdates() has been called in the same thread.
+   */
+  def update(key: InternalRow, updateFunc: Option[InternalRow] => InternalRow): Unit
+
+  /**
+   * Remove keys that match the following condition.
+   * This can be called only after prepareForUpdates() has been called in the current thread.
+   */
+  def remove(condition: InternalRow => Boolean): Unit
+
+  /**
+   * Commit all the updates that have been made to the store.
+   * This can be called only after prepareForUpdates() has been called in the current thread.
+   */
+  def commit(): Long
+
+  /** Cancel all the updates that have been made to the store. */
+  def cancel(): Unit
+
+  /**
+   * Iterator of store data after a set of updates have been committed.
+   * This can be called only after commitUpdates() has been called in the current thread.
+   */
+  def iterator(): Iterator[InternalRow]
+
+  /**
+   * Iterator of the updates that have been committed.
+   * This can be called only after commitUpdates() has been called in the current thread.
+   */
+  def updates(): Iterator[StoreUpdate]
+
+  /**
+   * Whether all updates have been committed
+   */
+  def hasCommitted: Boolean
+}
+
+
+trait StateStoreProvider {
+
+  /** Get the store with the existing version. */
+  def getStore(version: Long): StateStore
+
+  /** Optional method for providers to allow for background management */
+  def manage(): Unit = { }
+}
+
+sealed trait StoreUpdate
+case class ValueAdded(key: InternalRow, value: InternalRow) extends StoreUpdate
+case class ValueUpdated(key: InternalRow, value: InternalRow) extends StoreUpdate
+case class KeyRemoved(key: InternalRow) extends StoreUpdate
+
+
+/**
+ * Companion object to [[StateStore]] that provides helper methods to create and retrieve stores
+ * by their unique ids.
+ */
+private[state] object StateStore extends Logging {
+
+  private val MANAGEMENT_TASK_INTERVAL_SECS = 60
+
+  private val loadedProviders = new mutable.HashMap[StateStoreId, StateStoreProvider]()
+  private val managementTimer = new Timer("StateStore Timer", true)
+  @volatile private var managementTask: TimerTask = null
+
+  /** Get or create a store associated with the id. */
+  def get(storeId: StateStoreId, directory: String, version: Long): StateStore = {
+    require(version >= 0)
+    val storeProvider = loadedProviders.synchronized {
+      startIfNeeded()
+      val provider = loadedProviders.getOrElseUpdate(
+        storeId, new HDFSBackedStateStoreProvider(storeId, directory))
+      reportActiveInstance(storeId)
+      provider
+    }
+    storeProvider.getStore(version)
+  }
+
+  def remove(storeId: StateStoreId): Unit =
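The versioned update/remove/commit/iterator contract in the trait above can be illustrated with a minimal in-memory Python sketch. This is purely illustrative and all names are assumptions; the provider under review is HDFS-backed and operates on InternalRow, not plain dicts:

```python
class InMemoryStateStore:
    """Minimal in-memory sketch of a versioned key-value store: updates are
    staged against a base version, then commit() produces version + 1."""
    def __init__(self, version=0, data=None):
        self.version = version
        self._committed = dict(data or {})
        self._staged = dict(self._committed)
        self.has_committed = False

    def update(self, key, update_func):
        # update_func receives the old value (or None) and returns the new one.
        self._staged[key] = update_func(self._staged.get(key))

    def remove(self, condition):
        # Drop all staged keys matching the condition.
        self._staged = {k: v for k, v in self._staged.items() if not condition(k)}

    def commit(self):
        # Make the staged updates durable and advance the version.
        self._committed = dict(self._staged)
        self.version += 1
        self.has_committed = True
        return self.version

    def iterator(self):
        # Store contents after the last commit.
        return iter(sorted(self._committed.items()))
```

A run-through of the contract: stage a counter increment, commit to get version 1, then remove the key and commit again.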
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-198598006 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/11663#discussion_r56406609
--- Diff: python/pyspark/ml/param/__init__.py ---
@@ -32,13 +35,17 @@ class Param(object):
     .. versionadded:: 1.3.0
     """
-    def __init__(self, parent, name, doc, expectedType=None):
+    def __init__(self, parent, name, doc, expectedType=None, typeConverter=None):
         if not isinstance(parent, Identifiable):
             raise TypeError("Parent must be an Identifiable but got type %s." % type(parent))
         self.parent = parent.uid
         self.name = str(name)
         self.doc = str(doc)
         self.expectedType = expectedType
+        if expectedType is not None:
+            warnings.warn("expectedType is deprecated and will be removed in 2.1.0, " +
+                          "use typeConverter instead.")
--- End diff --
"use typeConverter instead, as a keyword argument" Also, I'd put this same message in the docstring too.
[GitHub] spark pull request: [SPARK-13958]Executor OOM due to unbounded gro...
Github user sitalkedia commented on a diff in the pull request: https://github.com/apache/spark/pull/11794#discussion_r56696662
--- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
@@ -320,7 +320,15 @@ private void growPointerArrayIfNecessary() throws IOException {
     assert(inMemSorter != null);
     if (!inMemSorter.hasSpaceForAnotherRecord()) {
       long used = inMemSorter.getMemoryUsage();
-      LongArray array = allocateArray(used / 8 * 2);
+      LongArray array;
+      try {
+        // could trigger spilling
+        array = allocateArray(used / 8 * 2);
+      } catch (OutOfMemoryError e) {
+        // should have trigger spilling
+        assert(inMemSorter.hasSpaceForAnotherRecord());
--- End diff --
Hmm, I tried changing it to `require`, but the compiler does not seem to like it. Maybe because it's a Java file and can't import Scala methods?
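The pattern being debated in this thread — attempt a larger allocation and, on OutOfMemoryError, rely on the spill that the allocation attempt itself triggered — can be modeled with a toy sketch. Everything here (class names, the memory model) is hypothetical and only mimics the shape of the Java code under review:

```python
class ToySorter:
    """Toy model of the grow-or-spill pattern: growing the pointer array may
    fail under memory pressure, in which case the failed allocation is
    expected to have forced a spill that freed in-memory records."""
    def __init__(self, capacity, memory_budget):
        self.records = []
        self.capacity = capacity
        self.memory_budget = memory_budget

    def has_space_for_another_record(self):
        return len(self.records) < self.capacity

    def allocate_array(self, size):
        if size > self.memory_budget:
            # Allocation pressure triggers a spill before the failure surfaces.
            self.spill()
            raise MemoryError("not enough memory for array of size %d" % size)
        return [None] * size

    def spill(self):
        self.records.clear()  # pretend records were written out to disk

    def grow_if_necessary(self):
        if not self.has_space_for_another_record():
            try:
                new_array = self.allocate_array(self.capacity * 2)
                self.capacity = len(new_array)
            except MemoryError:
                # The failed allocation should have triggered a spill, so
                # there must be room again (the assert-vs-if debate above).
                if not self.has_space_for_another_record():
                    raise
```

With a budget too small to double the array, the grow attempt spills instead of crashing; with a generous budget, the capacity doubles normally.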
[GitHub] spark pull request: [SPARK-11891] Model export/import for RFormula...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/9884
[GitHub] spark pull request: [SPARK-13921] Store serialized blocks as multi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11748#issuecomment-198007459 **[Test build #53442 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53442/consoleFull)** for PR 11748 at commit [`3fc0b66`](https://github.com/apache/spark/commit/3fc0b66981aa2d45be129986f0dc5bd595e08b22). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13449] Naive Bayes wrapper in SparkR
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11486#issuecomment-198841429 **[Test build #53622 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53622/consoleFull)** for PR 11486 at commit [`b4ee1aa`](https://github.com/apache/spark/commit/b4ee1aab70008919ba17cf02c8470f1a75c23ef8).
[GitHub] spark pull request: [SPARK-14000][SQL] case class with a tuple fie...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11816#issuecomment-198283547 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/11663#discussion_r56548536
--- Diff: python/pyspark/ml/param/__init__.py ---
@@ -65,6 +72,106 @@ def __eq__(self, other):
         return False
+
+class TypeConverters(object):
+    """
+    .. note:: DeveloperApi
+
+    Factory methods for common type conversion functions for `Param.typeConverter`.
+
+    .. versionadded:: 2.0.0
+    """
+
+    @staticmethod
+    def _is_numeric(value):
+        vtype = type(value)
+        return vtype == int or vtype == float or vtype == np.float64 \
+            or vtype == np.int64 or vtype.__name__ == 'long'
+
+    @staticmethod
+    def _can_convert_to_list(value):
+        vtype = type(value)
+        return vtype == list or vtype == np.ndarray or isinstance(value, Vector)
+
+    @staticmethod
+    def identity(value):
+        """
+        Dummy converter that just returns value.
+        """
+        return value
+
+    @staticmethod
+    def convertToList(value):
+        """
+        Convert a value to a list, if possible.
+        """
+        if type(value) == list:
+            return value
+        elif type(value) == np.ndarray:
+            return list(value)
+        elif isinstance(value, Vector):
+            return value.toArray()
+        else:
+            raise TypeError("Could not convert %s to list" % value)
+
+    @staticmethod
+    def convertToListFloat(value):
+        """
+        Convert a value to list of floats, if possible.
+        """
+        if TypeConverters._can_convert_to_list(value) and \
+                all(map(lambda v: TypeConverters._is_numeric(v), value)):
+            value = TypeConverters.convertToList(value)
+            return list(map(lambda v: float(v), value))
+        else:
+            raise TypeError("Could not convert %s to list of floats" % value)
+
+    @staticmethod
+    def convertToListInt(value):
+        """
+        Convert a value to list of ints, if possible.
+        """
+        if TypeConverters._can_convert_to_list(value) and \
+                all(map(lambda v: TypeConverters._is_numeric(v), value)):
+            value = TypeConverters.convertToList(value)
+            return list(map(lambda v: int(v), value))
+        else:
+            raise TypeError("Could not convert %s to list of ints" % value)
+
+    @staticmethod
+    def convertToVector(value):
+        """
+        Convert a value to a MLlib Vector, if possible.
+        """
+        if isinstance(value, Vector):
+            return value
+        elif TypeConverters._can_convert_to_list(value) and \
+                all(map(lambda v: TypeConverters._is_numeric(v), value)):
+            value = DenseVector(value)
+        else:
+            raise TypeError("Could not convert %s to vector" % value)
+        return value
+
+    @staticmethod
+    def convertToFloat(value):
+        """
+        Convert a value to a float, if possible.
+        """
+        if TypeConverters._is_numeric(value):
+            return float(value)
+        else:
+            raise TypeError("Could not convert %s to float" % value)
+
+    @staticmethod
+    def convertToInt(value):
+        """
+        Convert a value to an int, if possible.
+        """
+        if TypeConverters._is_numeric(value):
--- End diff --
Done.
[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11788#issuecomment-198070544 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53451/ Test PASSed.
[GitHub] spark pull request: [SPARK-13808][test-maven] Don't build assembly...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11701#issuecomment-197513303 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53336/ Test FAILed.
[GitHub] spark pull request: [SPARK-11888] [ML] Decision tree persistence i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11581#issuecomment-197552020 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53346/ Test PASSed.
[GitHub] spark pull request: [SPARK-13997][SQL] Use Hadoop 2.0 default valu...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11806#issuecomment-198364410 @tomwitte Sorry for adding more comments, but does that mean the default value in Hadoop 1.x is BLOCK?
[GitHub] spark pull request: [SPARK-13942][CORE][DOCS] Remove Shark-related...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/11770#issuecomment-197561885 Removing Shark docs part looks OK. The slightly controversial bit is making `SparkEnv` private. People might depend on that. @rxin
[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...
Github user iyounus commented on the pull request: https://github.com/apache/spark/pull/11610#issuecomment-197468720 One problem with the eigen-decomposition method is that, for a rank-deficient matrix, some of the eigenvalues can be extremely small (instead of being exactly zero), and their contribution to the inverse can become very large. I'll try out both methods (DGELSD and eigen decomposition) and see how they behave in this case.
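The blow-up described above is easy to see with a small rank-deficient Gram matrix. Below is a minimal numpy sketch — purely illustrative, not Spark code; the matrix, tolerance, and names are assumptions — that thresholds small eigenvalues the same way DGELSD's RCOND cutoff treats small singular values:

```python
import numpy as np

# A Gram matrix X^T X that is rank deficient because one feature column is
# constant (a duplicate of the intercept column), so one eigenvalue is
# (numerically) zero.
X = np.array([[1.0, 2.0, 1.0],
              [1.0, 3.0, 1.0],
              [1.0, 4.0, 1.0],
              [1.0, 5.0, 1.0]])
A = X.T @ X

w, V = np.linalg.eigh(A)  # eigen decomposition of the symmetric Gram matrix

# A naive inverse takes 1/w for every eigenvalue; when an eigenvalue that
# should be 0 comes out as, say, 1e-15, its reciprocal dominates the result.
# Thresholding small eigenvalues relative to the largest one (the idea
# behind DGELSD's RCOND parameter) gives a well-behaved pseudo-inverse.
tol = w.max() * len(w) * np.finfo(float).eps
w_inv = np.array([1.0 / v if v > tol else 0.0 for v in w])
pinv = V @ np.diag(w_inv) @ V.T

# Agrees with numpy's SVD-based pseudo-inverse.
print(np.allclose(pinv, np.linalg.pinv(A)))
```

Without the cutoff, the reciprocal of the noise-level eigenvalue dominates every entry of the reconstructed inverse, which is exactly the failure mode described for rank-deficient inputs.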
[GitHub] spark pull request: [SPARK-13826][SQL] Revises Dataset ScalaDoc
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/11769#issuecomment-197512611 cc @rxin @marmbrus @yhuai
[GitHub] spark pull request: [SPARK-14011][CORE][SQL] Enable `LineLength` J...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11831#issuecomment-198549526 **[Test build #53566 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53566/consoleFull)** for PR 11831 at commit [`2923ef0`](https://github.com/apache/spark/commit/2923ef095369376be03a868c2bf2375294dab6d1).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13816][Graphx] Add parameter checks for...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/11655#issuecomment-197485725 Thanks - merging in master.
[GitHub] spark pull request: [SPARK-12789]Support order by index and group ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/10731#issuecomment-197698035 Also I'd say "by position", not "by index", since index usually refers to something else in databases.
[GitHub] spark pull request: [SPARK-13903][SQL] Modify output nullability w...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11722#issuecomment-197278897 **[Test build #53297 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53297/consoleFull)** for PR 11722 at commit [`c7d54a0`](https://github.com/apache/spark/commit/c7d54a0fb78c826903c0db8f1b1ac7b0d54bb303).
* This patch **fails from timeout after a configured wait of `250m`**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13629] [ML] Add binary toggle Param to ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11536#issuecomment-197782539 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13928] Move org.apache.spark.Logging in...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11764#issuecomment-197639904 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53383/ Test FAILed.
[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11621#issuecomment-198026652 **[Test build #53449 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53449/consoleFull)** for PR 11621 at commit [`d7e17ab`](https://github.com/apache/spark/commit/d7e17ab6ab7219394d08b205e892f383f7ca1641).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13294] [PROJECT INFRA] Remove MiMa's de...
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/11178#issuecomment-197623653 For some reason, this PR breaks the following invocation:
```
./dev/make-distribution.sh -T 1C -Phadoop-2.6
```
The problem appears to be with this line
```sh
SCALA_VERSION=$("$MVN" help:evaluate -Dexpression=scala.binary.version $@ 2>/dev/null\
    | grep -v "INFO"\
    | tail -n 1)
```
which outputs this when run:
```
+ SCALA_VERSION='[ERROR] Re-run Maven using the -X switch to enable full debug logging.'
```
Removing the `-T 1C` fixes it, for some reason. Any ideas why this PR is interfering with the additional flags passed to Maven?
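The failure mode above — a stray `[ERROR]` line surviving the `grep -v "INFO"` filter and landing in `SCALA_VERSION` — can be reproduced without Maven. In this sketch, the `mvn_output` function and the tightened grep are illustrative assumptions, not the fix that ultimately landed: it matches the version-shaped line instead of taking the last non-INFO line.

```sh
# Simulated Maven output: extra warnings or errors (e.g. from flags like
# -T 1C) can appear after the evaluated property, so "last non-INFO line"
# picks up junk.
mvn_output() {
  printf '[INFO] Scanning for projects...\n'
  printf '2.11\n'
  printf '[ERROR] Re-run Maven using the -X switch to enable full debug logging.\n'
}

# Fragile extraction from the script: last line that is not an INFO line.
fragile=$(mvn_output | grep -v "INFO" | tail -n 1)

# Sturdier sketch: keep only lines that look like a bare version number
# before taking the last one.
robust=$(mvn_output | grep -E '^[0-9]+\.[0-9]+$' | tail -n 1)

echo "fragile: $fragile"
echo "robust:  $robust"
```

Quoting `"$@"` in the real script would also be worth checking, since an unquoted `$@` re-splits any flag values passed through to Maven.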