[GitHub] spark pull request: [SPARK-12993][PYSPARK] Remove usage of ADD_FIL...

2016-01-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10913#issuecomment-174819214
  
**[Test build #50067 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50067/consoleFull)** for PR 10913 at commit [`f8c09de`](https://github.com/apache/spark/commit/f8c09de63aff3bcb220f5fa80926e83f4479c8b1).





[GitHub] spark pull request: [SPARK-12994][CORE] It is not necessary to cre...

2016-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10914#issuecomment-174860449
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-10086] [MLlib] [Streaming] [PySpark] ig...

2016-01-25 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/10909#issuecomment-174860765
  
Recent failures in the last 4 days:

* 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50016/testReport/
* 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49996/testReport/
* 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49989/testReport/
* 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49870/testReport/

Merged into master.





[GitHub] spark pull request: [SQL][Minor] A few minor tweaks to CSV reader.

2016-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10919#issuecomment-174865380
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50076/
Test FAILed.





[GitHub] spark pull request: [SPARK-12937][SQL] bloom filter serialization

2016-01-25 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/10920

[SPARK-12937][SQL] bloom filter serialization

This PR adds serialization support for BloomFilter.

A version number is added to version the serialized binary format.
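
As a rough illustration of the idea, here is a minimal sketch of a version-prefixed binary layout (hypothetical names and layout, not necessarily the format this PR implements):

```scala
import java.io.{ByteArrayOutputStream, DataOutputStream}

// Hypothetical sketch: prefix the payload with a version number so the
// binary format can evolve without breaking older readers.
object BloomFilterSerDeSketch {
  val FormatVersion: Int = 1

  def serialize(numHashFunctions: Int, bits: Array[Long]): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes)
    out.writeInt(FormatVersion)     // version of the serialized format
    out.writeInt(numHashFunctions)  // sketch parameter
    out.writeInt(bits.length)       // length of the bit array, in 64-bit words
    bits.foreach(b => out.writeLong(b))
    out.flush()
    bytes.toByteArray
  }
}
```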

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark bloom-filter

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10920.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10920


commit 4b05a35d58cdabccd915582894d303ba437bee0f
Author: Wenchen Fan 
Date:   2016-01-26T07:23:51Z

bloom filter serialization







[GitHub] spark pull request: [SPARK-12937][SQL] bloom filter serialization

2016-01-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/10920#discussion_r50801787
  
--- Diff: 
common/sketch/src/main/java/org/apache/spark/util/sketch/Version.java ---
@@ -0,0 +1,35 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.util.sketch;
+
+/**
+ * Version number of the serialized binary format for bloom filter or 
count-min sketch.
+ */
+public enum Version {
--- End diff --

Bloom filter and count-min sketch can have different version values, but they can share the same version class.
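
A minimal Scala sketch of that idea (hypothetical; the PR itself uses a Java enum), where both sketches reuse one `Version` type but may emit different values:

```scala
// Hypothetical sketch: one shared Version type; bloom filter and count-min
// sketch each document which values their formats actually use.
sealed abstract class Version(val versionNumber: Int)
object Version {
  case object V1 extends Version(1) // e.g. initial bloom filter format
  case object V2 extends Version(2) // e.g. a later count-min sketch format
}
```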





[GitHub] spark pull request: [SPARK-12937][SQL] bloom filter serialization

2016-01-25 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/10920#issuecomment-174869956
  
cc @rxin @liancheng 





[GitHub] spark pull request: [SPARK-12895][SPARK-12896] Migrate TaskMetrics...

2016-01-25 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/10835#discussion_r50802605
  
--- Diff: 
core/src/test/scala/org/apache/spark/executor/TaskMetricsSuite.scala ---
@@ -17,12 +17,543 @@
 
 package org.apache.spark.executor
 
-import org.apache.spark.SparkFunSuite
+import org.scalatest.Assertions
+
+import org.apache.spark._
+import org.apache.spark.scheduler.AccumulableInfo
+import org.apache.spark.storage.{BlockId, BlockStatus, StorageLevel, 
TestBlockId}
+
 
 class TaskMetricsSuite extends SparkFunSuite {
-  test("[SPARK-5701] updateShuffleReadMetrics: ShuffleReadMetrics not 
added when no shuffle deps") {
-val taskMetrics = new TaskMetrics()
-taskMetrics.mergeShuffleReadMetrics()
-assert(taskMetrics.shuffleReadMetrics.isEmpty)
+  import AccumulatorParam._
+  import InternalAccumulator._
+  import StorageLevel._
+  import TaskMetricsSuite._
+
+  test("create") {
+val internalAccums = InternalAccumulator.create()
+val tm1 = new TaskMetrics
+val tm2 = new TaskMetrics(internalAccums)
+assert(tm1.accumulatorUpdates().size === internalAccums.size)
+assert(tm1.shuffleReadMetrics.isEmpty)
+assert(tm1.shuffleWriteMetrics.isEmpty)
+assert(tm1.inputMetrics.isEmpty)
+assert(tm1.outputMetrics.isEmpty)
+assert(tm2.accumulatorUpdates().size === internalAccums.size)
+assert(tm2.shuffleReadMetrics.isEmpty)
+assert(tm2.shuffleWriteMetrics.isEmpty)
+assert(tm2.inputMetrics.isEmpty)
+assert(tm2.outputMetrics.isEmpty)
+// TaskMetrics constructor expects minimal set of initial accumulators
+intercept[IllegalArgumentException] { new 
TaskMetrics(Seq.empty[Accumulator[_]]) }
+  }
+
+  test("create with unnamed accum") {
+intercept[IllegalArgumentException] {
+  new TaskMetrics(
+InternalAccumulator.create() ++ Seq(
+  new Accumulator(0, IntAccumulatorParam, None, internal = true)))
+}
+  }
+
+  test("create with duplicate name accum") {
+intercept[IllegalArgumentException] {
+  new TaskMetrics(
+InternalAccumulator.create() ++ Seq(
+  new Accumulator(0, IntAccumulatorParam, Some(RESULT_SIZE), 
internal = true)))
+}
+  }
+
+  test("create with external accum") {
+intercept[IllegalArgumentException] {
+  new TaskMetrics(
+InternalAccumulator.create() ++ Seq(
+  new Accumulator(0, IntAccumulatorParam, Some("x"
+}
+  }
+
+  test("create shuffle read metrics") {
+import shuffleRead._
+val accums = InternalAccumulator.createShuffleReadAccums()
+  .map { a => (a.name.get, a) }.toMap[String, Accumulator[_]]
+accums(REMOTE_BLOCKS_FETCHED).setValueAny(1)
+accums(LOCAL_BLOCKS_FETCHED).setValueAny(2)
+accums(REMOTE_BYTES_READ).setValueAny(3L)
+accums(LOCAL_BYTES_READ).setValueAny(4L)
+accums(FETCH_WAIT_TIME).setValueAny(5L)
+accums(RECORDS_READ).setValueAny(6L)
+val sr = new ShuffleReadMetrics(accums)
+assert(sr.remoteBlocksFetched === 1)
+assert(sr.localBlocksFetched === 2)
+assert(sr.remoteBytesRead === 3L)
+assert(sr.localBytesRead === 4L)
+assert(sr.fetchWaitTime === 5L)
+assert(sr.recordsRead === 6L)
+  }
+
+  test("create shuffle write metrics") {
+import shuffleWrite._
+val accums = InternalAccumulator.createShuffleWriteAccums()
+  .map { a => (a.name.get, a) }.toMap[String, Accumulator[_]]
+accums(BYTES_WRITTEN).setValueAny(1L)
+accums(RECORDS_WRITTEN).setValueAny(2L)
+accums(WRITE_TIME).setValueAny(3L)
+val sw = new ShuffleWriteMetrics(accums)
+assert(sw.bytesWritten === 1L)
+assert(sw.recordsWritten === 2L)
+assert(sw.writeTime === 3L)
+  }
+
+  test("create input metrics") {
+import input._
+val accums = InternalAccumulator.createInputAccums()
+  .map { a => (a.name.get, a) }.toMap[String, Accumulator[_]]
+accums(BYTES_READ).setValueAny(1L)
+accums(RECORDS_READ).setValueAny(2L)
+accums(READ_METHOD).setValueAny(DataReadMethod.Hadoop.toString)
+val im = new InputMetrics(accums)
+assert(im.bytesRead === 1L)
+assert(im.recordsRead === 2L)
+assert(im.readMethod === DataReadMethod.Hadoop)
+  }
+
+  test("create output metrics") {
+import output._
+val accums = InternalAccumulator.createOutputAccums()
+  .map { a => (a.name.get, a) }.toMap[String, Accumulator[_]]
+accums(BYTES_WRITTEN).setValueAny(1L)
+accums(RECORDS_WRITTEN).setValueAny(2L)
+

[GitHub] spark pull request: [SPARK-12994][CORE] It is not necessary to cre...

2016-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10914#issuecomment-174878692
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12828][SQL]add natural join support

2016-01-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/10762#discussion_r50803015
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -919,6 +919,7 @@ object PushPredicateThroughJoin extends 
Rule[LogicalPlan] with PredicateHelper {
   (rightFilterConditions ++ commonFilterCondition).
 reduceLeftOption(And).map(Filter(_, 
newJoin)).getOrElse(newJoin)
 case FullOuter => f // DO Nothing for Full Outer Join
+case NaturalJoin(_) => sys.error("Untransformed NaturalJoin node")
--- End diff --

Do we need to catch it? I think we can guarantee there is no `NaturalJoin` after `CheckAnalysis`.





[GitHub] spark pull request: [SPARK-12993][PYSPARK] Remove usage of ADD_FIL...

2016-01-25 Thread zjffdu
GitHub user zjffdu opened a pull request:

https://github.com/apache/spark/pull/10913

[SPARK-12993][PYSPARK] Remove usage of ADD_FILES in pyspark



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zjffdu/spark SPARK-12993

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10913.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10913


commit f8c09de63aff3bcb220f5fa80926e83f4479c8b1
Author: Jeff Zhang 
Date:   2016-01-26T04:17:48Z

[SPARK-12993][PYSPARK] Remove usage of ADD_FILES in pyspark







[GitHub] spark pull request: [SPARK-11780][SQL] Add catalyst type aliases b...

2016-01-25 Thread maropu
GitHub user maropu opened a pull request:

https://github.com/apache/spark/pull/10915

[SPARK-11780][SQL] Add catalyst type aliases backwards compatibility

Changed a target at branch-1.6 from #10635.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maropu/spark pr9935-v3

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10915.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10915


commit 9ef7185f5a9ce1f672559e00a34854c5afa4
Author: Takeshi YAMAMURO 
Date:   2016-01-26T05:15:47Z

Add catalyst type aliases backwards compatibility







[GitHub] spark pull request: [SPARK-11780][SQL] Add catalyst type aliases b...

2016-01-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10915#issuecomment-174845577
  
**[Test build #50070 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50070/consoleFull)** for PR 10915 at commit [`9ef7185`](https://github.com/apache/spark/commit/9ef7185f5a9ce1f672559e00a34854c5afa4).





[GitHub] spark pull request: [SPARK-12935][SQL] DataFrame API for Count-Min...

2016-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10911#issuecomment-174847369
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50061/
Test FAILed.





[GitHub] spark pull request: [SPARK-12834] Change ser/de of JavaArray and J...

2016-01-25 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/10772#issuecomment-174858967
  
LGTM
Merging with master
Thanks!





[GitHub] spark pull request: [SPARK-11922][PYSPARK][ML] Python api for ml.f...

2016-01-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/10085





[GitHub] spark pull request: [SPARK-12865][SPARK-12866][SQL] Migrate SparkS...

2016-01-25 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/10905#discussion_r50800538
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ASTNode.scala 
---
@@ -60,6 +60,12 @@ case class ASTNode(
   /** Source text. */
   lazy val source = stream.toString(startIndex, stopIndex)
 
+  /** Get the source text that remains after this token. */
+  lazy val remainder = {
--- End diff --

If you are updating the PR, can you add explicit types for all the public vals?
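
A small sketch of what that looks like for the val in the diff above (`ASTNodeSketch` and the plain `String` stream are stand-ins, not the real classes):

```scala
// Hypothetical sketch: every public (lazy) val carries an explicit result type.
case class ASTNodeSketch(startIndex: Int, stopIndex: Int, stream: String) {
  /** Source text. */
  lazy val source: String = stream.substring(startIndex, stopIndex + 1)

  /** Source text that remains after this token. */
  lazy val remainder: String = stream.substring(stopIndex + 1)
}
```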






[GitHub] spark pull request: [SPARK-12854][SQL] Implement complex types sup...

2016-01-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10820#issuecomment-174867931
  
**[Test build #50080 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50080/consoleFull)** for PR 10820 at commit [`f378335`](https://github.com/apache/spark/commit/f378335858c1c10400936f046430f8e7f4c70c3c).





[GitHub] spark pull request: [SPARK-12935][SQL] DataFrame API for Count-Min...

2016-01-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/10911#discussion_r50802122
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala ---
@@ -309,4 +311,84 @@ final class DataFrameStatFunctions private[sql](df: 
DataFrame) {
   def sampleBy[T](col: String, fractions: ju.Map[T, jl.Double], seed: 
Long): DataFrame = {
 sampleBy(col, fractions.asScala.toMap.asInstanceOf[Map[T, Double]], 
seed)
   }
+
+  /**
+   * Builds a Count-min Sketch over a specified column.
+   *
+   * @param colName name of the column over which the sketch is built
+   * @param depth depth of the sketch
+   * @param width width of the sketch
+   * @param seed random seed
+   * @return a [[CountMinSketch]] over column `colName`
+   * @since 2.0.0
+   */
+  def countMinSketch(colName: String, depth: Int, width: Int, seed: Int): 
CountMinSketch = {
+countMinSketch(Column(colName), depth, width, seed)
+  }
+
+  /**
+   * Builds a Count-min Sketch over a specified column.
+   *
+   * @param colName name of the column over which the sketch is built
+   * @param eps relative error of the sketch
+   * @param confidence confidence of the sketch
+   * @param seed random seed
+   * @return a [[CountMinSketch]] over column `colName`
+   * @since 2.0.0
+   */
+  def countMinSketch(
+  colName: String, eps: Double, confidence: Double, seed: Int): 
CountMinSketch = {
+countMinSketch(Column(colName), eps, confidence, seed)
+  }
+
+  /**
+   * Builds a Count-min Sketch over a specified column.
+   *
+   * @param col the column over which the sketch is built
+   * @param depth depth of the sketch
+   * @param width width of the sketch
+   * @param seed random seed
+   * @return a [[CountMinSketch]] over column `colName`
+   * @since 2.0.0
+   */
+  def countMinSketch(col: Column, depth: Int, width: Int, seed: Int): 
CountMinSketch = {
+countMinSketch(col, CountMinSketch.create(depth, width, seed))
+  }
+
+  /**
+   * Builds a Count-min Sketch over a specified column.
+   *
+   * @param col the column over which the sketch is built
+   * @param eps relative error of the sketch
+   * @param confidence confidence of the sketch
+   * @param seed random seed
+   * @return a [[CountMinSketch]] over column `colName`
+   * @since 2.0.0
+   */
+  def countMinSketch(col: Column, eps: Double, confidence: Double, seed: 
Int): CountMinSketch = {
+countMinSketch(col, CountMinSketch.create(eps, confidence, seed))
+  }
+
+  private def countMinSketch(col: Column, zero: CountMinSketch): 
CountMinSketch = {
+val singleCol = df.select(col)
+val colType = singleCol.schema.head.dataType
+val supportedTypes: Set[DataType] = Set(ByteType, ShortType, 
IntegerType, LongType, StringType)
+
+require(
+  supportedTypes.contains(colType),
+  s"Count-min Sketch only supports string type and integral types, " +
+s"and does not support type $colType."
+)
+
+singleCol.rdd.aggregate(zero)(
--- End diff --

Maybe we can improve it with a UDAF in the future.
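
For context, a minimal usage sketch of the API proposed in this diff (assumes an existing `sqlContext` and that the returned `CountMinSketch` exposes an `estimateCount` method):

```scala
// Hypothetical usage; the countMinSketch signature follows the diff above.
val df = sqlContext.range(0, 10000).toDF("id")
val cms = df.stat.countMinSketch("id", 10, 1000, 42) // depth, width, seed
val approx = cms.estimateCount(500L) // estimated count of rows with id == 500
```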





[GitHub] spark pull request: [SPARK-12926][SQL] SQLContext to disallow user...

2016-01-25 Thread tejasapatil
Github user tejasapatil commented on the pull request:

https://github.com/apache/spark/pull/10849#issuecomment-174873141
  
Fixed Scala style test.





[GitHub] spark pull request: [SPARK-12895][SPARK-12896] Migrate TaskMetrics...

2016-01-25 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/10835#discussion_r50802114
  
--- Diff: 
core/src/main/scala/org/apache/spark/status/api/v1/AllStagesResource.scala ---
@@ -237,7 +237,8 @@ private[v1] object AllStagesResource {
   }
 
   def convertAccumulableInfo(acc: InternalAccumulableInfo): 
AccumulableInfo = {
-new AccumulableInfo(acc.id, acc.name, acc.update, acc.value)
+new AccumulableInfo(
+  acc.id, acc.name, acc.update.map(_.toString), 
acc.value.map(_.toString).orNull)
--- End diff --

This was kind of confusing at first glance until I remembered that we have the weird UI AccumulableInfo and the other version, which is used elsewhere and which has been renamed to `InternalAccumulableInfo` here.





[GitHub] spark pull request: [SPARK-12895][SPARK-12896] Migrate TaskMetrics...

2016-01-25 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/10835#discussion_r50802781
  
--- Diff: core/src/main/scala/org/apache/spark/InternalAccumulator.scala ---
@@ -17,42 +17,193 @@
 
 package org.apache.spark
 
+import org.apache.spark.storage.{BlockId, BlockStatus}
 
-// This is moved to its own file because many more things will be added to 
it in SPARK-10620.
+
+/**
+ * A collection of fields and methods concerned with internal accumulators 
that represent
+ * task level metrics.
+ */
 private[spark] object InternalAccumulator {
-  val PEAK_EXECUTION_MEMORY = "peakExecutionMemory"
-  val TEST_ACCUMULATOR = "testAccumulator"
-
-  // For testing only.
-  // This needs to be a def since we don't want to reuse the same 
accumulator across stages.
-  private def maybeTestAccumulator: Option[Accumulator[Long]] = {
-if (sys.props.contains("spark.testing")) {
-  Some(new Accumulator(
-0L, AccumulatorParam.LongAccumulatorParam, Some(TEST_ACCUMULATOR), 
internal = true))
-} else {
-  None
+
+  import AccumulatorParam._
+
+  // Prefixes used in names of internal task level metrics
+  val METRICS_PREFIX = "internal.metrics."
+  val SHUFFLE_READ_METRICS_PREFIX = METRICS_PREFIX + "shuffle.read."
+  val SHUFFLE_WRITE_METRICS_PREFIX = METRICS_PREFIX + "shuffle.write."
+  val OUTPUT_METRICS_PREFIX = METRICS_PREFIX + "output."
+  val INPUT_METRICS_PREFIX = METRICS_PREFIX + "input."
+
+  // Names of internal task level metrics
+  val EXECUTOR_DESERIALIZE_TIME = METRICS_PREFIX + 
"executorDeserializeTime"
+  val EXECUTOR_RUN_TIME = METRICS_PREFIX + "executorRunTime"
+  val RESULT_SIZE = METRICS_PREFIX + "resultSize"
+  val JVM_GC_TIME = METRICS_PREFIX + "jvmGCTime"
+  val RESULT_SERIALIZATION_TIME = METRICS_PREFIX + 
"resultSerializationTime"
+  val MEMORY_BYTES_SPILLED = METRICS_PREFIX + "memoryBytesSpilled"
+  val DISK_BYTES_SPILLED = METRICS_PREFIX + "diskBytesSpilled"
+  val PEAK_EXECUTION_MEMORY = METRICS_PREFIX + "peakExecutionMemory"
+  val UPDATED_BLOCK_STATUSES = METRICS_PREFIX + "updatedBlockStatuses"
+  val TEST_ACCUM = METRICS_PREFIX + "testAccumulator"
+
+  // scalastyle:off
+
+  // Names of shuffle read metrics
+  object shuffleRead {
+val REMOTE_BLOCKS_FETCHED = SHUFFLE_READ_METRICS_PREFIX + 
"remoteBlocksFetched"
+val LOCAL_BLOCKS_FETCHED = SHUFFLE_READ_METRICS_PREFIX + 
"localBlocksFetched"
+val REMOTE_BYTES_READ = SHUFFLE_READ_METRICS_PREFIX + "remoteBytesRead"
+val LOCAL_BYTES_READ = SHUFFLE_READ_METRICS_PREFIX + "localBytesRead"
+val FETCH_WAIT_TIME = SHUFFLE_READ_METRICS_PREFIX + "fetchWaitTime"
+val RECORDS_READ = SHUFFLE_READ_METRICS_PREFIX + "recordsRead"
+  }
+
+  // Names of shuffle write metrics
+  object shuffleWrite {
+val BYTES_WRITTEN = SHUFFLE_WRITE_METRICS_PREFIX + "bytesWritten"
+val RECORDS_WRITTEN = SHUFFLE_WRITE_METRICS_PREFIX + "recordsWritten"
+val WRITE_TIME = SHUFFLE_WRITE_METRICS_PREFIX + "writeTime"
+  }
+
+  // Names of output metrics
+  object output {
+val WRITE_METHOD = OUTPUT_METRICS_PREFIX + "writeMethod"
+val BYTES_WRITTEN = OUTPUT_METRICS_PREFIX + "bytesWritten"
+val RECORDS_WRITTEN = OUTPUT_METRICS_PREFIX + "recordsWritten"
+  }
+
+  // Names of input metrics
+  object input {
+val READ_METHOD = INPUT_METRICS_PREFIX + "readMethod"
+val BYTES_READ = INPUT_METRICS_PREFIX + "bytesRead"
+val RECORDS_READ = INPUT_METRICS_PREFIX + "recordsRead"
+  }
+
+  // scalastyle:on
+
+  /**
+   * Create an internal [[Accumulator]] by name, which must begin with 
[[METRICS_PREFIX]].
+   */
+  def create(name: String): Accumulator[_] = {
+assert(name.startsWith(METRICS_PREFIX),
+  s"internal accumulator name must start with '$METRICS_PREFIX': 
$name")
+getParam(name) match {
+  case p @ LongAccumulatorParam => newMetric[Long](0L, name, p)
+  case p @ IntAccumulatorParam => newMetric[Int](0, name, p)
+  case p @ StringAccumulatorParam => newMetric[String]("", name, p)
+  case p @ UpdatedBlockStatusesAccumulatorParam =>
+newMetric[Seq[(BlockId, BlockStatus)]](Seq(), name, p)
+  case p => throw new IllegalArgumentException(
+s"unsupported accumulator param '${p.getClass.getSimpleName}' for 
metric '$name'.")
+}
+  }
+
+  /**
+   * Get the [[AccumulatorParam]] associated with the internal metric name,
+   * which must begin with [[METRICS_PREFIX]].
+   */
+  def getParam(name: String): AccumulatorParam[_] = {
+assert(name.startsWith(METRICS_PREFIX),
+  s"internal accumulator name must 

[GitHub] spark pull request: [SPARK-12968][SQL] Implement command to set cu...

2016-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10916#issuecomment-174881926
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50079/
Test FAILed.





[GitHub] spark pull request: [SPARK-11775][PYSPARK][SQL] Allow PySpark to r...

2016-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9766#issuecomment-174817125
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-11775][PYSPARK][SQL] Allow PySpark to r...

2016-01-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9766#issuecomment-174817117
  
**[Test build #50066 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50066/consoleFull)** for PR 9766 at commit [`2e17865`](https://github.com/apache/spark/commit/2e178651b4f4e9c44f1cbdcba821492ebd48ebc1).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-11775][PYSPARK][SQL] Allow PySpark to r...

2016-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9766#issuecomment-174817126
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50066/
Test FAILed.





[GitHub] spark pull request: [SPARK-12401][SQL] Add integration tests for p...

2016-01-25 Thread maropu
Github user maropu commented on the pull request:

https://github.com/apache/spark/pull/10596#issuecomment-174823978
  
@liancheng Okay and ready to merge.





[GitHub] spark pull request: [SPARK-11775][PYSPARK][SQL] Allow PySpark to r...

2016-01-25 Thread zjffdu
Github user zjffdu commented on the pull request:

https://github.com/apache/spark/pull/9766#issuecomment-174832497
  
Please test it again.





[GitHub] spark pull request: [SPARK-12994][CORE] It is not necessary to cre...

2016-01-25 Thread zjffdu
Github user zjffdu commented on the pull request:

https://github.com/apache/spark/pull/10914#issuecomment-174852141
  
Thanks @jerryshao 





[GitHub] spark pull request: [SPARK-12834] Change ser/de of JavaArray and J...

2016-01-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/10772





[GitHub] spark pull request: [SPARK-12995][GraphX] Remove deprecate APIs fr...

2016-01-25 Thread maropu
Github user maropu commented on the pull request:

https://github.com/apache/spark/pull/10918#issuecomment-174859672
  
@srowen This is a follow-up to the discussion in #4402.
I checked that GraphX has deprecated APIs used only in Pregel, and this PR removes them.
If there aren't any problems, I'll also remove the deprecated ones from the test code in GraphX.





[GitHub] spark pull request: [SPARK-12983] [CORE] [DOC] Correct metrics.pro...

2016-01-25 Thread BenFradet
Github user BenFradet commented on the pull request:

https://github.com/apache/spark/pull/10902#issuecomment-174859378
  
Jenkins, retest this please.





[GitHub] spark pull request: [SPARK-12937][SQL] bloom filter serialization

2016-01-25 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/10920#discussion_r50801991
  
--- Diff: 
common/sketch/src/main/java/org/apache/spark/util/sketch/BitArray.java ---
@@ -32,13 +38,14 @@ static int numWords(long numBits) {
   }
 
   BitArray(long numBits) {
-if (numBits <= 0) {
-  throw new IllegalArgumentException("numBits must be positive");
-}
-this.data = new long[numWords(numBits)];
+this(new long[numWords(numBits)]);
+  }
+
+  private BitArray(long[] data) {
+this.data = data;
 long bitCount = 0;
-for (long value : data) {
-  bitCount += Long.bitCount(value);
+for (long datum : data) {
--- End diff --

It is a little bit weird to say `datum` here, since you are actually working with 64 of them at once. Maybe "word"?





[GitHub] spark pull request: [SPARK-12935][SQL] DataFrame API for Count-Min...

2016-01-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/10911#discussion_r50801910
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala ---
@@ -309,4 +311,84 @@ final class DataFrameStatFunctions private[sql](df: 
DataFrame) {
   def sampleBy[T](col: String, fractions: ju.Map[T, jl.Double], seed: 
Long): DataFrame = {
 sampleBy(col, fractions.asScala.toMap.asInstanceOf[Map[T, Double]], 
seed)
   }
+
+  /**
+   * Builds a Count-min Sketch over a specified column.
+   *
+   * @param colName name of the column over which the sketch is built
+   * @param depth depth of the sketch
+   * @param width width of the sketch
+   * @param seed random seed
+   * @return a [[CountMinSketch]] over column `colName`
+   * @since 2.0.0
+   */
+  def countMinSketch(colName: String, depth: Int, width: Int, seed: Int): 
CountMinSketch = {
+countMinSketch(Column(colName), depth, width, seed)
+  }
+
+  /**
+   * Builds a Count-min Sketch over a specified column.
+   *
+   * @param colName name of the column over which the sketch is built
+   * @param eps relative error of the sketch
+   * @param confidence confidence of the sketch
+   * @param seed random seed
+   * @return a [[CountMinSketch]] over column `colName`
+   * @since 2.0.0
+   */
+  def countMinSketch(
+  colName: String, eps: Double, confidence: Double, seed: Int): 
CountMinSketch = {
+countMinSketch(Column(colName), eps, confidence, seed)
+  }
+
+  /**
+   * Builds a Count-min Sketch over a specified column.
+   *
+   * @param col the column over which the sketch is built
+   * @param depth depth of the sketch
+   * @param width width of the sketch
+   * @param seed random seed
+   * @return a [[CountMinSketch]] over column `colName`
+   * @since 2.0.0
+   */
+  def countMinSketch(col: Column, depth: Int, width: Int, seed: Int): 
CountMinSketch = {
+countMinSketch(col, CountMinSketch.create(depth, width, seed))
+  }
+
+  /**
+   * Builds a Count-min Sketch over a specified column.
+   *
+   * @param col the column over which the sketch is built
+   * @param eps relative error of the sketch
+   * @param confidence confidence of the sketch
+   * @param seed random seed
+   * @return a [[CountMinSketch]] over column `colName`
+   * @since 2.0.0
+   */
+  def countMinSketch(col: Column, eps: Double, confidence: Double, seed: 
Int): CountMinSketch = {
+countMinSketch(col, CountMinSketch.create(eps, confidence, seed))
+  }
+
+  private def countMinSketch(col: Column, zero: CountMinSketch): 
CountMinSketch = {
+val singleCol = df.select(col)
+val colType = singleCol.schema.head.dataType
+val supportedTypes: Set[DataType] = Set(ByteType, ShortType, 
IntegerType, LongType, StringType)
+
+require(
+  supportedTypes.contains(colType),
--- End diff --

how about `colType == StringType || colType.isInstanceOf[IntegralType]`?
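
That suggestion would replace the `supportedTypes` set with a predicate, roughly (sketch only; the error message is taken from the diff above):

```scala
import org.apache.spark.sql.types._

// Sketch of the suggested check: accept strings plus any integral type.
def requireSupportedType(colType: DataType): Unit = require(
  colType == StringType || colType.isInstanceOf[IntegralType],
  s"Count-min Sketch only supports string type and integral types, " +
    s"and does not support type $colType.")
```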





[GitHub] spark pull request: [SPARK-12937][SQL] bloom filter serialization

2016-01-25 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/10920#discussion_r50802030
  
--- Diff: 
common/sketch/src/main/java/org/apache/spark/util/sketch/BloomFilter.java ---
@@ -83,7 +87,7 @@
* bloom filters are appropriately sized to avoid saturating them.
*
* @param other The bloom filter to combine this bloom filter with. It 
is not mutated.
-   * @throws IllegalArgumentException if {@code isCompatible(that) == 
false}
+   * @throws IncompatibleMergeException if {@code isCompatible(that) == 
false}
--- End diff --

you are using "other" instead of "that" here. make them consistent





[GitHub] spark pull request: [SPARK-12937][SQL] bloom filter serialization

2016-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10920#issuecomment-174881467
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50081/
Test PASSed.





[GitHub] spark pull request: [SPARK-12828][SQL]add natural join support

2016-01-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/10762#discussion_r50803405
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
@@ -474,6 +474,7 @@ class DataFrame private[sql](
   val rightCol = 
withPlan(joined.right).resolve(col).toAttribute.withNullability(true)
   Alias(Coalesce(Seq(leftCol, rightCol)), col)()
 }
+  case NaturalJoin(_) => sys.error("NaturalJoin with using clause is 
not supported.")
--- End diff --

Then this case is unreachable, as `JoinType.apply` won't produce a natural join.





[GitHub] spark pull request: [SPARK-12935][SQL] DataFrame API for Count-Min...

2016-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10911#issuecomment-174816049
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50055/
Test FAILed.





[GitHub] spark pull request: [SPARK-11775][PYSPARK][SQL] Allow PySpark to r...

2016-01-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9766#issuecomment-174816148
  
**[Test build #50066 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50066/consoleFull)** for PR 9766 at commit [`2e17865`](https://github.com/apache/spark/commit/2e178651b4f4e9c44f1cbdcba821492ebd48ebc1).





[GitHub] spark pull request: [SPARK-12935][SQL] DataFrame API for Count-Min...

2016-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10911#issuecomment-174816044
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-529] [core] [yarn] Add type-safe config...

2016-01-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10205#issuecomment-174821201
  
**[Test build #50068 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50068/consoleFull)** for PR 10205 at commit [`d125e03`](https://github.com/apache/spark/commit/d125e03362d298a08a242ee52a20910e95dfaaa0).





[GitHub] spark pull request: [SPARK-529] [core] [yarn] Add type-safe config...

2016-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10205#issuecomment-174821494
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-529] [core] [yarn] Add type-safe config...

2016-01-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10205#issuecomment-174821491
  
**[Test build #50068 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50068/consoleFull)** for PR 10205 at commit [`d125e03`](https://github.com/apache/spark/commit/d125e03362d298a08a242ee52a20910e95dfaaa0).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-529] [core] [yarn] Add type-safe config...

2016-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10205#issuecomment-174821495
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50068/
Test FAILed.





[GitHub] spark pull request: [SPARK-12993][PYSPARK] Remove usage of ADD_FIL...

2016-01-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10913#issuecomment-174826905
  
**[Test build #50067 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50067/consoleFull)** for PR 10913 at commit [`f8c09de`](https://github.com/apache/spark/commit/f8c09de63aff3bcb220f5fa80926e83f4479c8b1).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12993][PYSPARK] Remove usage of ADD_FIL...

2016-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10913#issuecomment-174827000
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50067/
Test FAILed.





[GitHub] spark pull request: [SPARK-12993][PYSPARK] Remove usage of ADD_FIL...

2016-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10913#issuecomment-174826995
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12895][SPARK-12896] Migrate TaskMetrics...

2016-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10835#issuecomment-174828808
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50064/
Test PASSed.





[GitHub] spark pull request: [SPARK-12895][SPARK-12896] Migrate TaskMetrics...

2016-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10835#issuecomment-174828804
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-01-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-174849282
  
**[Test build #50072 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50072/consoleFull)**
 for PR 10917 at commit 
[`8207dc1`](https://github.com/apache/spark/commit/8207dc109f21527438cbd80894e9b49d63159f12).





[GitHub] spark pull request: [SPARK-12995][GraphX] Remove deprecate APIs fr...

2016-01-25 Thread maropu
GitHub user maropu opened a pull request:

https://github.com/apache/spark/pull/10918

[SPARK-12995][GraphX] Remove deprecate APIs from Pregel



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maropu/spark RemoveDeprecateInPregel

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10918.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10918


commit fea631129df389b97f5695c11a6bb0c1fef0fb0c
Author: Takeshi YAMAMURO 
Date:   2016-01-26T06:21:17Z

Remove deprecate APIs from Pregel







[GitHub] spark pull request: [SPARK-12994][CORE] It is not necessary to cre...

2016-01-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10914#issuecomment-174860248
  
**[Test build #50069 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50069/consoleFull)**
 for PR 10914 at commit 
[`90118ca`](https://github.com/apache/spark/commit/90118ca76c2cbe381bc06614c02cd3b089951c10).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12995][GraphX] Remove deprecate APIs fr...

2016-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10918#issuecomment-174860265
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12994][CORE] It is not necessary to cre...

2016-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10914#issuecomment-174860451
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50069/
Test FAILed.





[GitHub] spark pull request: [SPARK-12995][GraphX] Remove deprecate APIs fr...

2016-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10918#issuecomment-174860277
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50074/
Test FAILed.





[GitHub] spark pull request: [SPARK-9835] [ML] Implement IterativelyReweigh...

2016-01-25 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/10639#discussion_r50801341
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/optim/IterativelyReweightedLeastSquares.scala
 ---
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.optim
+
+import org.apache.spark.Logging
+import org.apache.spark.ml.feature.Instance
+import org.apache.spark.mllib.linalg._
+import org.apache.spark.rdd.RDD
+
+/**
+ * Model fitted by [[IterativelyReweightedLeastSquares]].
+ * @param coefficients model coefficients
+ * @param intercept model intercept
+ */
+private[ml] class IterativelyReweightedLeastSquaresModel(
+val coefficients: DenseVector,
+val intercept: Double) extends Serializable
+
+/**
+ * Implements the method of iteratively reweighted least squares (IRLS) 
which is used to solve
+ * certain optimization problems by an iterative method. In each step of 
the iterations, it
+ * involves solving a weighted least squares (WLS) problem by 
[[WeightedLeastSquares]].
+ * It can be used to find maximum likelihood estimates of a generalized 
linear model (GLM),
+ * find M-estimator in robust regression and other optimization problems.
--- End diff --

It would be good to provide a reference about IRLS. The IRLS page on 
Wikipedia is specialized for Lp regression. I would recommend Green's paper as 
a reference: http://www.jstor.org/stable/2345503
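
For context, the iteration that `IterativelyReweightedLeastSquares` implements can be 
sketched as follows (standard GLM-style IRLS in the spirit of Green's paper; the 
notation below is illustrative and not lifted from the patch):

~~~latex
\beta^{(t+1)} = \arg\min_{\beta} \; \sum_i w_i^{(t)} \left( z_i^{(t)} - x_i^{\top} \beta \right)^2
~~~

where the working weights `w_i` and working offsets `z_i` are recomputed from the 
current model by `reweightFunc`, each weighted least squares subproblem is handed to 
`WeightedLeastSquares`, and the loop stops after `maxIter` iterations or once 
convergence (as measured against `tol`) is detected.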





[GitHub] spark pull request: [SPARK-12937][SQL] bloom filter serialization

2016-01-25 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/10920#discussion_r50802349
  
--- Diff: 
common/sketch/src/main/java/org/apache/spark/util/sketch/BloomFilterImpl.java 
---
@@ -161,4 +194,24 @@ public BloomFilter mergeInPlace(BloomFilter other) 
throws IncompatibleMergeExcep
 this.bits.putAll(that.bits);
 return this;
   }
+
+  @Override
+  public void writeTo(OutputStream out) throws IOException {
+DataOutputStream dos = new DataOutputStream(out);
+
+dos.writeInt(Version.V1.getVersionNumber());
+bits.writeTo(dos);
+dos.writeInt(numHashFunctions);
+  }
+
+  public static BloomFilterImpl readFrom(InputStream in) throws 
IOException {
+DataInputStream dis = new DataInputStream(in);
+
+int version = dis.readInt();
+if (version != Version.V1.getVersionNumber()) {
+  throw new IOException("Unexpected Bloom Filter version number (" + 
version + ")");
--- End diff --

BloomFilter, or Bloom filter
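
As a quick round-trip sketch of the stream format in this diff (the `create`, `putLong` 
and `mightContainLong` calls are assumed from the rest of the sketch module and are not 
part of this diff; whether `readFrom` is exposed on `BloomFilter` or only on 
`BloomFilterImpl` at this point is also an assumption):

~~~scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}

import org.apache.spark.util.sketch.BloomFilter

// Build a small filter, serialize it, and read it back through the V1 layout
// written by writeTo: version number, then the bit array, then numHashFunctions.
val filter = BloomFilter.create(10000L)
(1L to 100L).foreach(filter.putLong)

val out = new ByteArrayOutputStream()
filter.writeTo(out)

val in = new ByteArrayInputStream(out.toByteArray)
val restored = BloomFilter.readFrom(in)   // throws IOException on a version mismatch

assert(restored.mightContainLong(42L))    // no false negatives for inserted items
~~~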





[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-01-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-174874774
  
**[Test build #50072 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50072/consoleFull)**
 for PR 10917 at commit 
[`8207dc1`](https://github.com/apache/spark/commit/8207dc109f21527438cbd80894e9b49d63159f12).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-01-25 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-174874622
  
@nongli  It does nothing special to get the hash code of an int field, but does a 
[simple multiplication and 
addition](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/rows.scala#L153) 
to get the hash code of the row.





[GitHub] spark pull request: [SPARK-12968][SQL] Implement command to set cu...

2016-01-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10916#issuecomment-174881888
  
**[Test build #50079 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50079/consoleFull)**
 for PR 10916 at commit 
[`43beb4b`](https://github.com/apache/spark/commit/43beb4ba499814c698df7537018ab6fafefa738e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12968][SQL] Implement command to set cu...

2016-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10916#issuecomment-174881924
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12828][SQL]add natural join support

2016-01-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/10762#discussion_r50803483
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1159,6 +1161,25 @@ class Analyzer(
   }
 }
   }
+
+  /**
+   * Removes natural joins.
--- End diff --

I think we need more comments here explaining how we resolve a natural join to a 
normal join.
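
For reference, the semantics being resolved here can be hand-rolled at the DataFrame 
level roughly like this (an illustration of what a natural join desugars to, not the 
analyzer rule added in this PR):

~~~scala
import org.apache.spark.sql.DataFrame

// A natural join is an equi-join on every column name the two sides share,
// with each shared column appearing only once in the output.
def naturalJoin(left: DataFrame, right: DataFrame, joinType: String = "inner"): DataFrame = {
  val common = left.columns.intersect(right.columns).toSeq   // shared column names
  left.join(right, common, joinType)                         // usingColumns deduplicates them
}
~~~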





[GitHub] spark pull request: [SPARK-12828][SQL]add natural join support

2016-01-25 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/10762#discussion_r50803540
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
@@ -474,6 +474,7 @@ class DataFrame private[sql](
   val rightCol = 
withPlan(joined.right).resolve(col).toAttribute.withNullability(true)
   Alias(Coalesce(Seq(leftCol, rightCol)), col)()
 }
+  case NaturalJoin(_) => sys.error("NaturalJoin with using clause is 
not supported.")
--- End diff --

Yup - although we should still throw some exception here, just in case a future 
refactoring makes this reachable.






[GitHub] spark pull request: [SPARK-8171] [Web UI] Simulated infinite scrol...

2016-01-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10910#issuecomment-174801261
  
**[Test build #50065 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50065/consoleFull)**
 for PR 10910 at commit 
[`4d7c433`](https://github.com/apache/spark/commit/4d7c43373d43126b488540d7659274277665f51c).





[GitHub] spark pull request: [SPARK-12994][CORE] It is not necessary to cre...

2016-01-25 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/10914#discussion_r50796394
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -537,10 +537,11 @@ class SparkContext(config: SparkConf) extends Logging 
with ExecutorAllocationCli
 }
 
 _executorAllocationManager =
-  if (dynamicAllocationEnabled) {
+  if (dynamicAllocationEnabled && !isLocal) {
 Some(new ExecutorAllocationManager(this, listenerBus, _conf))
   } else {
 None
+
--- End diff --

Remove this empty line.





[GitHub] spark pull request: [SPARK-12994][CORE] It is not necessary to cre...

2016-01-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10914#issuecomment-174837394
  
**[Test build #50069 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50069/consoleFull)**
 for PR 10914 at commit 
[`90118ca`](https://github.com/apache/spark/commit/90118ca76c2cbe381bc06614c02cd3b089951c10).





[GitHub] spark pull request: [SPARK-12968][SQL] Implement command to set cu...

2016-01-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10916#issuecomment-174848191
  
**[Test build #50071 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50071/consoleFull)**
 for PR 10916 at commit 
[`46737b5`](https://github.com/apache/spark/commit/46737b5c9fecbc68b1e4e830b2a1b189a2e72158).





[GitHub] spark pull request: [SPARK-12968][SQL] Implement command to set cu...

2016-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10916#issuecomment-174855435
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50071/
Test FAILed.





[GitHub] spark pull request: [SPARK-12968][SQL] Implement command to set cu...

2016-01-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10916#issuecomment-174855251
  
**[Test build #50071 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50071/consoleFull)**
 for PR 10916 at commit 
[`46737b5`](https://github.com/apache/spark/commit/46737b5c9fecbc68b1e4e830b2a1b189a2e72158).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class SetDatabaseCommand(databaseName: String) extends 
RunnableCommand `





[GitHub] spark pull request: [SPARK-11622][MLLIB] Make LibSVMRelation exten...

2016-01-25 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/9595#issuecomment-174861333
  
test this please





[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-01-25 Thread nongli
Github user nongli commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-174866182
  
@cloud-fan Simple is just a single int right? It's not even doing anything 
in the previous case?





[GitHub] spark pull request: [SPARK-11780][SQL] Add catalyst type aliases b...

2016-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10915#issuecomment-174866516
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-11780][SQL] Add catalyst type aliases b...

2016-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10915#issuecomment-174866518
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50070/
Test FAILed.





[GitHub] spark pull request: [SPARK-11780][SQL] Add catalyst type aliases b...

2016-01-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10915#issuecomment-174866357
  
**[Test build #50070 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50070/consoleFull)**
 for PR 10915 at commit 
[`9ef7185`](https://github.com/apache/spark/commit/9ef7185f5a9ce1f672559e00a34854c5afa4).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12926][SQL] SQLContext to disallow user...

2016-01-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10849#issuecomment-174874233
  
**[Test build #50082 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50082/consoleFull)**
 for PR 10849 at commit 
[`f982d54`](https://github.com/apache/spark/commit/f982d5449fc52ef9b844761f92306fb7d238b542).





[GitHub] spark pull request: [SPARK-12895][SPARK-12896] Migrate TaskMetrics...

2016-01-25 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/10835#discussion_r50802278
  
--- Diff: 
core/src/test/scala/org/apache/spark/executor/TaskMetricsSuite.scala ---
@@ -17,12 +17,345 @@
 
 package org.apache.spark.executor
 
-import org.apache.spark.SparkFunSuite
+import org.apache.spark._
+import org.apache.spark.storage.{BlockId, BlockStatus, StorageLevel, 
TestBlockId}
+
 
 class TaskMetricsSuite extends SparkFunSuite {
-  test("[SPARK-5701] updateShuffleReadMetrics: ShuffleReadMetrics not 
added when no shuffle deps") {
-val taskMetrics = new TaskMetrics()
-taskMetrics.mergeShuffleReadMetrics()
-assert(taskMetrics.shuffleReadMetrics.isEmpty)
+  import AccumulatorParam._
+  import InternalAccumulator._
+  import StorageLevel._
+  import TaskMetricsSuite._
+
+  test("create") {
--- End diff --

Cool, thanks!





[GitHub] spark pull request: [SPARK-12828][SQL]add natural join support

2016-01-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/10762#discussion_r50803191
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
@@ -474,6 +474,7 @@ class DataFrame private[sql](
   val rightCol = 
withPlan(joined.right).resolve(col).toAttribute.withNullability(true)
   Alias(Coalesce(Seq(leftCol, rightCol)), col)()
 }
+  case NaturalJoin(_) => sys.error("NaturalJoin with using clause is 
not supported.")
--- End diff --

Are we going to support natural join in `DataFrame`? If so, I think we 
should also change `JoinType.apply`





[GitHub] spark pull request: [SPARK-12895][SPARK-12896] Migrate TaskMetrics...

2016-01-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10835#issuecomment-174828051
  
**[Test build #50064 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50064/consoleFull)**
 for PR 10835 at commit 
[`7e7c2f4`](https://github.com/apache/spark/commit/7e7c2f41f8d8cd302a89cc1ef15b552fb5e28e2d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-8171] [Web UI] Simulated infinite scrol...

2016-01-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10910#issuecomment-174837910
  
**[Test build #50065 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50065/consoleFull)**
 for PR 10910 at commit 
[`4d7c433`](https://github.com/apache/spark/commit/4d7c43373d43126b488540d7659274277665f51c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-8171] [Web UI] Simulated infinite scrol...

2016-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10910#issuecomment-174838126
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50065/
Test PASSed.





[GitHub] spark pull request: [SPARK-12968][SQL] Implement command to set cu...

2016-01-25 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/10916

[SPARK-12968][SQL] Implement command to set current database

JIRA: https://issues.apache.org/jira/browse/SPARK-12968

Implement command to set current database.
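
Presumably the end result is exercised along these lines (illustrative only; the exact 
syntax accepted is whatever the parser rule in this patch defines, and `sqlContext` is 
the usual shell/session entry point):

~~~scala
// Switch the session's current database; unqualified table names then resolve against it.
sqlContext.sql("USE my_database")
sqlContext.sql("SELECT * FROM some_table").show()   // resolves to my_database.some_table
~~~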

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 ddl-use-database

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10916.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10916


commit 46737b5c9fecbc68b1e4e830b2a1b189a2e72158
Author: Liang-Chi Hsieh 
Date:   2016-01-26T05:33:13Z

Implement command to set current database.







[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-01-25 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/10917#issuecomment-174846863
  
cc @nongli @rxin 





[GitHub] spark pull request: [SPARK-12888][SQL][follow-up] benchmark the ne...

2016-01-25 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/10917

[SPARK-12888][SQL][follow-up] benchmark the new hash expression

Adds the benchmark results as comments.

The codegen version is slower than the interpreted version for the `simple` 
case because of 3 reasons:

1. The codegen version uses a more complex hash algorithm than the interpreted 
version, i.e. `Murmur3_x86_32.hashInt` vs a [simple multiplication and 
addition](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/rows.scala#L153) (sketched below).
2. The codegen version writes the hash value to a row first and then reads it 
out. I tried to create a `GenerateHasher` that can generate code to return the 
hash value directly and got about a 60% speed up for the `simple` case; is it 
worth doing?
3. The row in the `simple` case only has one int field, so the runtime 
reflection may be eliminated by branch prediction, which makes the interpreted 
version faster.

The `array` case is also slow for similar reasons, e.g. array elements are of 
the same type, so the interpreted version can probably get rid of runtime 
reflection by branch prediction.
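
To make reason 1 concrete, here is a minimal sketch of the two hashing strategies being 
compared (illustrative only, not the actual Spark implementations):

~~~scala
// Interpreted path: the row hash is a plain multiply-and-add fold over field hash codes
// (the rows.scala code linked above), so a row with a single int field costs almost nothing.
def interpretedRowHash(fields: Seq[Any]): Int =
  fields.foldLeft(37) { (h, f) => 37 * h + (if (f == null) 0 else f.hashCode) }

// Codegen path: every value goes through a full Murmur3-style mix, roughly what
// Murmur3_x86_32.hashInt does for a single int -- several multiplies, rotates and xors
// per field, which is why the one-int-field `simple` row favors the interpreted version.
def murmur3StyleHashInt(input: Int, seed: Int): Int = {
  var k1 = input * 0xcc9e2d51
  k1 = Integer.rotateLeft(k1, 15)
  k1 *= 0x1b873593
  var h1 = seed ^ k1
  h1 = Integer.rotateLeft(h1, 13)
  h1 = h1 * 5 + 0xe6546b64
  // finalization mix for a 4-byte input
  h1 ^= 4
  h1 ^= h1 >>> 16
  h1 *= 0x85ebca6b
  h1 ^= h1 >>> 13
  h1 *= 0xc2b2ae35
  h1 ^= h1 >>> 16
  h1
}
~~~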

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark hash-benchmark

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10917.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10917


commit 8207dc109f21527438cbd80894e9b49d63159f12
Author: Wenchen Fan 
Date:   2016-01-26T02:24:38Z

add benchmark results







[GitHub] spark pull request: [SPARK-12935][SQL] DataFrame API for Count-Min...

2016-01-25 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/10911#issuecomment-174848875
  
cc @JoshRosen are the python tests broken?

```
Running PySpark tests. Output is in 
/home/jenkins/workspace/SparkPullRequestBuilder/python/unit-tests.log
Error: unrecognized module 'root'. Supported modules: pyspark-mllib, 
pyspark-core, pyspark-ml, pyspark-sql, pyspark-streaming
[error] running 
/home/jenkins/workspace/SparkPullRequestBuilder/python/run-tests 
--modules=pyspark-mllib,pyspark-ml,pyspark-sql,root --parallelism=4 ; received 
return code 255
```





[GitHub] spark pull request: [SPARK-11922][PYSPARK][ML] Python api for ml.f...

2016-01-25 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/10085#issuecomment-174857126
  
LGTM
Merging with master
Thanks for the PR!





[GitHub] spark pull request: [SPARK-12968][SQL] Implement command to set cu...

2016-01-25 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/10916#issuecomment-174861508
  
retest this please.





[GitHub] spark pull request: [SPARK-10086] [MLlib] [Streaming] [PySpark] ig...

2016-01-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/10909





[GitHub] spark pull request: [SPARK-12994][CORE] It is not necessary to cre...

2016-01-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10914#issuecomment-174878554
  
**[Test build #50073 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50073/consoleFull)**
 for PR 10914 at commit 
[`0467617`](https://github.com/apache/spark/commit/0467617746590b3083deafaa763ee4cae50d4dc0).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12994][CORE] It is not necessary to cre...

2016-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10914#issuecomment-174878693
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50073/
Test FAILed.





[GitHub] spark pull request: [SPARK-12401][SQL] Add integration tests for p...

2016-01-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10596#issuecomment-174811227
  
**[Test build #50062 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50062/consoleFull)**
 for PR 10596 at commit 
[`dbc6829`](https://github.com/apache/spark/commit/dbc6829ca8584009972826e48864ba416ded6479).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12994][CORE] It is not necessary to cre...

2016-01-25 Thread zjffdu
GitHub user zjffdu opened a pull request:

https://github.com/apache/spark/pull/10914

[SPARK-12994][CORE] It is not necessary to create ExecutorAllocationM…

…anager in local mode

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zjffdu/spark SPARK-12994

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10914.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10914


commit 90118ca76c2cbe381bc06614c02cd3b089951c10
Author: Jeff Zhang 
Date:   2016-01-26T05:02:27Z

[SPARK-12994][CORE] It is not necessary to create ExecutorAllocationManager 
in local mode







[GitHub] spark pull request: [SPARK-12864][YARN] initialize executorIdCount...

2016-01-25 Thread zhonghaihua
Github user zhonghaihua commented on the pull request:

https://github.com/apache/spark/pull/10794#issuecomment-174830247
  
@marmbrus @liancheng @yhuai Could you verify this patch?





[GitHub] spark pull request: [SPARK-8171] [Web UI] Simulated infinite scrol...

2016-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10910#issuecomment-174838124
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-12993][PYSPARK] Remove usage of ADD_FIL...

2016-01-25 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/10913#issuecomment-174849947
  
Can you update the pull request description to describe why we are removing 
this? 





[GitHub] spark pull request: [SPARK-12968][SQL] Implement command to set cu...

2016-01-25 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/10916#issuecomment-174849740
  
cc @hvanhovell for review.






[GitHub] spark pull request: [SPARK-12994][CORE] It is not necessary to cre...

2016-01-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10914#issuecomment-174852807
  
**[Test build #50073 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50073/consoleFull)**
 for PR 10914 at commit 
[`0467617`](https://github.com/apache/spark/commit/0467617746590b3083deafaa763ee4cae50d4dc0).





[GitHub] spark pull request: [SPARK-12968][SQL] Implement command to set cu...

2016-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10916#issuecomment-174866893
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-9835] [ML] Implement IterativelyReweigh...

2016-01-25 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/10639#discussion_r50801095
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/optim/IterativelyReweightedLeastSquares.scala
 ---
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.optim
+
+import org.apache.spark.Logging
+import org.apache.spark.ml.feature.Instance
+import org.apache.spark.mllib.linalg._
+import org.apache.spark.rdd.RDD
+
+/**
+ * Model fitted by [[IterativelyReweightedLeastSquares]].
+ * @param coefficients model coefficients
+ * @param intercept model intercept
+ */
+private[ml] class IterativelyReweightedLeastSquaresModel(
+val coefficients: DenseVector,
+val intercept: Double) extends Serializable
+
+/**
+ * Implements the method of iteratively reweighted least squares (IRLS) 
which is used to solve
+ * certain optimization problems by an iterative method. In each step of 
the iterations, it
+ * involves solving a weighted least squares (WLS) problem by 
[[WeightedLeastSquares]].
+ * It can be used to find maximum likelihood estimates of a generalized 
linear model (GLM),
+ * find M-estimator in robust regression and other optimization problems.
+ *
+ * @param initialModel the initial guess model.
+ * @param reweightFunc the reweight function which is used to update 
offsets and weights
+ * at each iteration.
+ * @param fitIntercept whether to fit intercept.
+ * @param regParam L2 regularization parameter used by WLS.
+ * @param maxIter maximum number of iterations.
+ * @param tol the convergence tolerance.
+ */
+private[ml] class IterativelyReweightedLeastSquares(
+val initialModel: WeightedLeastSquaresModel,
+val reweightFunc: (Instance, WeightedLeastSquaresModel) => (Double, 
Double),
+val fitIntercept: Boolean,
+val regParam: Double,
+val maxIter: Int,
+val tol: Double) extends Logging with Serializable {
+
+  def fit(instances: RDD[Instance]): 
IterativelyReweightedLeastSquaresModel = {
+
+var converged = false
+var iter = 0
+
+var offsetsAndWeights: RDD[(Double, Double)] = null
+var model: WeightedLeastSquaresModel = initialModel
+var oldModel: WeightedLeastSquaresModel = initialModel
+
+while (iter < maxIter && !converged) {
+
+  oldModel = model
+
+  // Update offsets and weights using reweightFunc
+  offsetsAndWeights = instances.map { instance => 
reweightFunc(instance, oldModel) }
+
+  // Estimate new model
+  val newInstances = instances.zip(offsetsAndWeights).map {
--- End diff --

`zip` is not efficient. Generate `newInstances` directly:

~~~scala
val newInstances = instances.map { instance =>
  val (newOffset, newWeight) = reweightFunc(instance, oldModel)
  Instance(newOffset, newWeight, instance.features)
}
~~~





[GitHub] spark pull request: [SPARK-9835] [ML] Implement IterativelyReweigh...

2016-01-25 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/10639#discussion_r50801093
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/optim/IterativelyReweightedLeastSquares.scala
 ---
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.optim
+
+import org.apache.spark.Logging
+import org.apache.spark.ml.feature.Instance
+import org.apache.spark.mllib.linalg._
+import org.apache.spark.rdd.RDD
+
+/**
+ * Model fitted by [[IterativelyReweightedLeastSquares]].
+ * @param coefficients model coefficients
+ * @param intercept model intercept
+ */
+private[ml] class IterativelyReweightedLeastSquaresModel(
+val coefficients: DenseVector,
+val intercept: Double) extends Serializable
+
+/**
+ * Implements the method of iteratively reweighted least squares (IRLS) 
which is used to solve
+ * certain optimization problems by an iterative method. In each step of 
the iterations, it
+ * involves solving a weighted least squares (WLS) problem by 
[[WeightedLeastSquares]].
+ * It can be used to find maximum likelihood estimates of a generalized 
linear model (GLM),
+ * find M-estimator in robust regression and other optimization problems.
+ *
+ * @param initialModel the initial guess model.
+ * @param reweightFunc the reweight function which is used to update 
offsets and weights
+ * at each iteration.
+ * @param fitIntercept whether to fit intercept.
+ * @param regParam L2 regularization parameter used by WLS.
+ * @param maxIter maximum number of iterations.
+ * @param tol the convergence tolerance.
+ */
+private[ml] class IterativelyReweightedLeastSquares(
+val initialModel: WeightedLeastSquaresModel,
+val reweightFunc: (Instance, WeightedLeastSquaresModel) => (Double, 
Double),
+val fitIntercept: Boolean,
+val regParam: Double,
+val maxIter: Int,
+val tol: Double) extends Logging with Serializable {
+
+  def fit(instances: RDD[Instance]): 
IterativelyReweightedLeastSquaresModel = {
+
+var converged = false
+var iter = 0
+
+var offsetsAndWeights: RDD[(Double, Double)] = null
+var model: WeightedLeastSquaresModel = initialModel
+var oldModel: WeightedLeastSquaresModel = initialModel
--- End diff --

`= null`





[GitHub] spark pull request: [SPARK-12968][SQL] Implement command to set cu...

2016-01-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10916#issuecomment-174866819
  
**[Test build #50079 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50079/consoleFull)**
 for PR 10916 at commit 
[`43beb4b`](https://github.com/apache/spark/commit/43beb4ba499814c698df7537018ab6fafefa738e).




