[GitHub] spark issue #16699: [SPARK-18710][ML] Add offset in GLM

2017-01-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16699
  
**[Test build #72067 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72067/testReport)** for PR 16699 at commit [`58f93af`](https://github.com/apache/spark/commit/58f93af6236d52f87c82411a645cf15413b30b9e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

2017-01-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16700
  
**[Test build #72066 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72066/testReport)** for PR 16700 at commit [`de4c409`](https://github.com/apache/spark/commit/de4c409a9fbbe1f7fbed2d750f9d287e27470d3e).





[GitHub] spark pull request #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM

2017-01-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16344





[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM

2017-01-26 Thread yanboliang
Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/16344
  
Merged into master. If there are comments from others, we can address them 
in follow-up work. Thanks.





[GitHub] spark issue #16706: [SPARK-19365][Core]Optimize RequestMessage serialization

2017-01-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16706
  
**[Test build #72065 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72065/testReport)** for PR 16706 at commit [`05e9a0c`](https://github.com/apache/spark/commit/05e9a0c517d1357e4ffc23bed6bdecb5325552b9).





[GitHub] spark issue #16716: [SPARK-19378][SS] Ensure continuity of stateOperator and...

2017-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16716
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16716: [SPARK-19378][SS] Ensure continuity of stateOperator and...

2017-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16716
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72064/
Test PASSed.





[GitHub] spark issue #16716: [SPARK-19378][SS] Ensure continuity of stateOperator and...

2017-01-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16716
  
**[Test build #72064 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72064/testReport)** for PR 16716 at commit [`e23de4a`](https://github.com/apache/spark/commit/e23de4ad5adae6a79a0b081344ba2bf575ec6d8c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16696: [SPARK-19350] [SQL] Cardinality estimation of Lim...

2017-01-26 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16696#discussion_r98146358
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -727,37 +728,18 @@ case class GlobalLimit(limitExpr: Expression, child: LogicalPlan) extends UnaryN
   }
   override def computeStats(conf: CatalystConf): Statistics = {
 val limit = limitExpr.eval().asInstanceOf[Int]
-val sizeInBytes = if (limit == 0) {
-  // sizeInBytes can't be zero, or sizeInBytes of BinaryNode will also be zero
-  // (product of children).
-  1
-} else {
-  (limit: Long) * output.map(a => a.dataType.defaultSize).sum
-}
-child.stats(conf).copy(sizeInBytes = sizeInBytes)
+val childStats = child.stats(conf)
+// Don't propagate column stats, because we don't know the distribution after a limit operation
+Statistics(
+  sizeInBytes = EstimationUtils.getOutputSize(output, limit, childStats.attributeStats),
--- End diff --

We should. Otherwise the `rowCount` is not correct.
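For concreteness, a hedged sketch (illustrative only, based on the diff above rather than the PR's final code) of how `computeStats` could also cap and propagate `rowCount`:

```scala
override def computeStats(conf: CatalystConf): Statistics = {
  val limit = limitExpr.eval().asInstanceOf[Int]
  val childStats = child.stats(conf)
  // The limit caps the number of output rows; if the child's rowCount
  // estimate is known, take the smaller of the two.
  val rowCount: BigInt = childStats.rowCount.map(_.min(limit)).getOrElse(BigInt(limit))
  // Don't propagate column stats: the distribution is unknown after a limit.
  Statistics(
    sizeInBytes = EstimationUtils.getOutputSize(output, rowCount, childStats.attributeStats),
    rowCount = Some(rowCount))
}
```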





[GitHub] spark pull request #16668: [SPARK-18788][SPARKR] Add API for getNumPartition...

2017-01-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16668





[GitHub] spark pull request #16566: [SPARK-18821][SparkR]: Bisecting k-means wrapper ...

2017-01-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16566





[GitHub] spark issue #16566: [SPARK-18821][SparkR]: Bisecting k-means wrapper in Spar...

2017-01-26 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/16566
  
Merged to master. Let's follow up with the programming guide, example, and vignettes - would you be able to pick these up too, @wangmiao1981?





[GitHub] spark issue #16689: [SPARK-19342][SPARKR] bug fixed in collect method for co...

2017-01-26 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/16689
  
Just to make sure you see this: 
https://github.com/apache/spark/pull/16689#issuecomment-275063425






[GitHub] spark pull request #15730: [SPARK-18218][ML][MLLib] Reduce shuffled data siz...

2017-01-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15730





[GitHub] spark issue #15730: [SPARK-18218][ML][MLLib] Reduce shuffled data size of Bl...

2017-01-26 Thread brkyvz
Github user brkyvz commented on the issue:

https://github.com/apache/spark/pull/15730
  
Merging to master! Thanks!





[GitHub] spark issue #16716: [SPARK-19378][SS] Ensure continuity of stateOperator and...

2017-01-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16716
  
**[Test build #72064 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72064/testReport)** for PR 16716 at commit [`e23de4a`](https://github.com/apache/spark/commit/e23de4ad5adae6a79a0b081344ba2bf575ec6d8c).





[GitHub] spark pull request #16716: [SPARK-19378][SS] Ensure continuity of stateOpera...

2017-01-26 Thread brkyvz
Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/16716#discussion_r98141759
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQueryStatusAndProgressSuite.scala ---
@@ -171,6 +174,47 @@ class StreamingQueryStatusAndProgressSuite extends StreamTest {
   query.stop()
 }
   }
+
+  test("SPARK-19378: Continue reporting stateOp and eventTime metrics even if there is no data") {
+import testImplicits._
+
+withSQLConf(SQLConf.STREAMING_NO_DATA_PROGRESS_EVENT_INTERVAL.key -> "10") {
+  val inputData = MemoryStream[(Int, String)]
+
+  val query = inputData.toDS().toDF("value", "time")
+.select('value, 'time.cast("timestamp"))
+.withWatermark("time", "10 seconds")
+.groupBy($"value")
+.agg(count("*"))
+.writeStream
+.queryName("metric_continuity")
+.format("memory")
+.outputMode("complete")
+.start()
+  try {
+inputData.addData((1, "2017-01-26 01:00:00"), (2, "2017-01-26 01:00:02"))
+query.processAllAvailable()
+
+val progress = query.lastProgress
+assert(progress.eventTime.size() > 1)
+assert(progress.stateOperators.length > 0)
+// Should emit new progresses every 10 ms, but we could be facing a slow Jenkins
+eventually(timeout(1 minute)) {
+  val nextProgress = query.lastProgress
+  assert(nextProgress.timestamp !== progress.timestamp)
+  assert(nextProgress.numInputRows === 0)
+  assert(nextProgress.eventTime.get("min") === "2017-01-26 01:00:00")
--- End diff --

Oh shoot. I should definitely leave those out because they are trigger-specific, right? I should only keep the stateOperator part.
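A minimal sketch of that narrowing (illustrative, not the PR's final code), assuming it replaces the `eventually` block in the test quoted above:

```scala
// Should emit new progresses every 10 ms, but we could be facing a slow Jenkins.
eventually(timeout(1 minute)) {
  val nextProgress = query.lastProgress
  assert(nextProgress.timestamp !== progress.timestamp)
  assert(nextProgress.numInputRows === 0)
  // eventTime min/max/avg are trigger-specific, so they are left out here;
  // only the state operator metrics must keep being reported with no new data.
  assert(nextProgress.stateOperators.length > 0)
}
```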





[GitHub] spark pull request #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

2017-01-26 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/16694#discussion_r98141073
  
--- Diff: python/pyspark/ml/classification.py ---
@@ -60,6 +61,137 @@ def numClasses(self):
 
 
 @inherit_doc
+class LinearSVC(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol, HasMaxIter,
+HasRegParam, HasTol, HasRawPredictionCol, HasFitIntercept, HasStandardization,
+HasThreshold, HasWeightCol, HasAggregationDepth, JavaMLWritable, JavaMLReadable):
+"""
+Linear SVM Classifier (https://en.wikipedia.org/wiki/Support_vector_machine#Linear_SVM)
+This binary classifier optimizes the Hinge Loss using the OWLQN optimizer.
+
+>>> from pyspark.sql import Row
+>>> from pyspark.ml.linalg import Vectors
+>>> bdf = sc.parallelize([
--- End diff --

Rename bdf -> df





[GitHub] spark pull request #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

2017-01-26 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/16694#discussion_r98141077
  
--- Diff: python/pyspark/ml/classification.py ---
@@ -60,6 +61,137 @@ def numClasses(self):
 
 
 @inherit_doc
+class LinearSVC(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol, HasMaxIter,
+HasRegParam, HasTol, HasRawPredictionCol, HasFitIntercept, HasStandardization,
+HasThreshold, HasWeightCol, HasAggregationDepth, JavaMLWritable, JavaMLReadable):
+"""
+Linear SVM Classifier (https://en.wikipedia.org/wiki/Support_vector_machine#Linear_SVM)
+This binary classifier optimizes the Hinge Loss using the OWLQN optimizer.
+
+>>> from pyspark.sql import Row
+>>> from pyspark.ml.linalg import Vectors
+>>> bdf = sc.parallelize([
+... Row(label=1.0, weight=2.0, features=Vectors.dense(1.0)),
+... Row(label=0.0, weight=2.0, features=Vectors.sparse(1, [], []))]).toDF()
+>>> svm = LinearSVC(maxIter=5, regParam=0.01, weightCol="weight")
+>>> model = svm.fit(bdf)
+>>> model.coefficients
+DenseVector([1.909])
+>>> model.intercept
+-1.0045358384178
+>>> model.numClasses
+2
+>>> model.numFeatures
+1
+>>> test0 = sc.parallelize([Row(features=Vectors.dense(-1.0))]).toDF()
+>>> result = model.transform(test0).head()
+>>> result.prediction
+0.0
+>>> result.rawPrediction
+DenseVector([2.9135, -2.9135])
+>>> test1 = sc.parallelize([Row(features=Vectors.sparse(1, [0], [1.0]))]).toDF()
--- End diff --

No need to test sparse vectors here





[GitHub] spark pull request #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

2017-01-26 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/16694#discussion_r98141071
  
--- Diff: python/pyspark/ml/classification.py ---
@@ -60,6 +61,137 @@ def numClasses(self):
 
 
 @inherit_doc
+class LinearSVC(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol, HasMaxIter,
+HasRegParam, HasTol, HasRawPredictionCol, HasFitIntercept, HasStandardization,
+HasThreshold, HasWeightCol, HasAggregationDepth, JavaMLWritable, JavaMLReadable):
+"""
+Linear SVM Classifier (https://en.wikipedia.org/wiki/Support_vector_machine#Linear_SVM)
--- End diff --

Have you tried generating the docs?  Check out other examples to see how to 
do links.





[GitHub] spark pull request #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

2017-01-26 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/16694#discussion_r98141079
  
--- Diff: python/pyspark/ml/classification.py ---
@@ -60,6 +61,137 @@ def numClasses(self):
 
 
 @inherit_doc
+class LinearSVC(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol, HasMaxIter,
+HasRegParam, HasTol, HasRawPredictionCol, HasFitIntercept, HasStandardization,
+HasThreshold, HasWeightCol, HasAggregationDepth, JavaMLWritable, JavaMLReadable):
+"""
+Linear SVM Classifier (https://en.wikipedia.org/wiki/Support_vector_machine#Linear_SVM)
+This binary classifier optimizes the Hinge Loss using the OWLQN optimizer.
+
+>>> from pyspark.sql import Row
+>>> from pyspark.ml.linalg import Vectors
+>>> bdf = sc.parallelize([
+... Row(label=1.0, weight=2.0, features=Vectors.dense(1.0)),
+... Row(label=0.0, weight=2.0, features=Vectors.sparse(1, [], []))]).toDF()
+>>> svm = LinearSVC(maxIter=5, regParam=0.01, weightCol="weight")
+>>> model = svm.fit(bdf)
+>>> model.coefficients
+DenseVector([1.909])
+>>> model.intercept
+-1.0045358384178
+>>> model.numClasses
+2
+>>> model.numFeatures
+1
+>>> test0 = sc.parallelize([Row(features=Vectors.dense(-1.0))]).toDF()
+>>> result = model.transform(test0).head()
+>>> result.prediction
+0.0
+>>> result.rawPrediction
+DenseVector([2.9135, -2.9135])
+>>> test1 = sc.parallelize([Row(features=Vectors.sparse(1, [0], [1.0]))]).toDF()
+>>> model.transform(test1).head().prediction
+1.0
+>>> svm.setParams("vector")
--- End diff --

Put this in a unit test (tests.py), not here in the doc tests (though I 
also don't think you really need this test)





[GitHub] spark pull request #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

2017-01-26 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/16694#discussion_r98141074
  
--- Diff: python/pyspark/ml/classification.py ---
@@ -60,6 +61,137 @@ def numClasses(self):
 
 
 @inherit_doc
+class LinearSVC(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol, HasMaxIter,
+HasRegParam, HasTol, HasRawPredictionCol, HasFitIntercept, HasStandardization,
+HasThreshold, HasWeightCol, HasAggregationDepth, JavaMLWritable, JavaMLReadable):
+"""
+Linear SVM Classifier (https://en.wikipedia.org/wiki/Support_vector_machine#Linear_SVM)
+This binary classifier optimizes the Hinge Loss using the OWLQN optimizer.
+
+>>> from pyspark.sql import Row
+>>> from pyspark.ml.linalg import Vectors
+>>> bdf = sc.parallelize([
+... Row(label=1.0, weight=2.0, features=Vectors.dense(1.0)),
--- End diff --

I'd simplify this example since it is going to be part of the documentation:
* Remove "weight"
* Just use dense vectors to make the doc clearer.  Sparse vectors are 
tested elsewhere for Python and should be tested in Scala for LinearSVC (for 
which I'll make a JIRA).
* Make the feature vectors be length 2 or 3





[GitHub] spark pull request #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

2017-01-26 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/16694#discussion_r98141066
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala ---
@@ -63,7 +63,7 @@ class LinearSVC @Since("2.2.0") (
   def this() = this(Identifiable.randomUID("linearsvc"))
 
   /**
-   * Set the regularization parameter.
+   * Sets the regularization parameter.
--- End diff --

There's no need to change this.  Most other algorithms use "set" not "sets"





[GitHub] spark issue #16690: [SPARK-19347] ReceiverSupervisorImpl can add block to Re...

2017-01-26 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/16690
  
@vanzin @zsxwing 
ping for review~





[GitHub] spark pull request #16716: [SPARK-19378][SS] Ensure continuity of stateOpera...

2017-01-26 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/16716#discussion_r98140040
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQueryStatusAndProgressSuite.scala ---
@@ -171,6 +174,47 @@ class StreamingQueryStatusAndProgressSuite extends StreamTest {
   query.stop()
 }
   }
+
+  test("SPARK-19378: Continue reporting stateOp and eventTime metrics even if there is no data") {
+import testImplicits._
+
+withSQLConf(SQLConf.STREAMING_NO_DATA_PROGRESS_EVENT_INTERVAL.key -> "10") {
+  val inputData = MemoryStream[(Int, String)]
+
+  val query = inputData.toDS().toDF("value", "time")
+.select('value, 'time.cast("timestamp"))
+.withWatermark("time", "10 seconds")
+.groupBy($"value")
+.agg(count("*"))
+.writeStream
+.queryName("metric_continuity")
+.format("memory")
+.outputMode("complete")
+.start()
+  try {
+inputData.addData((1, "2017-01-26 01:00:00"), (2, "2017-01-26 01:00:02"))
+query.processAllAvailable()
+
+val progress = query.lastProgress
+assert(progress.eventTime.size() > 1)
+assert(progress.stateOperators.length > 0)
+// Should emit new progresses every 10 ms, but we could be facing a slow Jenkins
+eventually(timeout(1 minute)) {
+  val nextProgress = query.lastProgress
+  assert(nextProgress.timestamp !== progress.timestamp)
+  assert(nextProgress.numInputRows === 0)
+  assert(nextProgress.eventTime.get("min") === "2017-01-26 01:00:00")
--- End diff --

This does not make sense. If there is no data in the last trigger, the min, max, and avg timestamps cannot be different. And what about the watermark?





[GitHub] spark issue #16717: [SPARK-19220][UI] Make redirection to HTTPS apply to all...

2017-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16717
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72060/
Test PASSed.





[GitHub] spark issue #16717: [SPARK-19220][UI] Make redirection to HTTPS apply to all...

2017-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16717
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16717: [SPARK-19220][UI] Make redirection to HTTPS apply to all...

2017-01-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16717
  
**[Test build #72060 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72060/consoleFull)** for PR 16717 at commit [`ddd7727`](https://github.com/apache/spark/commit/ddd7727fc89b2db3721b712204243718cbbcfe92).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16716: [SPARK-19378][SS] Ensure continuity of stateOperator and...

2017-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16716
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72059/
Test PASSed.





[GitHub] spark issue #16716: [SPARK-19378][SS] Ensure continuity of stateOperator and...

2017-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16716
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

2017-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16700
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72062/
Test PASSed.





[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

2017-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16700
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16716: [SPARK-19378][SS] Ensure continuity of stateOperator and...

2017-01-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16716
  
**[Test build #72059 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72059/testReport)** for PR 16716 at commit [`884a789`](https://github.com/apache/spark/commit/884a7893f5ae0c388ee79b04f92f62757b8aaea0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15730: [SPARK-18218][ML][MLLib] Reduce shuffled data size of Bl...

2017-01-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15730
  
**[Test build #72063 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72063/testReport)** for PR 15730 at commit [`4edcd74`](https://github.com/apache/spark/commit/4edcd74ba713c257b7117edf83d49aebc09df67a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15730: [SPARK-18218][ML][MLLib] Reduce shuffled data size of Bl...

2017-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15730
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15730: [SPARK-18218][ML][MLLib] Reduce shuffled data size of Bl...

2017-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15730
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72063/
Test PASSed.





[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

2017-01-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16700
  
**[Test build #72062 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72062/testReport)** for PR 16700 at commit [`0136388`](https://github.com/apache/spark/commit/01363887d4b755c14da1e14915fab124f6550e6d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16716: [SPARK-19378][SS] Ensure continuity of stateOperator and...

2017-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16716
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72061/
Test FAILed.





[GitHub] spark issue #16716: [SPARK-19378][SS] Ensure continuity of stateOperator and...

2017-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16716
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16716: [SPARK-19378][SS] Ensure continuity of stateOperator and...

2017-01-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16716
  
**[Test build #72061 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72061/testReport)** for PR 16716 at commit [`55e3d36`](https://github.com/apache/spark/commit/55e3d3643c856ddf4f43a12c0b74953d4baec80d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16714: [SPARK-16333][Core] Enable EventLoggingListener to log l...

2017-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16714
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16714: [SPARK-16333][Core] Enable EventLoggingListener to log l...

2017-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16714
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72058/
Test PASSed.





[GitHub] spark issue #16714: [SPARK-16333][Core] Enable EventLoggingListener to log l...

2017-01-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16714
  
**[Test build #72058 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72058/testReport)** for PR 16714 at commit [`f146121`](https://github.com/apache/spark/commit/f1461215c8210a19022d384a1ca95566da1406a8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16690: [SPARK-19347] ReceiverSupervisorImpl can add block to Re...

2017-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16690
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16690: [SPARK-19347] ReceiverSupervisorImpl can add block to Re...

2017-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16690
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72057/
Test PASSed.





[GitHub] spark issue #16690: [SPARK-19347] ReceiverSupervisorImpl can add block to Re...

2017-01-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16690
  
**[Test build #72057 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72057/testReport)** for PR 16690 at commit [`3b7e17b`](https://github.com/apache/spark/commit/3b7e17ba3a2db650b83b0f8d161754bfe53ca31a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13808: [SPARK-14480][SQL] Remove meaningless StringIteratorRead...

2017-01-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/13808
  
FWIW, I remember I had a hard time figuring out https://issues.apache.org/jira/browse/SPARK-14103, where the issue itself was about quoting but it ended up reading the whole partition as a single value.






[GitHub] spark issue #13808: [SPARK-14480][SQL] Remove meaningless StringIteratorRead...

2017-01-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/13808
  
@davies, the removed `StringIteratorReader` concatenated the lines of each iterator into a reader in each partition, IIRC.

Newlines within a column were not supported correctly, to my understanding, because rows can span multiple blocks. This is similar to the problem of not supporting multi-line JSON records before, to my knowledge.

Currently, we have some open PRs for dealing with multi-line support by using something like `wholeTextFile`, or by treating each file as a multi-line JSON document; I think we could solve it that way if any of them is merged.

I guess we introduced several regressions or behaviour changes while porting. Would this be acceptable, rather than supporting the multi-line case in this way?






[GitHub] spark issue #16415: [SPARK-19063][ML]Speedup and optimize the GradientBooste...

2017-01-26 Thread zdh2292390
Github user zdh2292390 commented on the issue:

https://github.com/apache/spark/pull/16415
  
@jkbradley @srowen Have you checked the latest commit? Is there any problem? It has been a long time since I heard any news.





[GitHub] spark issue #15730: [SPARK-18218][ML][MLLib] Reduce shuffled data size of Bl...

2017-01-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15730
  
**[Test build #72063 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72063/testReport)** for PR 15730 at commit [`4edcd74`](https://github.com/apache/spark/commit/4edcd74ba713c257b7117edf83d49aebc09df67a).





[GitHub] spark issue #15730: [SPARK-18218][ML][MLLib] Reduce shuffled data size of Bl...

2017-01-26 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/15730
  
@brkyvz Also thanks for your careful code review! ^_^ 





[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...

2017-01-26 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16700
  
LGTM except for three comments.





[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

2017-01-26 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16700#discussion_r98134323
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -839,6 +839,25 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
 spec.map { case (k, v) => partCols.find(_.equalsIgnoreCase(k)).get -> v }
   }
 
+
+  /**
+   * The partition path created by Hive is in lowercase, while Spark SQL will
+   * rename it with the partition name in partitionColumnNames, and this function
+   * returns the extra lowercase path created by Hive, and then we can delete it.
+   * e.g. /path/A=1/B=2/C=3 is changed to /path/A=4/B=5/C=6, this function returns is
--- End diff --

`returns is` -> `returns`





[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

2017-01-26 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16700#discussion_r98134168
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -899,6 +918,22 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
   spec, partitionColumnNames, tablePath)
 try {
   tablePath.getFileSystem(hadoopConf).rename(wrongPath, rightPath)
+
+  // If the newSpec contains more than one depth partition, FileSystem.rename just deletes
+  // the leaf(i.e. wrongPath), we should check if wrongPath's parents need to be deleted.
+  // for example:
+  // newSpec is 'A=1/B=2', after renamePartitions by Hive, the location path in FileSystem
+  // is changed to 'a=1/b=2', which is wrongPath, then we renamed to 'A=1/B=2', and
+  // 'a=1/b=2' in FileSystem is deleted, while 'a=1' is already exists,
+  // which should also be deleted
--- End diff --

How about?
> For example, given a newSpec 'A=1/B=2', after calling Hive's client.renamePartitions, the location path in FileSystem is changed to 'a=1/b=2', which is wrongPath. Then, although we renamed it to 'A=1/B=2', 'a=1/b=2' in FileSystem is deleted but 'a=1' still exists. We also need to delete the useless directory.
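To make that cleanup concrete, a hypothetical helper (name, signature, and logic are illustrative assumptions, not the PR's exact code) that finds the topmost leftover lowercase directory:

```scala
import org.apache.hadoop.fs.Path

// Walk the two specs in parallel and return the first (topmost) directory on
// the Hive-created path whose name differs from the correctly-cased one.
// Deleting it recursively removes the leftover 'a=1' tree from the example.
def extraPartPathCreatedByHive(
    tablePath: Path,
    lowerCaseSpec: Seq[(String, String)],  // e.g. Seq("a" -> "1", "b" -> "2")
    rightSpec: Seq[(String, String)]): Path = {  // e.g. Seq("A" -> "1", "B" -> "2")
  var current = tablePath
  for (((lk, lv), (rk, rv)) <- lowerCaseSpec.zip(rightSpec)) {
    val lowerDir = s"$lk=$lv"
    if (lowerDir != s"$rk=$rv") {
      return new Path(current, lowerDir)  // casing diverges here: the extra path
    }
    current = new Path(current, lowerDir)
  }
  current
}
```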






[GitHub] spark issue #14725: [SPARK-17161] [PYSPARK][ML] Add PySpark-ML JavaWrapper c...

2017-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14725
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72056/
Test PASSed.





[GitHub] spark issue #14725: [SPARK-17161] [PYSPARK][ML] Add PySpark-ML JavaWrapper c...

2017-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14725
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14725: [SPARK-17161] [PYSPARK][ML] Add PySpark-ML JavaWrapper c...

2017-01-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14725
  
**[Test build #72056 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72056/testReport)** for PR 14725 at commit [`869981f`](https://github.com/apache/spark/commit/869981f895a84309bb333c570d56d25d17fb4a7a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

2017-01-26 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16700#discussion_r98133500
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -899,6 +918,22 @@ private[spark] class HiveExternalCatalog(conf: 
SparkConf, hadoopConf: Configurat
   spec, partitionColumnNames, tablePath)
 try {
   tablePath.getFileSystem(hadoopConf).rename(wrongPath, rightPath)
+
+  // If the newSpec contains more than one depth partition, 
FileSystem.rename just deletes
+  // the leaf(i.e. wrongPath), we should check if wrongPath's 
parents need to be deleted.
+  // for example:
+  // newSpec is 'A=1/B=2', after renamePartitions by Hive, the 
location path in FileSystem
+  // is changed to 'a=1/b=2', which is wrongPath, then we renamed 
to 'A=1/B=2', and
+  // 'a=1/b=2' in FileSystem is deleted, while 'a=1' is already 
exists,
+  // which should also be deleted
+  val delHivePartPathAfterRename = getExtraPartPathCreatedByHive(
+lowerCasePartitionSpec(spec),
--- End diff --

The last comment: I prefer calling `lowerCasePartitionSpec` in the func 
`getExtraPartPathCreatedByHive`.





[GitHub] spark issue #13808: [SPARK-14480][SQL] Remove meaningless StringIteratorRead...

2017-01-26 Thread davies
Github user davies commented on the issue:

https://github.com/apache/spark/pull/13808
  
@HyukjinKwon @rxin This patch has a regression: a column that has an escaped 
newline can't be correctly parsed anymore. Should we revert this patch or 
figure out a way to fix that?





[GitHub] spark issue #16650: [SPARK-16554][CORE] Automatically Kill Executors and Nod...

2017-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16650
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72053/
Test PASSed.





[GitHub] spark issue #16650: [SPARK-16554][CORE] Automatically Kill Executors and Nod...

2017-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16650
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16650: [SPARK-16554][CORE] Automatically Kill Executors and Nod...

2017-01-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16650
  
**[Test build #72053 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72053/testReport)**
 for PR 16650 at commit 
[`580999e`](https://github.com/apache/spark/commit/580999e51692824af778fc3e191d735b2badb724).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16625: [SPARK-17874][core] Add SSL port configuration.

2017-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16625
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72054/
Test PASSed.





[GitHub] spark issue #16625: [SPARK-17874][core] Add SSL port configuration.

2017-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16625
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16625: [SPARK-17874][core] Add SSL port configuration.

2017-01-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16625
  
**[Test build #72054 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72054/testReport)**
 for PR 16625 at commit 
[`f6825d1`](https://github.com/apache/spark/commit/f6825d1ee93926fc3eb14b028a5d883caf0784ae).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

2017-01-26 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/16700#discussion_r98131097
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -899,6 +918,21 @@ private[spark] class HiveExternalCatalog(conf: 
SparkConf, hadoopConf: Configurat
   spec, partitionColumnNames, tablePath)
 try {
   tablePath.getFileSystem(hadoopConf).rename(wrongPath, rightPath)
+
+  // if the newSpec contains more than one depth partition, 
FileSystem.rename just deletes
+  // only one path (wrongPath), we should check if wrongPath's 
parents need to be deleted.
--- End diff --

Thanks a lot! 





[GitHub] spark issue #16716: [SPARK-19378][SS] Ensure continuity of stateOperator and...

2017-01-26 Thread brkyvz
Github user brkyvz commented on the issue:

https://github.com/apache/spark/pull/16716
  
@tdas Addressed





[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...

2017-01-26 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/16700#discussion_r98130978
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -839,6 +839,25 @@ private[spark] class HiveExternalCatalog(conf: 
SparkConf, hadoopConf: Configurat
 spec.map { case (k, v) => partCols.find(_.equalsIgnoreCase(k)).get -> 
v }
   }
 
+
+  /**
+   * The partition path created by Hive is lower-case, while Spark SQL will
+   * rename it with the partition name in partitionColumnNames. This function
+   * returns the extra lower-case path created by Hive, so that we can then
+   * delete it. e.g. /path/A=1/B=2/C=3 renamed to /path/A=4/B=5/C=6: the extra
+   * path returned is /path/a=4, which also includes all of its child paths,
+   * such as /path/a=4/b=5
+   */
+  def getExtraPartPathCreatedByHive(
+ lowerCaseSpec: TablePartitionSpec,
+ partitionColumnNames: Seq[String],
+ tablePath: Path): Path = {
--- End diff --

oh...sorry...thanks!
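
For context, a rough sketch of what such a helper might compute, under the 
assumption that the extra directory is rooted at the first partition column 
whose name is not already lower-case; all names here are illustrative, not 
the PR's code:

```scala
import org.apache.hadoop.fs.Path

// Returns the top-most lower-case directory Hive may have left behind
// (e.g. /path/a=4 for columns A/B/C), or None if every column name is
// already lower-case and no extra directory exists.
def extraPartPathSketch(
    tablePath: Path,
    lowerCaseSpec: Map[String, String],
    partitionColumnNames: Seq[String]): Option[Path] = {
  partitionColumnNames.find(col => col != col.toLowerCase).map { firstMixed =>
    // Walk down to (and including) the first mixed-case column, using the
    // lower-cased names and values that Hive actually wrote.
    val prefix = partitionColumnNames.takeWhile(_ != firstMixed) :+ firstMixed
    prefix.foldLeft(tablePath) { (p, col) =>
      new Path(p, s"${col.toLowerCase}=${lowerCaseSpec(col.toLowerCase)}")
    }
  }
}
```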





[GitHub] spark issue #16716: [SPARK-19378][SS] Ensure continuity of stateOperator and...

2017-01-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16716
  
**[Test build #72061 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72061/testReport)**
 for PR 16716 at commit 
[`55e3d36`](https://github.com/apache/spark/commit/55e3d3643c856ddf4f43a12c0b74953d4baec80d).





[GitHub] spark issue #14725: [SPARK-17161] [PYSPARK][ML] Add PySpark-ML JavaWrapper c...

2017-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14725
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72051/
Test PASSed.





[GitHub] spark issue #14725: [SPARK-17161] [PYSPARK][ML] Add PySpark-ML JavaWrapper c...

2017-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14725
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14725: [SPARK-17161] [PYSPARK][ML] Add PySpark-ML JavaWrapper c...

2017-01-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14725
  
**[Test build #72051 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72051/testReport)**
 for PR 14725 at commit 
[`65dcfb6`](https://github.com/apache/spark/commit/65dcfb61580303b2829ebebff2c997424fa2f4d2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16717: [SPARK-19220][UI] Make redirection to HTTPS apply to all...

2017-01-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16717
  
**[Test build #72060 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72060/consoleFull)**
 for PR 16717 at commit 
[`ddd7727`](https://github.com/apache/spark/commit/ddd7727fc89b2db3721b712204243718cbbcfe92).





[GitHub] spark pull request #16717: [SPARK-19220][UI] Make redirection to HTTPS apply...

2017-01-26 Thread vanzin
GitHub user vanzin opened a pull request:

https://github.com/apache/spark/pull/16717

[SPARK-19220][UI] Make redirection to HTTPS apply to all URIs. (branch-2.0)

The redirect handler was installed only for the root of the server;
any other context ended up being served directly through the HTTP
port. Since every sub page (e.g. application UIs in the history
server) is a separate servlet context, this meant that everything
but the root was accessible via HTTP still.

The change adds separate names to each connector, and binds contexts
to specific connectors so that content is only served through the
HTTPS connector when it's enabled. In that case, the only thing that
binds to the HTTP connector is the redirect handler.

Tested with new unit tests and by checking a live history server.

(cherry picked from commit 59502bbcf6e64e5b5e3dda080441054afaf58c53)
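
For readers unfamiliar with the Jetty side, a hedged sketch of the 
mechanism described above, in plain Jetty 9 terms (illustrative, not 
Spark's actual code): each connector gets a name, and a context is bound 
to one connector via the "@connectorName" virtual-host convention.

```scala
import org.eclipse.jetty.server.{Connector, Server, ServerConnector}
import org.eclipse.jetty.servlet.ServletContextHandler

val server = new Server()
val http = new ServerConnector(server)
http.setName("http")
val https = new ServerConnector(server)  // SSL connection factory omitted
https.setName("https")
server.setConnectors(Array[Connector](http, https))

val context = new ServletContextHandler()
// Serve this context only through the connector named "https"; the plain
// HTTP connector is then left with nothing but the redirect handler.
context.setVirtualHosts(Array("@https"))
server.setHandler(context)
```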

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vanzin/spark SPARK-19220_2.0

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16717.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16717


commit ddd7727fc89b2db3721b712204243718cbbcfe92
Author: Marcelo Vanzin 
Date:   2017-01-27T00:53:28Z

[SPARK-19220][UI] Make redirection to HTTPS apply to all URIs. (branch-2.0)

The redirect handler was installed only for the root of the server;
any other context ended up being served directly through the HTTP
port. Since every sub page (e.g. application UIs in the history
server) is a separate servlet context, this meant that everything
but the root was accessible via HTTP still.

The change adds separate names to each connector, and binds contexts
to specific connectors so that content is only served through the
HTTPS connector when it's enabled. In that case, the only thing that
binds to the HTTP connector is the redirect handler.

Tested with new unit tests and by checking a live history server.

(cherry picked from commit 59502bbcf6e64e5b5e3dda080441054afaf58c53)







[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-01-26 Thread zero323
Github user zero323 commented on the issue:

https://github.com/apache/spark/pull/16534
  
@rxin I am not aware of any straightforward way of separating these two, 
but I focused on the docstrings anyway. The rationale is simple: I want to 
be able to:

- Create packages containing UDFs.
- [Get concise syntax with 
decorators](https://github.com/apache/spark/pull/16533) without need for 
intermediate functions, or nesting.
- [Import UDFs without side 
effects](https://github.com/apache/spark/pull/16536).
- Have docstrings and argument annotations which correspond to the function 
I wrap, not a generic `UserDefinedFunction` object: this is what I want to 
achieve here. As illustrated in the JIRA ticket, what we get right now is 
completely useless:

  ```
  In [5]: ?add_one
  Type:UserDefinedFunction
  String form: 
  File:~/Spark/spark-2.0/python/pyspark/sql/functions.py
  Signature:   add_one(*cols)
  Docstring:
  User defined function in Python


  .. versionadded:: 1.3
  ```

  ```
  >>> help(add_one)
  
  Help on UserDefinedFunction in module pyspark.sql.functions object:
  
  class UserDefinedFunction(builtins.object)
   |  User defined function in Python
   |  
   |  .. versionadded:: 1.3
   |  
   |  Methods defined here:
   |  
   |  __call__(self, *cols)
   |  Call self as a function.
   |  
   |  __del__(self)
   |  
   |  __init__(self, func, returnType, name=None)
   |  Initialize self.  See help(type(self)) for accurate signature.
   |  
   |  --
   |  Data descriptors defined here:
   |  
   |  __dict__
   |  dictionary for instance variables (if defined)
   |  
   |  __weakref__
   |  list of weak references to the object (if defined)
  (END)
   ```

  REPL is definitely the main use case. Handling docs with `wraps` is much 
trickier, but there are known workarounds.







[GitHub] spark pull request #16716: [SPARK-19378][SS] Ensure continuity of stateOpera...

2017-01-26 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/16716#discussion_r98127618
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala
 ---
@@ -180,15 +180,59 @@ trait ProgressReporter extends Logging {
 currentStatus = currentStatus.copy(isTriggerActive = false)
   }
 
+  /**
+   * Extract statistics about stateful operators from the executed query 
plan.
+   * SPARK-19378: Still report stateOperator metrics even though no data 
was processed while
+   * reporting progress.
+   */
+  private def extractStateOperatorMetrics(hasNewData: Boolean): 
Seq[StateOperatorProgress] = {
+if (lastExecution == null) return Nil
+// lastExecution could belong to one of the previous triggers if 
`!hasNewData`.
+// Walking the plan again should be inexpensive.
+val stateNodes = lastExecution.executedPlan.collect {
+  case p if p.isInstanceOf[StateStoreSaveExec] => p
+}
+stateNodes.map { node =>
+  val numRowsUpdated = if (hasNewData) {
+node.metrics.get("numUpdatedStateRows").map(_.value).getOrElse(0L)
+  } else {
+0L
+  }
+  new StateOperatorProgress(
+numRowsTotal = 
node.metrics.get("numTotalStateRows").map(_.value).getOrElse(0L),
+numRowsUpdated = numRowsUpdated)
+}
+  }
+
+  /**
+   * Extract statistics about event time from the executed query plan.
+   * SPARK-19378: Still report eventTime metrics even though no data was 
processed while
+   * reporting progress.
+   */
+  private def extractEventTimeStats(watermarkTs: Map[String, String]): 
Map[String, String] = {
--- End diff --

It does not make sense for this method to take watermarkTs as a param: it's 
not extracting event-time stats from the watermark timestamp, it's just 
appending it. Why not just return an empty map and do the appending outside? 
Or do the extraction of the watermark inside the function as well.
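
A minimal sketch of the suggested shape, with illustrative names: the 
extractor returns only the plan-derived stats, and the caller merges in 
the watermark entry itself.

```scala
// Plan-derived stats only; no watermark parameter.
def extractEventTimeStats(planStats: Map[String, Long]): Map[String, String] =
  planStats.map { case (k, v) => k -> v.toString }

// The caller appends the watermark outside the extractor.
def eventTimeProgress(
    planStats: Map[String, Long],
    watermark: Option[String]): Map[String, String] =
  extractEventTimeStats(planStats) ++ watermark.map("watermark" -> _)
```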





[GitHub] spark pull request #16716: [SPARK-19378][SS] Ensure continuity of stateOpera...

2017-01-26 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/16716#discussion_r98127344
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQueryStatusAndProgressSuite.scala
 ---
@@ -171,6 +174,42 @@ class StreamingQueryStatusAndProgressSuite extends 
StreamTest {
   query.stop()
 }
   }
+
+  test("SPARK-19378: Continue reporting stateOp and eventTime metrics even 
if there is no data") {
+import testImplicits._
+
+withSQLConf(SQLConf.STREAMING_NO_DATA_PROGRESS_EVENT_INTERVAL.key -> 
"10") {
+  val inputData = MemoryStream[(Int, String)]
+
+  val query = inputData.toDS().toDF("value", "time")
+.select('value, 'time.cast("timestamp"))
+.withWatermark("time", "10 seconds")
+.groupBy($"value")
+.agg(count("*"))
+.writeStream
+.queryName("metric_continuity")
+.format("memory")
+.outputMode("complete")
+.start()
+  try {
+inputData.addData((1, "2017-01-26 01:00:00"), (2, "2017-01-26 
01:00:02"))
+query.processAllAvailable()
+
+val progress = query.lastProgress
+assert(progress.eventTime.size() > 1)
+assert(progress.stateOperators.length > 0)
+// Should emit new progresses every 10 ms, but we could be facing 
a slow Jenkins
+eventually(timeout(1 minute)) {
+  val nextProgress = query.lastProgress
+  assert(nextProgress.timestamp !== progress.timestamp)
+  assert(progress.eventTime.size() > 1)
+  assert(progress.stateOperators.length > 0)
--- End diff --

You are not verifying that the metric values are as expected.





[GitHub] spark pull request #16716: [SPARK-19378][SS] Ensure continuity of stateOpera...

2017-01-26 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/16716#discussion_r98127274
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala
 ---
@@ -180,15 +180,59 @@ trait ProgressReporter extends Logging {
 currentStatus = currentStatus.copy(isTriggerActive = false)
   }
 
+  /**
+   * Extract statistics about stateful operators from the executed query 
plan.
+   * SPARK-19378: Still report stateOperator metrics even though no data 
was processed while
+   * reporting progress.
+   */
+  private def extractStateOperatorMetrics(hasNewData: Boolean): 
Seq[StateOperatorProgress] = {
+if (lastExecution == null) return Nil
+// lastExecution could belong to one of the previous triggers if 
`!hasNewData`.
+// Walking the plan again should be inexpensive.
+val stateNodes = lastExecution.executedPlan.collect {
+  case p if p.isInstanceOf[StateStoreSaveExec] => p
+}
+stateNodes.map { node =>
+  val numRowsUpdated = if (hasNewData) {
+node.metrics.get("numUpdatedStateRows").map(_.value).getOrElse(0L)
+  } else {
+0L
+  }
+  new StateOperatorProgress(
+numRowsTotal = 
node.metrics.get("numTotalStateRows").map(_.value).getOrElse(0L),
+numRowsUpdated = numRowsUpdated)
+}
+  }
+
+  /**
+   * Extract statistics about event time from the executed query plan.
+   * SPARK-19378: Still report eventTime metrics even though no data was 
processed while
--- End diff --

Same here.





[GitHub] spark pull request #16716: [SPARK-19378][SS] Ensure continuity of stateOpera...

2017-01-26 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/16716#discussion_r98127228
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala
 ---
@@ -180,15 +180,59 @@ trait ProgressReporter extends Logging {
 currentStatus = currentStatus.copy(isTriggerActive = false)
   }
 
+  /**
+   * Extract statistics about stateful operators from the executed query 
plan.
+   * SPARK-19378: Still report stateOperator metrics even though no data 
was processed while
--- End diff --

It does not make sense to have JIRA numbers in a method's Scala docs.





[GitHub] spark pull request #16716: [SPARK-19378][SS] Ensure continuity of stateOpera...

2017-01-26 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/16716#discussion_r98127156
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQueryStatusAndProgressSuite.scala
 ---
@@ -171,6 +174,42 @@ class StreamingQueryStatusAndProgressSuite extends 
StreamTest {
   query.stop()
 }
   }
+
+  test("SPARK-19378: Continue reporting stateOp and eventTime metrics even 
if there is no data") {
+import testImplicits._
+
+withSQLConf(SQLConf.STREAMING_NO_DATA_PROGRESS_EVENT_INTERVAL.key -> 
"10") {
+  val inputData = MemoryStream[(Int, String)]
+
+  val query = inputData.toDS().toDF("value", "time")
+.select('value, 'time.cast("timestamp"))
+.withWatermark("time", "10 seconds")
+.groupBy($"value")
+.agg(count("*"))
+.writeStream
+.queryName("metric_continuity")
+.format("memory")
+.outputMode("complete")
+.start()
+  try {
+inputData.addData((1, "2017-01-26 01:00:00"), (2, "2017-01-26 
01:00:02"))
+query.processAllAvailable()
+
+val progress = query.lastProgress
+assert(progress.eventTime.size() > 1)
+assert(progress.stateOperators.length > 0)
+// Should emit new progresses every 10 ms, but we could be facing 
a slow Jenkins
+eventually(timeout(1 minute)) {
+  val nextProgress = query.lastProgress
+  assert(nextProgress.timestamp !== progress.timestamp)
--- End diff --

Can you explicitly verify that this progress has no data?





[GitHub] spark issue #16697: [SPARK-19358][CORE] LiveListenerBus shall log the event ...

2017-01-26 Thread mridulm
Github user mridulm commented on the issue:

https://github.com/apache/spark/pull/16697
  
This PR will log only the first event which got dropped.
The first event which got dropped need not necessarily have a correlation to 
the actual event(s) flooding the queue.

The PR description is slightly incorrect - it logs the entire event, and 
not just the name.
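
A tiny sketch of the behavior being described, assuming a one-shot flag 
guards the log call (illustrative, not the PR's exact code):

```scala
import java.util.concurrent.atomic.AtomicBoolean

// One-shot guard: only the very first dropped event is ever logged, so it
// may have no relation to whichever events are actually flooding the queue.
val loggedFirstDroppedEvent = new AtomicBoolean(false)

def onDropEvent(event: Any): Unit = {
  if (loggedFirstDroppedEvent.compareAndSet(false, true)) {
    println(s"Dropping SparkListenerEvent: $event")  // stand-in for logError
  }
}
```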





[GitHub] spark issue #16711: [SPARK-19220][UI] Make redirection to HTTPS apply to all...

2017-01-26 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/16711
  
Didn't merge to 2.0.





[GitHub] spark pull request #16711: [SPARK-19220][UI] Make redirection to HTTPS apply...

2017-01-26 Thread vanzin
Github user vanzin closed the pull request at:

https://github.com/apache/spark/pull/16711





[GitHub] spark issue #16711: [SPARK-19220][UI] Make redirection to HTTPS apply to all...

2017-01-26 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/16711
  
Merging to 2.1 (and 2.0 if no conflicts).





[GitHub] spark pull request #16650: [SPARK-16554][CORE] Automatically Kill Executors ...

2017-01-26 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16650#discussion_r98126533
  
--- Diff: 
core/src/main/scala/org/apache/spark/ExecutorAllocationClient.scala ---
@@ -54,11 +54,28 @@ private[spark] trait ExecutorAllocationClient {
 
   /**
* Request that the cluster manager kill the specified executors.
+   *
+   * When asking the executor to be replaced, the executor loss is 
considered a failure, and
+   * killed tasks that are running on the executor will count towards the 
failure limits. If no
+   * replacement is being requested, then the tasks will not count towards 
the limit.
+   *
+   * @param executorIds identifiers of executors to kill
+   * @param replace whether to replace the killed executors with new ones, 
default false
+   * @param force whether to force kill busy executors, default false
* @return the ids of the executors acknowledged by the cluster manager 
to be removed.
*/
-  def killExecutors(executorIds: Seq[String]): Seq[String]
+  def killExecutors(
+executorIds: Seq[String],
+replace: Boolean = false,
+force: Boolean = false): Seq[String]
 
   /**
+   * Request that the cluster manager kill every executor on the specified 
host.
+   * @return whether the request is acknowledged by the cluster manager.
+   */
+  def killExecutorsOnHost(host: String): Boolean
--- End diff --

Probably good to specify here what's the behavior regarding "force" and 
"replace", since they're not arguments.





[GitHub] spark pull request #16650: [SPARK-16554][CORE] Automatically Kill Executors ...

2017-01-26 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16650#discussion_r98125826
  
--- Diff: 
core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -139,6 +139,11 @@ package object config {
   .timeConf(TimeUnit.MILLISECONDS)
   .createOptional
 
+  private[spark] val BLACKLIST_KILL_ENABLED =
+ConfigBuilder("spark.blacklist.killBlacklistedExecutors")
+.booleanConf
--- End diff --

nit: indent extra level





[GitHub] spark pull request #16650: [SPARK-16554][CORE] Automatically Kill Executors ...

2017-01-26 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16650#discussion_r98126123
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
 ---
@@ -148,6 +153,14 @@ class CoarseGrainedSchedulerBackend(scheduler: 
TaskSchedulerImpl, val rpcEnv: Rp
 if (executorDataMap.contains(executorId)) {
   executorRef.send(RegisterExecutorFailed("Duplicate executor ID: 
" + executorId))
   context.reply(true)
+} else if (scheduler.nodeBlacklist != null &&
+  scheduler.nodeBlacklist.contains(hostname)) {
+  // If the cluster manager gives us an executor on a blacklisted 
node (because it
+  // already started allocating those resources before we informed 
it of our blacklist,
+  // or if it ignored our blacklist), then we reject that executor 
immediately.
+  logInfo(s"Rejecting $executorId as it has been blacklisted.")
+  executorRef.send(RegisterExecutorFailed("Executor is 
blacklisted: " + executorId))
--- End diff --

nit: use interpolation for consistency.
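
For completeness, a self-contained illustration of the nit; both forms 
build the same string, the interpolated one just matches the style of the 
message a few lines above.

```scala
object InterpolationNit extends App {
  val executorId = "exec-1"
  val concatenated = "Executor is blacklisted: " + executorId
  val interpolated = s"Executor is blacklisted: $executorId"
  assert(concatenated == interpolated)  // identical output, consistent style
}
```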





[GitHub] spark issue #16716: [SPARK-19378][SS] Ensure continuity of stateOperator and...

2017-01-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16716
  
**[Test build #72059 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72059/testReport)**
 for PR 16716 at commit 
[`884a789`](https://github.com/apache/spark/commit/884a7893f5ae0c388ee79b04f92f62757b8aaea0).





[GitHub] spark pull request #16716: [SPARK-19378][SS] Ensure continuity of stateOpera...

2017-01-26 Thread brkyvz
GitHub user brkyvz opened a pull request:

https://github.com/apache/spark/pull/16716

[SPARK-19378][SS] Ensure continuity of stateOperator and eventTime metrics 
even if there is no new data in trigger

## What changes were proposed in this pull request?

In Structured Streaming, if a new trigger was skipped because no new data 
arrived, we suddenly report nothing for the metrics `stateOperator` and 
`eventTime`. We could, however, easily report the metrics from `lastExecution` 
to ensure continuity of metrics.

## How was this patch tested?

Regression test in `StreamingQueryStatusAndProgressSuite`

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/brkyvz/spark state-agg

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16716.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16716


commit 884a7893f5ae0c388ee79b04f92f62757b8aaea0
Author: Burak Yavuz 
Date:   2017-01-27T00:38:24Z

ready for review







[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...

2017-01-26 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/16620
  
@squito 
Could you please take another look at this? :)





[GitHub] spark issue #16711: [SPARK-19220][UI] Make redirection to HTTPS apply to all...

2017-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16711
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16714: [SPARK-16333][Core] Enable EventLoggingListener to log l...

2017-01-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16714
  
**[Test build #72058 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72058/testReport)**
 for PR 16714 at commit 
[`f146121`](https://github.com/apache/spark/commit/f1461215c8210a19022d384a1ca95566da1406a8).





[GitHub] spark issue #16711: [SPARK-19220][UI] Make redirection to HTTPS apply to all...

2017-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16711
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72049/
Test PASSed.





[GitHub] spark issue #16711: [SPARK-19220][UI] Make redirection to HTTPS apply to all...

2017-01-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16711
  
**[Test build #72049 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72049/testReport)**
 for PR 16711 at commit 
[`b3959ed`](https://github.com/apache/spark/commit/b3959edcf79d21323ec9e107302a401730795cff).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16714: [SPARK-16333][Core] Enable EventLoggingListener to log l...

2017-01-26 Thread jisookim0513
Github user jisookim0513 commented on the issue:

https://github.com/apache/spark/pull/16714
  
Not sure why the second test build failed at PySpark unit tests. I only 
changed the comments.





[GitHub] spark issue #16690: [SPARK-19347] ReceiverSupervisorImpl can add block to Re...

2017-01-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16690
  
**[Test build #72057 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72057/testReport)**
 for PR 16690 at commit 
[`3b7e17b`](https://github.com/apache/spark/commit/3b7e17ba3a2db650b83b0f8d161754bfe53ca31a).





[GitHub] spark issue #16706: [SPARK-19365][Core]Optimize RequestMessage serialization

2017-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16706
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16706: [SPARK-19365][Core]Optimize RequestMessage serialization

2017-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16706
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72047/
Test PASSed.





[GitHub] spark pull request #16706: [SPARK-19365][Core]Optimize RequestMessage serial...

2017-01-26 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16706#discussion_r98121685
  
--- Diff: core/src/main/scala/org/apache/spark/rpc/netty/NettyRpcEnv.scala 
---
@@ -501,34 +498,99 @@ private[netty] class NettyRpcEndpointRef(
 out.defaultWriteObject()
   }
 
-  override def name: String = _name
+  override def name: String = endpointAddress.name
 
   override def ask[T: ClassTag](message: Any, timeout: RpcTimeout): 
Future[T] = {
-nettyEnv.ask(RequestMessage(nettyEnv.address, this, message), timeout)
+nettyEnv.ask(new RequestMessage(nettyEnv.address, this, message), 
timeout)
   }
 
   override def send(message: Any): Unit = {
 require(message != null, "Message is null")
-nettyEnv.send(RequestMessage(nettyEnv.address, this, message))
+nettyEnv.send(new RequestMessage(nettyEnv.address, this, message))
   }
 
-  override def toString: String = s"NettyRpcEndpointRef(${_address})"
-
-  def toURI: URI = new URI(_address.toString)
+  override def toString: String = 
s"NettyRpcEndpointRef(${endpointAddress})"
 
   final override def equals(that: Any): Boolean = that match {
-case other: NettyRpcEndpointRef => _address == other._address
+case other: NettyRpcEndpointRef => endpointAddress == 
other.endpointAddress
 case _ => false
   }
 
-  final override def hashCode(): Int = if (_address == null) 0 else 
_address.hashCode()
+  final override def hashCode(): Int =
+if (endpointAddress == null) 0 else endpointAddress.hashCode()
 }
 
 /**
  * The message that is sent from the sender to the receiver.
+ *
+ * @param senderAddress the sender address. It's `null` if this message is 
from a client
+ *  `NettyRpcEnv`.
+ * @param receiver the receiver of this message.
+ * @param content the message content.
  */
-private[netty] case class RequestMessage(
-senderAddress: RpcAddress, receiver: NettyRpcEndpointRef, content: Any)
+private[netty] class RequestMessage(
+val senderAddress: RpcAddress,
+val receiver: NettyRpcEndpointRef, val content: Any) {
+
+  /** Manually serialize [[RequestMessage]] to minimize the size of bytes. 
*/
+  def serialize(nettyEnv: NettyRpcEnv): ByteBuffer = {
+val bos = new ByteBufferOutputStream()
+val out = new DataOutputStream(bos)
+try {
+  writeRpcAddress(out, senderAddress)
+  writeRpcAddress(out, receiver.address)
+  out.writeUTF(receiver.name)
+  val contentBytes = nettyEnv.serialize(content)
--- End diff --

Hmmm... could you use `JavaSerializerInstance.serializeStream` here instead?

You avoid: extra object allocations in `serialize`, two copies of the 
serialized content in memory, and the extra copy operation below in `out.write`.

You could also use `ObjectOutputStream` directly (it implements 
`DataOutput`) but that makes it difficult to use Kryo later.
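
A sketch of this suggestion, assuming a `SerializerInstance` obtained from 
the env; the content is streamed straight into the shared buffer instead 
of being materialized as a separate byte array first (the header-writing 
calls from the diff are elided as a comment):

```scala
import java.io.DataOutputStream
import java.nio.ByteBuffer
import org.apache.spark.serializer.SerializerInstance
import org.apache.spark.util.ByteBufferOutputStream

def serializeSketch(ser: SerializerInstance, content: Any): ByteBuffer = {
  val bos = new ByteBufferOutputStream()
  val out = new DataOutputStream(bos)
  // ... writeRpcAddress / writeUTF header fields go here, as in the diff ...
  out.flush()
  // Stream the content into the same buffer: no second array, no extra copy.
  val stream = ser.serializeStream(out)
  stream.writeObject(content)
  stream.close()
  bos.toByteBuffer
}
```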





[GitHub] spark pull request #16706: [SPARK-19365][Core]Optimize RequestMessage serial...

2017-01-26 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16706#discussion_r98121192
  
--- Diff: core/src/main/scala/org/apache/spark/rpc/netty/NettyRpcEnv.scala 
---
@@ -501,34 +498,99 @@ private[netty] class NettyRpcEndpointRef(
 out.defaultWriteObject()
   }
 
-  override def name: String = _name
+  override def name: String = endpointAddress.name
 
   override def ask[T: ClassTag](message: Any, timeout: RpcTimeout): 
Future[T] = {
-nettyEnv.ask(RequestMessage(nettyEnv.address, this, message), timeout)
+nettyEnv.ask(new RequestMessage(nettyEnv.address, this, message), 
timeout)
   }
 
   override def send(message: Any): Unit = {
 require(message != null, "Message is null")
-nettyEnv.send(RequestMessage(nettyEnv.address, this, message))
+nettyEnv.send(new RequestMessage(nettyEnv.address, this, message))
   }
 
-  override def toString: String = s"NettyRpcEndpointRef(${_address})"
-
-  def toURI: URI = new URI(_address.toString)
+  override def toString: String = 
s"NettyRpcEndpointRef(${endpointAddress})"
 
   final override def equals(that: Any): Boolean = that match {
-case other: NettyRpcEndpointRef => _address == other._address
+case other: NettyRpcEndpointRef => endpointAddress == 
other.endpointAddress
 case _ => false
   }
 
-  final override def hashCode(): Int = if (_address == null) 0 else 
_address.hashCode()
+  final override def hashCode(): Int =
+if (endpointAddress == null) 0 else endpointAddress.hashCode()
 }
 
 /**
  * The message that is sent from the sender to the receiver.
+ *
+ * @param senderAddress the sender address. It's `null` if this message is 
from a client
+ *  `NettyRpcEnv`.
+ * @param receiver the receiver of this message.
+ * @param content the message content.
  */
-private[netty] case class RequestMessage(
-senderAddress: RpcAddress, receiver: NettyRpcEndpointRef, content: Any)
+private[netty] class RequestMessage(
+val senderAddress: RpcAddress,
+val receiver: NettyRpcEndpointRef, val content: Any) {
--- End diff --

nit: move `content` to next line




