[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

2017-01-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16464
  
**[Test build #71303 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71303/testReport)**
 for PR 16464 at commit 
[`882c70d`](https://github.com/apache/spark/commit/882c70da32756e7603bd293b2ba010a585fdc0c5).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

2017-01-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16464
  
**[Test build #71302 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71302/testReport)**
 for PR 16464 at commit 
[`b72592c`](https://github.com/apache/spark/commit/b72592ce02e9a8af518a103ab81a2dfe8a103d51).





[GitHub] spark issue #16565: [SPARK-17237][SQL][Backport-2.0] Remove backticks in a p...

2017-01-12 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16565
  
LGTM except one comment





[GitHub] spark issue #16512: [SPARK-18335][SPARKR] createDataFrame to support numPart...

2017-01-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16512
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16512: [SPARK-18335][SPARKR] createDataFrame to support numPart...

2017-01-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16512
  
**[Test build #71300 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71300/testReport)**
 for PR 16512 at commit 
[`4fa1998`](https://github.com/apache/spark/commit/4fa19987433be48fa006e86b5f9e140f2c297c1c).
 * This patch **fails R style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16512: [SPARK-18335][SPARKR] createDataFrame to support numPart...

2017-01-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16512
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71300/
Test FAILed.





[GitHub] spark issue #16565: [SPARK-17237][SQL][Backport-2.0] Remove backticks in a p...

2017-01-12 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16565
  
I checked the change history. Actually, you also backported 
https://github.com/apache/spark/pull/15111. Could you please update your PR 
description and PR title? 





[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...

2017-01-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16464
  
**[Test build #71301 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71301/testReport)**
 for PR 16464 at commit 
[`0134a26`](https://github.com/apache/spark/commit/0134a2693f6abfc51d0c11d693b97971072affaa).





[GitHub] spark issue #16512: [SPARK-18335][SPARKR] createDataFrame to support numPart...

2017-01-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16512
  
**[Test build #71300 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71300/testReport)**
 for PR 16512 at commit 
[`4fa1998`](https://github.com/apache/spark/commit/4fa19987433be48fa006e86b5f9e140f2c297c1c).





[GitHub] spark issue #16565: [SPARK-17237][SQL][Backport-2.0] Remove backticks in a p...

2017-01-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16565
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71290/
Test PASSed.





[GitHub] spark issue #16565: [SPARK-17237][SQL][Backport-2.0] Remove backticks in a p...

2017-01-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16565
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16565: [SPARK-17237][SQL][Backport-2.0] Remove backticks in a p...

2017-01-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16565
  
**[Test build #71290 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71290/consoleFull)**
 for PR 16565 at commit 
[`e2c2fae`](https://github.com/apache/spark/commit/e2c2fae70204a2f5891fdfd8d516c273b2d72648).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set opti...

2017-01-12 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/16464#discussion_r95948100
  
--- Diff: R/pkg/R/mllib_clustering.R ---
@@ -404,11 +411,14 @@ setMethod("summary", signature(object = "LDAModel"),
 vocabSize <- callJMethod(jobj, "vocabSize")
 topics <- dataFrame(callJMethod(jobj, "topics", maxTermsPerTopic))
 vocabulary <- callJMethod(jobj, "vocabulary")
+trainingLogLikelihood <- callJMethod(jobj, "trainingLogLikelihood")
+logPrior <- callJMethod(jobj, "logPrior")
--- End diff --

I think it's more appropriate to return ```NULL``` rather than ```NaN``` 
for a local LDA model, since ```logPrior``` does not exist for it, rather than 
being not a number.
BTW, I think we can return ```NULL``` directly based on ```isDistributed```, 
and otherwise call the corresponding Scala methods. This should reduce the 
complexity of ```LDAWrapper``` and the communication between R and Scala.
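
A rough sketch of this suggestion, written in Scala with hypothetical names (`LdaSummary`, `LdaSummarySketch.summarize`, and its parameters are illustrative stand-ins, not the actual `LDAWrapper` API). It assumes the wrapper already knows whether the model is distributed; the distributed-only statistics are fetched only in that case and are otherwise reported as absent, with `None` playing the role of R's `NULL`.

```scala
// Hypothetical stand-in for the summary handed back to R (not Spark's API).
// None means "this value does not exist", which is different from NaN.
final case class LdaSummary(
    isDistributed: Boolean,
    trainingLogLikelihood: Option[Double],
    logPrior: Option[Double])

object LdaSummarySketch {
  // The by-name parameters stand in for calls into the fitted model. They are
  // evaluated only when the model is distributed, so a local model triggers
  // no extra calls at all.
  def summarize(
      isDistributed: Boolean,
      trainingLogLikelihood: => Double,
      logPrior: => Double): LdaSummary = {
    if (isDistributed) {
      LdaSummary(
        isDistributed = true,
        trainingLogLikelihood = Some(trainingLogLikelihood),
        logPrior = Some(logPrior))
    } else {
      LdaSummary(
        isDistributed = false,
        trainingLogLikelihood = None,
        logPrior = None)
    }
  }
}
```

Branching once on `isDistributed` keeps the "value is missing" case out of the numeric domain, so callers never have to tell a genuine `NaN` apart from an absent statistic.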





[GitHub] spark pull request #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set opti...

2017-01-12 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/16464#discussion_r95948289
  
--- Diff: R/pkg/R/mllib_clustering.R ---
@@ -388,6 +388,13 @@ setMethod("spark.lda", signature(data = "SparkDataFrame"),
 #' \item{\code{topics}}{top 10 terms and their weights of all topics}
 #' \item{\code{vocabulary}}{whole terms of the training corpus, NULL if libsvm format file
 #'   used as training set}
+#' \item{\code{trainingLogLikelihood}}{Log likelihood of the observed tokens in the training set,
+#'   given the current parameter estimates:
+#'   log P(docs | topics, topic distributions for docs, Dirichlet hyperparameters)
+#'   It is only for \code{DistributedLDAModel} (i.e., optimizer = "em")}
--- End diff --

```\code{DistributedLDAModel}``` should be converted to a plain-text 
description, since there is no class called ```DistributedLDAModel``` in SparkR.





[GitHub] spark issue #16555: [SPARK-19180][SQL] the offset of short should be 2 in Of...

2017-01-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16555
  
**[Test build #71299 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71299/testReport)**
 for PR 16555 at commit 
[`7722c4e`](https://github.com/apache/spark/commit/7722c4e233a3ecb6d50db73e8a4040c1ab7dd1b2).





[GitHub] spark issue #16555: [SPARK-19180][SQL] the offset of short should be 2 in Of...

2017-01-12 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16555
  
cc @sameeragarwal @davies @rxin 





[GitHub] spark issue #16555: [SPARK-19180][SQL] the offset of short should be 2 in Of...

2017-01-12 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16555
  
ok to test





[GitHub] spark pull request #16561: [SPARK-18209][SQL][FOLLOWUP] Alias the view with ...

2017-01-12 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16561#discussion_r95947533
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala ---
@@ -29,40 +29,31 @@ import org.apache.spark.sql.catalyst.rules.Rule
 
 /**
  * Make sure that a view's child plan produces the view's output attributes. We wrap the child
- * with a Project and add an alias for each output attribute. The attributes are resolved by
- * name. This should be only done after the batch of Resolution, because the view attributes are
- * not completely resolved during the batch of Resolution.
+ * with a Project and add an alias for each output attribute by mapping the child output by index,
+ * if the view output doesn't have the same number of columns with the child output, throw an
+ * AnalysisException.
+ * This should be only done after the batch of Resolution, because the view attributes are not
+ * completely resolved during the batch of Resolution.
  */
 case class AliasViewChild(conf: CatalystConf) extends Rule[LogicalPlan] {
   override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
     case v @ View(_, output, child) if child.resolved =>
-      val resolver = conf.resolver
-      val newOutput = output.map { attr =>
-        val originAttr = findAttributeByName(attr.name, child.output, resolver)
-        // The dataType of the output attributes may be not the same with that of the view output,
-        // so we should cast the attribute to the dataType of the view output attribute. If the
-        // cast can't perform, will throw an AnalysisException.
-        Alias(Cast(originAttr, attr.dataType), attr.name)(exprId = attr.exprId,
-          qualifier = attr.qualifier, explicitMetadata = Some(attr.metadata))
+      if (output.length != child.output.length) {
+        throw new AnalysisException(
+          s"The view output ${output.mkString("[", ",", "]")} doesn't have the same number of " +
+            s"columns with the child output ${child.output.mkString("[", ",", "]")}")
+      }
+      val newOutput = output.zip(child.output).map {
+        case (attr, originAttr) =>
--- End diff --

cc @yhuai 





[GitHub] spark pull request #16561: [SPARK-18209][SQL][FOLLOWUP] Alias the view with ...

2017-01-12 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16561#discussion_r95947477
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala ---
@@ -29,40 +29,31 @@ import org.apache.spark.sql.catalyst.rules.Rule
 
 /**
  * Make sure that a view's child plan produces the view's output attributes. We wrap the child
- * with a Project and add an alias for each output attribute. The attributes are resolved by
- * name. This should be only done after the batch of Resolution, because the view attributes are
- * not completely resolved during the batch of Resolution.
+ * with a Project and add an alias for each output attribute by mapping the child output by index,
+ * if the view output doesn't have the same number of columns with the child output, throw an
+ * AnalysisException.
+ * This should be only done after the batch of Resolution, because the view attributes are not
+ * completely resolved during the batch of Resolution.
  */
 case class AliasViewChild(conf: CatalystConf) extends Rule[LogicalPlan] {
   override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
     case v @ View(_, output, child) if child.resolved =>
-      val resolver = conf.resolver
-      val newOutput = output.map { attr =>
-        val originAttr = findAttributeByName(attr.name, child.output, resolver)
-        // The dataType of the output attributes may be not the same with that of the view output,
-        // so we should cast the attribute to the dataType of the view output attribute. If the
-        // cast can't perform, will throw an AnalysisException.
-        Alias(Cast(originAttr, attr.dataType), attr.name)(exprId = attr.exprId,
-          qualifier = attr.qualifier, explicitMetadata = Some(attr.metadata))
+      if (output.length != child.output.length) {
+        throw new AnalysisException(
+          s"The view output ${output.mkString("[", ",", "]")} doesn't have the same number of " +
+            s"columns with the child output ${child.output.mkString("[", ",", "]")}")
+      }
+      val newOutput = output.zip(child.output).map {
+        case (attr, originAttr) =>
--- End diff --

```
hive> explain extended select * from testview;
OK
ABSTRACT SYNTAX TREE:
  
TOK_QUERY
   TOK_FROM
  TOK_TABREF
 TOK_TABNAME
testview
   TOK_INSERT
  TOK_DESTINATION
 TOK_DIR
TOK_TMP_FILE
  TOK_SELECT
 TOK_SELEXPR
TOK_ALLCOLREF


STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
Fetch Operator
  limit: -1
  Processor Tree:
TableScan
  alias: testtable
  Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE Column stats: NONE
  GatherStats: false
  Select Operator
expressions: a (type: bigint), b (type: tinyint)
outputColumnNames: _col0, _col1
Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE Column stats: NONE
ListSink
```

**`expressions: a (type: bigint), b (type: tinyint)`**. I tried altering 
the columns in the underlying tables to different types. I can see that the 
view columns are always cast to the same types as the altered columns.
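
For illustration only, the by-index aliasing under discussion can be sketched with simplified stand-in types. `Attr`, `Ref`, `Cast`, `Alias`, and `AliasByIndexSketch` below are hypothetical and are not Catalyst classes, and `require` stands in for throwing an `AnalysisException`. The quoted diff is cut off at the data type comparison, so what the PR does when types differ is not shown; this sketch assumes a cast to the view's type, mirroring the Hive behavior described above.

```scala
// Simplified, hypothetical stand-ins for Catalyst attributes and expressions.
final case class Attr(name: String, dataType: String)

sealed trait Expr
final case class Ref(attr: Attr) extends Expr
final case class Cast(child: Expr, dataType: String) extends Expr
final case class Alias(child: Expr, name: String) extends Expr

object AliasByIndexSketch {
  // Pair view columns with child columns by position rather than by name.
  def aliasViewOutput(viewOutput: Seq[Attr], childOutput: Seq[Attr]): Seq[Alias] = {
    val viewCols = viewOutput.mkString("[", ",", "]")
    val childCols = childOutput.mkString("[", ",", "]")
    // Stand-in for the AnalysisException thrown on a column count mismatch.
    require(
      viewOutput.length == childOutput.length,
      s"The view output $viewCols doesn't have the same number of " +
        s"columns with the child output $childCols")
    viewOutput.zip(childOutput).map { case (attr, originAttr) =>
      // Assumption: differing types are resolved by casting to the view's type.
      val source: Expr =
        if (attr.dataType != originAttr.dataType) Cast(Ref(originAttr), attr.dataType)
        else Ref(originAttr)
      Alias(source, attr.name)
    }
  }
}
```

With a view declared as `a: bigint, b: tinyint` over a child producing `x: int, y: tinyint`, only the first column is wrapped in a cast; the second is aliased directly.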








[GitHub] spark issue #16566: [SparkR]: add bisecting kmeans R wrapper

2017-01-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16566
  
**[Test build #71298 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71298/testReport)**
 for PR 16566 at commit 
[`2ad596e`](https://github.com/apache/spark/commit/2ad596e6f9adb0c3b037c3cc1a379c0019167f08).





[GitHub] spark pull request #16561: [SPARK-18209][SQL][FOLLOWUP] Alias the view with ...

2017-01-12 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16561#discussion_r95947257
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala ---
@@ -29,40 +29,31 @@ import org.apache.spark.sql.catalyst.rules.Rule
 
 /**
  * Make sure that a view's child plan produces the view's output attributes. We wrap the child
- * with a Project and add an alias for each output attribute. The attributes are resolved by
- * name. This should be only done after the batch of Resolution, because the view attributes are
- * not completely resolved during the batch of Resolution.
+ * with a Project and add an alias for each output attribute by mapping the child output by index,
+ * if the view output doesn't have the same number of columns with the child output, throw an
+ * AnalysisException.
+ * This should be only done after the batch of Resolution, because the view attributes are not
+ * completely resolved during the batch of Resolution.
  */
 case class AliasViewChild(conf: CatalystConf) extends Rule[LogicalPlan] {
   override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
     case v @ View(_, output, child) if child.resolved =>
-      val resolver = conf.resolver
-      val newOutput = output.map { attr =>
-        val originAttr = findAttributeByName(attr.name, child.output, resolver)
-        // The dataType of the output attributes may be not the same with that of the view output,
-        // so we should cast the attribute to the dataType of the view output attribute. If the
-        // cast can't perform, will throw an AnalysisException.
-        Alias(Cast(originAttr, attr.dataType), attr.name)(exprId = attr.exprId,
-          qualifier = attr.qualifier, explicitMetadata = Some(attr.metadata))
+      if (output.length != child.output.length) {
+        throw new AnalysisException(
+          s"The view output ${output.mkString("[", ",", "]")} doesn't have the same number of " +
+            s"columns with the child output ${child.output.mkString("[", ",", "]")}")
+      }
+      val newOutput = output.zip(child.output).map {
+        case (attr, originAttr) =>
--- End diff --

It sounds like Hive just forcibly casts it.





[GitHub] spark pull request #16558: Fix missing close-parens for In filter's toString

2017-01-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16558





[GitHub] spark issue #16558: Fix missing close-parens for In filter's toString

2017-01-12 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/16558
  
Alright, I'm going to merge this given that JIRA is down ... merging into 
master/branch-2.1/branch-2.0.






[GitHub] spark issue #16568: [SPARK-18971][Core]Upgrade Netty to 4.0.43.Final

2017-01-12 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/16568
  
LGTM





[GitHub] spark issue #16568: [SPARK-18971][Core]Upgrade Netty to 4.0.43.Final

2017-01-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16568
  
**[Test build #71297 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71297/testReport)**
 for PR 16568 at commit 
[`cb0c6ce`](https://github.com/apache/spark/commit/cb0c6ce950373c7b8d1191282170e27f96ddd2bf).





[GitHub] spark issue #16550: [SPARK-19178][SQL] convert string of large numbers to in...

2017-01-12 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16550
  
Thanks! Merging to master. 

This JIRA targets 2.2.0. Should we merge it into Spark 2.1? 





[GitHub] spark issue #16555: [SPARK-19180][SQL] the offset of short should be 2 in Of...

2017-01-12 Thread yucai
Github user yucai commented on the issue:

https://github.com/apache/spark/pull/16555
  
retest please.





[GitHub] spark pull request #16550: [SPARK-19178][SQL] convert string of large number...

2017-01-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16550





[GitHub] spark issue #16500: [SPARK-19120] Refresh Metadata Cache After Loading Hive ...

2017-01-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16500
  
**[Test build #71296 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71296/testReport)**
 for PR 16500 at commit 
[`203e36c`](https://github.com/apache/spark/commit/203e36c80fb967ed0ba21ec51942bd5bb17cca7d).





[GitHub] spark issue #16568: [SPARK-18971][Core]Upgrade Netty to 4.0.43.Final

2017-01-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16568
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71294/
Test FAILed.





[GitHub] spark issue #16568: [SPARK-18971][Core]Upgrade Netty to 4.0.43.Final

2017-01-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16568
  
**[Test build #71294 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71294/testReport)**
 for PR 16568 at commit 
[`93f3a41`](https://github.com/apache/spark/commit/93f3a414c41887d7be6938491f2fb70badfe95c7).
 * This patch **fails build dependency tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16568: [SPARK-18971][Core]Upgrade Netty to 4.0.43.Final

2017-01-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16568
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16568: [SPARK-18971][Core]Upgrade Netty to 4.0.43.Final

2017-01-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16568
  
**[Test build #71294 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71294/testReport)**
 for PR 16568 at commit 
[`93f3a41`](https://github.com/apache/spark/commit/93f3a414c41887d7be6938491f2fb70badfe95c7).





[GitHub] spark issue #16500: [SPARK-19120] Refresh Metadata Cache After Loading Hive ...

2017-01-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16500
  
**[Test build #71295 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71295/testReport)**
 for PR 16500 at commit 
[`11507cc`](https://github.com/apache/spark/commit/11507ccebd9c48b2e340e5c7baf5b4e0a81c771b).





[GitHub] spark pull request #16568: [SPARK-18971][Core]Upgrade Netty to 4.0.43.Final

2017-01-12 Thread zsxwing
GitHub user zsxwing opened a pull request:

https://github.com/apache/spark/pull/16568

[SPARK-18971][Core]Upgrade Netty to 4.0.43.Final

## What changes were proposed in this pull request?

Upgrade Netty to 4.0.43.Final to add the fix for 
https://github.com/netty/netty/issues/6153 

## How was this patch tested?

Jenkins

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zsxwing/spark SPARK-18971

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16568.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16568


commit 93f3a414c41887d7be6938491f2fb70badfe95c7
Author: Shixiong Zhu 
Date:   2017-01-13T06:45:21Z

Upgrade Netty to 4.0.43.Final







[GitHub] spark issue #16555: [SPARK-19180][SQL] the offset of short should be 4 in Of...

2017-01-12 Thread yucai
Github user yucai commented on the issue:

https://github.com/apache/spark/pull/16555
  
Thanks @aray , good catch!





[GitHub] spark pull request #16500: [SPARK-19120] Refresh Metadata Cache After Loadin...

2017-01-12 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16500#discussion_r95944437
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -392,7 +392,9 @@ case class InsertIntoHiveTable(
 
 // Invalidate the cache.
 sqlContext.sharedState.cacheManager.invalidateCache(table)
-sqlContext.sessionState.catalog.refreshTable(table.catalogTable.identifier)
+if (partition.nonEmpty) {
+  sqlContext.sessionState.catalog.refreshTable(table.catalogTable.identifier)
+}
--- End diff --

Agree





[GitHub] spark issue #10238: [SPARK-2750][WEB UI] Add https support to the Web UI

2017-01-12 Thread LizzyMiao
Github user LizzyMiao commented on the issue:

https://github.com/apache/spark/pull/10238
  
@vanzin @WangTaoTheTonic @scwf Could you provide a doc or something that 
we can follow to enable HTTPS for our Spark web UI? Thank you!





[GitHub] spark pull request #16523: [SPARK-19142][SparkR]:spark.kmeans should take se...

2017-01-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16523





[GitHub] spark issue #16550: [SPARK-19178][SQL] convert string of large numbers to in...

2017-01-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16550
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71287/
Test PASSed.





[GitHub] spark issue #16523: [SPARK-19142][SparkR]:spark.kmeans should take seed, ini...

2017-01-12 Thread yanboliang
Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/16523
  
LGTM, merged into master. Thanks. We cannot update JIRA since it's 
currently down for maintenance; will do so later.





[GitHub] spark issue #16550: [SPARK-19178][SQL] convert string of large numbers to in...

2017-01-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16550
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16567: [SPARK-19113][SS][Tests] Ignore StreamingQueryException ...

2017-01-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16567
  
**[Test build #71293 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71293/testReport)**
 for PR 16567 at commit 
[`e5ed096`](https://github.com/apache/spark/commit/e5ed096bbf719bcd34d36e31485f939b633f43f4).





[GitHub] spark issue #16550: [SPARK-19178][SQL] convert string of large numbers to in...

2017-01-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16550
  
**[Test build #71287 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71287/testReport)**
 for PR 16550 at commit 
[`7448e8c`](https://github.com/apache/spark/commit/7448e8cff72c4510ab1b6f341c587a403779d5e9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16567: [SPARK-19113][SS][Tests] Ignore StreamingQueryExc...

2017-01-12 Thread zsxwing
GitHub user zsxwing opened a pull request:

https://github.com/apache/spark/pull/16567

[SPARK-19113][SS][Tests] Ignore StreamingQueryException thrown from 
awaitInitialization to avoid breaking tests

## What changes were proposed in this pull request?

`StreamExecution.awaitInitialization` may throw fatal errors and fail the 
test. This PR just ignores `StreamingQueryException` thrown from 
`awaitInitialization` so that we can verify the exception in the 
`ExpectFailure` action later. It's fine since `StopStream` or `ExpectFailure` 
will catch `StreamingQueryException` as well.
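
A minimal sketch of the pattern described above, in plain Scala rather than Spark's actual test harness. `FakeQuery`, `startStream`, and `expectFailure` are hypothetical stand-ins, and the local `StreamingQueryException` only mimics the real class by name. The start step swallows that one exception type so a later step can assert on it, while other errors still propagate.

```scala
// Hypothetical stand-in for Spark's StreamingQueryException (illustration only).
final class StreamingQueryException(msg: String) extends Exception(msg)

// Hypothetical query whose initialization may have failed.
final class FakeQuery(initError: Option[Throwable]) {
  def exception: Option[Throwable] = initError
  def awaitInitialization(): Unit = initError.foreach(e => throw e)
}

object AwaitInitSketch {
  // Start step: ignore only StreamingQueryException; anything else still fails here.
  def startStream(query: FakeQuery): Unit =
    try query.awaitInitialization()
    catch { case _: StreamingQueryException => () }

  // Later step: verify that the query failed with the expected exception type.
  def expectFailure(query: FakeQuery): Boolean =
    query.exception.exists(_.isInstanceOf[StreamingQueryException])

  def main(args: Array[String]): Unit = {
    val query = new FakeQuery(Some(new StreamingQueryException("source failed")))
    startStream(query)            // does not throw
    println(expectFailure(query)) // the failure is checked here instead
  }
}
```

Catching only the one exception type keeps unrelated failures visible at the point where they occur.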

## How was this patch tested?

Jenkins


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zsxwing/spark SPARK-19113-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16567.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16567


commit e5ed096bbf719bcd34d36e31485f939b633f43f4
Author: Shixiong Zhu 
Date:   2017-01-13T06:22:41Z

Ignore exception from awaitInitialization to avoid breaking tests







[GitHub] spark pull request #16564: [SPARK-19065][SQL]Don't inherit expression id in ...

2017-01-12 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/16564#discussion_r95942753
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
@@ -898,11 +899,15 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
   (1, 2), (1, 1), (2, 1), (2, 2))
   }
 
-  test("dropDuplicates should not change child plan output") {
-val ds = Seq(("a", 1), ("a", 2), ("b", 1), ("a", 1)).toDS()
-checkDataset(
-  ds.dropDuplicates("_1").select(ds("_1").as[String], ds("_2").as[Int]),
-  ("a", 1), ("b", 1))
+  test("SPARK-19065 dropDuplicates should not create expressions using the same id") {
--- End diff --

This may introduce other unknown issues, because the SQL rules that 
replace attributes currently don't handle `Alias`es.





[GitHub] spark pull request #16564: [SPARK-19065][SQL]Don't inherit expression id in ...

2017-01-12 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/16564#discussion_r95942576
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
@@ -898,11 +899,15 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
   (1, 2), (1, 1), (2, 1), (2, 2))
   }
 
-  test("dropDuplicates should not change child plan output") {
-val ds = Seq(("a", 1), ("a", 2), ("b", 1), ("a", 1)).toDS()
-checkDataset(
-  ds.dropDuplicates("_1").select(ds("_1").as[String], ds("_2").as[Int]),
-  ("a", 1), ("b", 1))
+  test("SPARK-19065 dropDuplicates should not create expressions using the same id") {
--- End diff --

It's in my first commit: 
https://github.com/apache/spark/pull/16564/commits/13f54a93c0cf31a38455e90aec722e890af980c6

I removed it because it's not a Structured Streaming issue.






[GitHub] spark issue #15324: [SPARK-16872][ML] Gaussian Naive Bayes Classifier

2017-01-12 Thread zhengruifeng
Github user zhengruifeng commented on the issue:

https://github.com/apache/spark/pull/15324
  
@jkbradley What's your opinion on whether GNB should be a separate 
Classifier or a model type in the existing NB?





[GitHub] spark issue #15671: [SPARK-18206][ML]Add instrumentation for MLP,NB,LDA,AFT,...

2017-01-12 Thread zhengruifeng
Github user zhengruifeng commented on the issue:

https://github.com/apache/spark/pull/15671
  
@jkbradley Updated. Thanks for reviewing!





[GitHub] spark issue #15505: [SPARK-18890][CORE] Move task serialization from the Tas...

2017-01-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15505
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71286/
Test PASSed.





[GitHub] spark issue #15505: [SPARK-18890][CORE] Move task serialization from the Tas...

2017-01-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15505
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15505: [SPARK-18890][CORE] Move task serialization from the Tas...

2017-01-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15505
  
**[Test build #71286 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71286/testReport)**
 for PR 15505 at commit 
[`a4499a8`](https://github.com/apache/spark/commit/a4499a8da953d55b8909c1d17df794ca3f357c17).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...

2017-01-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16355
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71289/
Test PASSed.





[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...

2017-01-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16355
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...

2017-01-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16355
  
**[Test build #71289 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71289/testReport)**
 for PR 16355 at commit 
[`138ab34`](https://github.com/apache/spark/commit/138ab3478fb8b0f4f4569bb3b0e66c04d3d5cac1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16559: [WIP] Add expression index and test cases

2017-01-12 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16559
  
We already have `GetArrayItem` and `GetMapValue`, and we have special 
parser rules to support them, e.g. `SELECT array_col[3], map_col['key']`. We can 
just treat `index` as an alias of `UnresolvedExtractValue`.
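
To illustrate the idea of one extraction entry point that covers both cases, here is a plain-Scala sketch. `extractValue` is a hypothetical helper, not a Spark API; it only mirrors how array-style and map-style access can share a single lookup path, and returning `None` for a missing index or key is an assumption of this sketch.

```scala
object ExtractValueSketch {
  // Hypothetical helper: one lookup path for both arrays and maps.
  def extractValue(container: Any, key: Any): Option[Any] =
    (container, key) match {
      // Array-style access by integer position.
      case (seq: Seq[_], i: Int) => seq.lift(i)
      // Map-style access by key.
      case (map: Map[_, _], k) => map.asInstanceOf[Map[Any, Any]].get(k)
      // Unsupported container or key type.
      case _ => None
    }

  def main(args: Array[String]): Unit = {
    println(extractValue(Seq("a", "b", "c", "d"), 3))
    println(extractValue(Map("key" -> 42), "key"))
  }
}
```

An `index` function built this way would need no logic of its own: it would simply forward to the same dispatch.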


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16528: [SPARK-19148][SQL] do not expose the external table conc...

2017-01-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16528
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16528: [SPARK-19148][SQL] do not expose the external table conc...

2017-01-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16528
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71292/
Test FAILed.





[GitHub] spark issue #16528: [SPARK-19148][SQL] do not expose the external table conc...

2017-01-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16528
  
**[Test build #71292 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71292/testReport)**
 for PR 16528 at commit 
[`2e1d378`](https://github.com/apache/spark/commit/2e1d378456011269ddb1fc451aaa9221ce4996a9).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16542: [SPARK-18905][STREAMING] Fix the issue of removing a fai...

2017-01-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16542
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71288/
Test PASSed.





[GitHub] spark issue #16528: [SPARK-19148][SQL] do not expose the external table conc...

2017-01-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16528
  
**[Test build #71292 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71292/testReport)**
 for PR 16528 at commit 
[`2e1d378`](https://github.com/apache/spark/commit/2e1d378456011269ddb1fc451aaa9221ce4996a9).





[GitHub] spark issue #16542: [SPARK-18905][STREAMING] Fix the issue of removing a fai...

2017-01-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16542
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16542: [SPARK-18905][STREAMING] Fix the issue of removing a fai...

2017-01-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16542
  
**[Test build #71288 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71288/testReport)**
 for PR 16542 at commit 
[`465ccc6`](https://github.com/apache/spark/commit/465ccc68368da50579c10fa1daf7f46809411670).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...

2017-01-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16503
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71284/
Test PASSed.





[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...

2017-01-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16503
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...

2017-01-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16503
  
**[Test build #71284 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71284/testReport)**
 for PR 16503 at commit 
[`aba406d`](https://github.com/apache/spark/commit/aba406d4833e7f01040a01f1d6e2b368da852f92).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16557: [SPARK-18693][ML][MLLIB][WIP] ML Evaluators should use w...

2017-01-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16557
  
**[Test build #71291 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71291/testReport)**
 for PR 16557 at commit 
[`397c26b`](https://github.com/apache/spark/commit/397c26b3498eed775621a83f122d1b2b517ba0ab).





[GitHub] spark issue #16481: [SPARK-19092] [SQL] Save() API of DataFrameWriter should...

2017-01-12 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16481
  
Sure, will do it. 





[GitHub] spark issue #16565: [SPARK-17237][SQL][Backport-2.0] Remove backticks in a p...

2017-01-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16565
  
**[Test build #71290 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71290/consoleFull)**
 for PR 16565 at commit 
[`e2c2fae`](https://github.com/apache/spark/commit/e2c2fae70204a2f5891fdfd8d516c273b2d72648).





[GitHub] spark pull request #16500: [SPARK-19120] [SPARK-19121] Refresh Metadata Cach...

2017-01-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16500#discussion_r95939337
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -392,7 +392,9 @@ case class InsertIntoHiveTable(
 
 // Invalidate the cache.
 sqlContext.sharedState.cacheManager.invalidateCache(table)
-sqlContext.sessionState.catalog.refreshTable(table.catalogTable.identifier)
+if (partition.nonEmpty) {
+  sqlContext.sessionState.catalog.refreshTable(table.catalogTable.identifier)
+}
--- End diff --

Let's revert it first; we should think about cache and refresh more
thoroughly later.





[GitHub] spark pull request #16512: [SPARK-18335][SPARKR] createDataFrame to support ...

2017-01-12 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16512#discussion_r95939102
  
--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -196,6 +196,12 @@ test_that("create DataFrame from RDD", {
   expect_equal(dtypes(df), list(c("name", "string"), c("age", "int"), c("height", "float")))
   expect_equal(as.list(collect(where(df, df$name == "John"))),
list(name = "John", age = 19L, height = 176.5))
+  expect_equal(getNumPartitions(toRDD(df)), 1)
--- End diff --

And so, if this subtlety is significant, we could change to this. It's a
slightly more involved change, but it would match Scala exactly.

```
splits <-
unlist(lapply(0: (numSlices - 1), function(x) {
   start <- trunc((x * length)/numSlices)
   end <- trunc(((x + 1) * length)/numSlices)
   rep(start, end - start)
}))
```

And you get this sequence for length <- 50, numSlices <- 22
```
 [1]  0  0  2  2  4  4  6  6  6  9  9 11 11 13 13 15 15 15 18 18 20 20 22 22 22
[26] 25 25 27 27 29 29 31 31 31 34 34 36 36 38 38 40 40 40 43 43 45 45 47 47 47
```

When split() is called with this sequence, it is converted with as.factor, so
the numeric values themselves are not significant; only the grouping matters.
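
For comparison, here is a minimal Scala sketch of the same labeling. It assumes slice bounds are computed as `i * length / numSlices` with truncating integer division, as in the R snippet above. The object name `SliceLabels` and its helpers are hypothetical and are not taken from Spark's source.

```scala
object SliceLabels {
  // Half-open (start, end) bounds of slice i, using truncating integer division.
  // The multiplication is done in Long to avoid Int overflow for large inputs.
  def bounds(length: Int, numSlices: Int, i: Int): (Int, Int) = {
    val start = ((i.toLong * length) / numSlices).toInt
    val end = (((i + 1).toLong * length) / numSlices).toInt
    (start, end)
  }

  // One label per element: each slice repeats its own start index,
  // mirroring rep(start, end - start) in the R snippet.
  def labels(length: Int, numSlices: Int): Seq[Int] =
    (0 until numSlices).flatMap { i =>
      val (start, end) = bounds(length, numSlices, i)
      Seq.fill(end - start)(start)
    }

  def main(args: Array[String]): Unit =
    println(labels(50, 22).mkString(" "))
}
```

Running `main` prints the same 50 label values as the R output above for `length = 50` and `numSlices = 22`.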






[GitHub] spark pull request #16512: [SPARK-18335][SPARKR] createDataFrame to support ...

2017-01-12 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16512#discussion_r95938844
  
--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -196,6 +196,12 @@ test_that("create DataFrame from RDD", {
   expect_equal(dtypes(df), list(c("name", "string"), c("age", "int"), c("height", "float")))
   expect_equal(as.list(collect(where(df, df$name == "John"))),
list(name = "John", age = 19L, height = 176.5))
+  expect_equal(getNumPartitions(toRDD(df)), 1)
--- End diff --

Oops, I thought we were talking about `numSlices`. Great point about
`positions`, and here's what I'm seeing (it's going to be a bit long):

```
positions(50, 20)
(0,2)   0 1
(2,5)   2 3 4
(5,7)   5 6 
(7,10)  7 8 9
(10,12) 10 11
(12,15) 12 13 14
(15,17) 15 16
(17,20) 17 18 19
(20,22) 20 21
(22,25) 22 23 24
(25,27) 25 26
(27,30) 27 28 29
(30,32) 30 31
(32,35) 32 33 34
(35,37) 35 36 
(37,40) 37 38 39
(40,42) 40 41
(42,45) 42 43 44
(45,47) 45 46
(47,50) 47 48 49 

sort(rep(1: 20, each = 1, length.out = 50))
 [1]  1  1  1  2  2  2  3  3  3  4  4  4  5  5  5  6  6  6  7  7  7  8  8  8  9
[26]  9  9 10 10 10 11 11 12 12 13 13 14 14 15 15 16 16 17 17 18 18 19 19 20 20
```

As you can see, `positions` attempts to evenly distribute the "extras".

```
positions(50, 24)
(0,2)   0 1
(2,4)   2 3
(4,6)   4 5
(6,8)   6 7
(8,10)  8 9
(10,12) 10 11
(12,14) 12 13
(14,16) 14 15
(16,18) 16 17
(18,20) 18 19
(20,22) 20 21
(22,25) 22 23 24
(25,27) 25 26
(27,29) 27 28
(29,31) 29 30
(31,33) 31 32
(33,35) 33 34
(35,37) 35 36
(37,39) 37 38
(39,41) 39 40
(41,43) 41 42
(43,45) 43 44
(45,47) 45 46
(47,50) 47 48 49

 sort(rep(1: 24, each = 1, length.out = 50))
 [1]  1  1  1  2  2  2  3  3  4  4  5  5  6  6  7  7  8  8  9  9 10 10 11 11 12
[26] 12 13 13 14 14 15 15 16 16 17 17 18 18 19 19 20 20 21 21 22 22 23 23 24 24
```

You see, if there are only 2 extras, it puts one in the middle and one at the end.

```
positions(50, 22)
(0,2)   0 1
(2,4)   2 3
(4,6)   4 5
(6,9)   6 7 8
(9,11)  9 10
(11,13) 11 12
(13,15) 13 14
(15,18) 15 16 17
(18,20) 18 19
(20,22) 20 21
(22,25) 22 23 24
(25,27) 25 26
(27,29) 27 28
(29,31) 29 30
(31,34) 31 32 33
(34,36) 34 35
(36,38) 36 37
(38,40) 38 39
(40,43) 40 41 42
(43,45) 43 44
(45,47) 45 46
(47,50) 47 48 49

 sort(rep(1: 22, each = 1, length.out = 50))
 [1]  1  1  1  2  2  2  3  3  3  4  4  4  5  5  5  6  6  6  7  7  8  8  9  9 10
[26] 10 11 11 12 12 13 13 14 14 15 15 16 16 17 17 18 18 19 19 20 20 21 21 22 22
```

When there are only a few extras, they are still roughly evenly spaced out.
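
The difference between the two schemes can be sketched in Scala by computing only the slice sizes. This assumes `positions` uses the bounds `i * length / numSlices` with truncating division, which is consistent with the listings above. `SliceSizes`, `spreadSizes`, and `frontLoadedSizes` are hypothetical names used for illustration only.

```scala
object SliceSizes {
  // Sizes when slice i covers [i * length / numSlices, (i + 1) * length / numSlices).
  // The larger slices end up spread across the whole range.
  def spreadSizes(length: Int, numSlices: Int): Seq[Int] =
    (0 until numSlices).map { i =>
      val start = (i.toLong * length) / numSlices
      val end = ((i + 1).toLong * length) / numSlices
      (end - start).toInt
    }

  // Sizes when labels 1..numSlices are recycled to `length` elements and sorted,
  // as in sort(rep(1:numSlices, each = 1, length.out = length)):
  // the first (length % numSlices) slices each receive one extra element.
  def frontLoadedSizes(length: Int, numSlices: Int): Seq[Int] = {
    val base = length / numSlices
    val extras = length % numSlices
    (0 until numSlices).map(i => if (i < extras) base + 1 else base)
  }

  def main(args: Array[String]): Unit = {
    println(spreadSizes(50, 24).mkString(" "))
    println(frontLoadedSizes(50, 24).mkString(" "))
  }
}
```

For `length = 50` and `numSlices = 24`, the spread scheme places the two 3-element slices at indices 11 and 23, while the front-loaded scheme gives them to the first two slices.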






[GitHub] spark pull request #16564: [SPARK-19065][SQL]Don't inherit expression id in ...

2017-01-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16564#discussion_r95938778
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
@@ -898,11 +899,15 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
   (1, 2), (1, 1), (2, 1), (2, 2))
   }
 
-  test("dropDuplicates should not change child plan output") {
-val ds = Seq(("a", 1), ("a", 2), ("b", 1), ("a", 1)).toDS()
-checkDataset(
-  ds.dropDuplicates("_1").select(ds("_1").as[String], ds("_2").as[Int]),
-  ("a", 1), ("b", 1))
+  test("SPARK-19065 dropDuplicates should not create expressions using the same id") {
--- End diff --

Do you have an end-to-end test showing that reusing the same id when aliasing
causes trouble?





[GitHub] spark issue #16481: [SPARK-19092] [SQL] Save() API of DataFrameWriter should...

2017-01-12 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16481
  
I'll update JIRA once the service is back.





[GitHub] spark pull request #16481: [SPARK-19092] [SQL] Save() API of DataFrameWriter...

2017-01-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16481





[GitHub] spark issue #16481: [SPARK-19092] [SQL] Save() API of DataFrameWriter should...

2017-01-12 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16481
  
LGTM, merging to master!

It conflicts with branch-2.1; can you send a new PR? Thanks.





[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...

2017-01-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16355
  
**[Test build #71289 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71289/testReport)**
 for PR 16355 at commit 
[`138ab34`](https://github.com/apache/spark/commit/138ab3478fb8b0f4f4569bb3b0e66c04d3d5cac1).





[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...

2017-01-12 Thread imatiach-msft
Github user imatiach-msft commented on the issue:

https://github.com/apache/spark/pull/16355
  
@jkbradley thanks, I've updated the code based on your latest comments - I 
removed k and the verification for the setters.





[GitHub] spark pull request #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorith...

2017-01-12 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request:

https://github.com/apache/spark/pull/16355#discussion_r95937613
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala ---
@@ -160,6 +162,17 @@ object KMeansSuite {
 spark.createDataFrame(rdd)
   }
 
+  def generateSparseData(spark: SparkSession, rows: Int, dim: Int, k: Int, seed: Int): DataFrame = {
+val sc = spark.sparkContext
+val random = new Random(seed)
+val nnz = random.nextInt(dim)
+val rdd = sc.parallelize(1 to rows)
+  .map(i => Vectors.sparse(dim, random.shuffle(0 to dim - 1).slice(0, nnz).sorted.toArray,
+Array.fill(nnz)(random.nextInt(k).toDouble)))
--- End diff --

done, removed k





[GitHub] spark pull request #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorith...

2017-01-12 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request:

https://github.com/apache/spark/pull/16355#discussion_r95937532
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/BisectingKMeansSuite.scala ---
@@ -51,6 +54,21 @@ class BisectingKMeansSuite
 assert(copiedModel.hasSummary)
   }
 
+  test("SPARK-16473: Verify Bisecting K-Means does not fail in edge case where" +
+"one cluster is empty after split") {
+val bkm = new BisectingKMeans().setK(k).setMinDivisibleClusterSize(4).setMaxIter(4)
+
+assert(bkm.getK === k)
--- End diff --

done, removed





[GitHub] spark issue #16566: [SparkR]: add bisecting kmeans R wrapper

2017-01-12 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/16566
  
```
* checking Rd \usage sections ... WARNING
Duplicated \argument entries in documentation object 'fitted':
  'object' 'method' '...'
```





[GitHub] spark pull request #16500: [SPARK-19120] [SPARK-19121] Refresh Metadata Cach...

2017-01-12 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16500#discussion_r95937423
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -392,7 +392,9 @@ case class InsertIntoHiveTable(
 
 // Invalidate the cache.
 sqlContext.sharedState.cacheManager.invalidateCache(table)
-sqlContext.sessionState.catalog.refreshTable(table.catalogTable.identifier)
+if (partition.nonEmpty) {
+  sqlContext.sessionState.catalog.refreshTable(table.catalogTable.identifier)
+}
--- End diff --

@cloud-fan @ericl @mallman We convert non-partitioned parquet/orc tables to
data source tables, so inserts into them will not call
`InsertIntoHiveTable`.

I know it is a little bit confusing, but I am fine to keep it unchanged.





[GitHub] spark issue #15671: [SPARK-18206][ML]Add instrumentation for MLP,NB,LDA,AFT,...

2017-01-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15671
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71285/





[GitHub] spark issue #15671: [SPARK-18206][ML]Add instrumentation for MLP,NB,LDA,AFT,...

2017-01-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15671
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15671: [SPARK-18206][ML]Add instrumentation for MLP,NB,LDA,AFT,...

2017-01-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15671
  
**[Test build #71285 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71285/testReport)**
 for PR 15671 at commit 
[`c8188b0`](https://github.com/apache/spark/commit/c8188b03c49912ab2ee9f7dc0f5aae5a9ddc1a1c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16542: [SPARK-18905][STREAMING] Fix the issue of removing a fai...

2017-01-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16542
  
**[Test build #71288 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71288/testReport)**
 for PR 16542 at commit 
[`465ccc6`](https://github.com/apache/spark/commit/465ccc68368da50579c10fa1daf7f46809411670).





[GitHub] spark issue #16523: [SPARK-19142][SparkR]:spark.kmeans should take seed, ini...

2017-01-12 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/16523
  
Sounds good! @yanboliang, any more comments before we merge?





[GitHub] spark pull request #16542: [SPARK-18905][STREAMING] Fix the issue of removin...

2017-01-12 Thread CodingCat
Github user CodingCat commented on a diff in the pull request:

https://github.com/apache/spark/pull/16542#discussion_r95935489
  
--- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobScheduler.scala ---
@@ -200,19 +200,19 @@ class JobScheduler(val ssc: StreamingContext) extends Logging {
 job.setEndTime(completedTime)
 listenerBus.post(StreamingListenerOutputOperationCompleted(job.toOutputOperationInfo))
 logInfo("Finished job " + job.id + " from job set of time " + jobSet.time)
-if (jobSet.hasCompleted) {
-  jobSets.remove(jobSet.time)
-  jobGenerator.onBatchCompletion(jobSet.time)
-  logInfo("Total delay: %.3f s for time %s (execution: %.3f s)".format(
-jobSet.totalDelay / 1000.0, jobSet.time.toString,
-jobSet.processingDelay / 1000.0
-  ))
-  listenerBus.post(StreamingListenerBatchCompleted(jobSet.toBatchInfo))
-}
 job.result match {
   case Failure(e) =>
 reportError("Error running job " + job, e)
   case _ =>
+if (jobSet.hasCompleted) {
+  jobSets.remove(jobSet.time)
+  jobGenerator.onBatchCompletion(jobSet.time)
+  logInfo("Total delay: %.3f s for time %s (execution: %.3f s)".format(
+jobSet.totalDelay / 1000.0, jobSet.time.toString,
+jobSet.processingDelay / 1000.0
+  ))
+  listenerBus.post(StreamingListenerBatchCompleted(jobSet.toBatchInfo))
--- End diff --

sure





[GitHub] spark pull request #16395: [SPARK-17075][SQL] implemented filter estimation

2017-01-12 Thread ron8hu
Github user ron8hu commented on a diff in the pull request:

https://github.com/apache/spark/pull/16395#discussion_r95935452
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -116,6 +116,12 @@ case class Filter(condition: Expression, child: LogicalPlan)
   .filterNot(SubqueryExpression.hasCorrelatedSubquery)
 child.constraints.union(predicates.toSet)
   }
+
+  override lazy val statistics: Statistics = {
--- End diff --

OK.  fixed.





[GitHub] spark issue #16550: [SPARK-19178][SQL] convert string of large numbers to in...

2017-01-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16550
  
**[Test build #71287 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71287/testReport)**
 for PR 16550 at commit 
[`7448e8c`](https://github.com/apache/spark/commit/7448e8cff72c4510ab1b6f341c587a403779d5e9).





[GitHub] spark issue #15505: [SPARK-18890][CORE] Move task serialization from the Tas...

2017-01-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15505
  
**[Test build #71286 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71286/testReport)**
 for PR 15505 at commit 
[`a4499a8`](https://github.com/apache/spark/commit/a4499a8da953d55b8909c1d17df794ca3f357c17).





[GitHub] spark pull request #15505: [SPARK-18890][CORE] Move task serialization from ...

2017-01-12 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/15505#discussion_r95933009
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskDescription.scala ---
@@ -52,7 +55,43 @@ private[spark] class TaskDescription(
 val addedFiles: Map[String, Long],
 val addedJars: Map[String, Long],
 val properties: Properties,
-val serializedTask: ByteBuffer) {
+private var serializedTask_ : ByteBuffer) extends  Logging {
--- End diff --

Another implementation:

https://github.com/witgo/spark/commit/4fbf30a568ed61982e17757f9df3c35cb9d64871





[GitHub] spark issue #16467: [SPARK-19017][SQL] NOT IN subquery with more than one co...

2017-01-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16467
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16467: [SPARK-19017][SQL] NOT IN subquery with more than one co...

2017-01-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16467
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71283/





[GitHub] spark issue #16467: [SPARK-19017][SQL] NOT IN subquery with more than one co...

2017-01-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16467
  
**[Test build #71283 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71283/testReport)**
 for PR 16467 at commit 
[`6a1a415`](https://github.com/apache/spark/commit/6a1a4159f54397ef81baaf618e3b816866f589e9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15671: [SPARK-18206][ML]Add instrumentation for MLP,NB,LDA,AFT,...

2017-01-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15671
  
**[Test build #71285 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71285/testReport)**
 for PR 15671 at commit 
[`c8188b0`](https://github.com/apache/spark/commit/c8188b03c49912ab2ee9f7dc0f5aae5a9ddc1a1c).





[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...

2017-01-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16503
  
**[Test build #71284 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71284/testReport)**
 for PR 16503 at commit 
[`aba406d`](https://github.com/apache/spark/commit/aba406d4833e7f01040a01f1d6e2b368da852f92).





[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...

2017-01-12 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/16503
  
@ash211 
Thanks a lot for your comment. I've fixed the failing Scala style tests, and
`./dev/scalastyle` now passes. Could you take another look?





[GitHub] spark issue #16547: [SPARK-19168][Structured Streaming] Improvement: filter ...

2017-01-12 Thread lw-lin
Github user lw-lin commented on the issue:

https://github.com/apache/spark/pull/16547
  
Thanks for the feedback! Ah, sure, let me update accordingly.





[GitHub] spark issue #16527: [SPARK-19146][Core]Drop more elements when stageData.tas...

2017-01-12 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/16527
  
I used the following code to log how long trimming stages and jobs takes:
```scala
  /** If stages is too large, remove and garbage collect old stages */
  private def trimStagesIfNecessary(stages: ListBuffer[StageInfo]) = synchronized {
    if (stages.size > retainedStages) {
      val start = System.currentTimeMillis()
      val toRemove = (stages.size - retainedStages)
      stages.take(toRemove).foreach { s =>
        stageIdToData.remove((s.stageId, s.attemptId))
        stageIdToInfo.remove(s.stageId)
      }
      stages.trimStart(toRemove)
      logInfo(s"Trim stages time consuming: ${System.currentTimeMillis() - start}")
    }
  }

  /** If jobs is too large, remove and garbage collect old jobs */
  private def trimJobsIfNecessary(jobs: ListBuffer[JobUIData]) = synchronized {
    if (jobs.size > retainedJobs) {
      val start = System.currentTimeMillis()
      val toRemove = (jobs.size - retainedJobs)
      jobs.take(toRemove).foreach { job =>
        // Remove the job's UI data, if it exists
        jobIdToData.remove(job.jobId).foreach { removedJob =>
          // A null jobGroupId is used for jobs that are run without a job group
          val jobGroupId = removedJob.jobGroup.orNull
          // Remove the job group -> job mapping entry, if it exists
          jobGroupToJobIds.get(jobGroupId).foreach { jobsInGroup =>
            jobsInGroup.remove(job.jobId)
            // If this was the last job in this job group, remove the map entry for the job group
            if (jobsInGroup.isEmpty) {
              jobGroupToJobIds.remove(jobGroupId)
            }
          }
        }
      }
      jobs.trimStart(toRemove)
      logInfo(s"Trim jobs time consuming: ${System.currentTimeMillis() - start}")
    }
  }
```
and the result is:
```
tail -f test-time-consuming.log | grep time
17/01/13 10:03:39 INFO JobProgressListener: Trim stages time consuming: 3
17/01/13 10:03:39 INFO JobProgressListener: Trim jobs time consuming: 4
17/01/13 10:03:39 INFO JobProgressListener: Trim stages time consuming: 0
17/01/13 10:03:47 INFO JobProgressListener: Trim stages time consuming: 0
17/01/13 10:03:47 INFO JobProgressListener: Trim jobs time consuming: 0
17/01/13 10:03:47 INFO JobProgressListener: Trim stages time consuming: 0
17/01/13 10:03:56 INFO JobProgressListener: Trim stages time consuming: 1
17/01/13 10:03:56 INFO JobProgressListener: Trim jobs time consuming: 0
17/01/13 10:03:56 INFO JobProgressListener: Trim stages time consuming: 0
17/01/13 10:04:04 INFO JobProgressListener: Trim stages time consuming: 0
17/01/13 10:04:04 INFO JobProgressListener: Trim jobs time consuming: 0
17/01/13 10:04:04 INFO JobProgressListener: Trim stages time consuming: 0
```

Trimming stages and jobs takes almost no time, so it may be fine to change only `retainedTasks`.





