[GitHub] spark issue #16828: [SPARK-19484][SQL]continue work to create hive table wit...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16828
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16828: [SPARK-19484][SQL]continue work to create hive table wit...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16828
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72488/
Test PASSed.





[GitHub] spark issue #16828: [SPARK-19484][SQL]continue work to create hive table wit...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16828
  
**[Test build #72488 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72488/testReport)**
 for PR 16828 at commit 
[`40f2896`](https://github.com/apache/spark/commit/40f28965f48c2de2cb02eb5a26a52a4ca27bda5b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16750: [SPARK-18937][SQL] Timezone support in CSV/JSON p...

2017-02-06 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16750#discussion_r99760937
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -298,6 +299,8 @@ class DataFrameReader private[sql](sparkSession: 
SparkSession) extends Logging {
* `timestampFormat` (default `yyyy-MM-dd'T'HH:mm:ss.SSSZZ`): sets the string that
* indicates a timestamp format. Custom date formats follow the formats 
at
* `java.text.SimpleDateFormat`. This applies to timestamp type.
+   * `timeZone` (default session local timezone): sets the string that 
indicates a timezone
--- End diff --

I'd like to use `timeZone` as the option key, matching the 
`spark.sql.session.timeZone` config key for the session-local timezone.
What do you think?





[GitHub] spark pull request #16750: [SPARK-18937][SQL] Timezone support in CSV/JSON p...

2017-02-06 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16750#discussion_r99760946
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala
 ---
@@ -31,10 +31,11 @@ import 
org.apache.spark.sql.catalyst.util.{CaseInsensitiveMap, CompressionCodecs
  * Most of these map directly to Jackson's internal options, specified in 
[[JsonParser.Feature]].
  */
 private[sql] class JSONOptions(
-@transient private val parameters: CaseInsensitiveMap)
+@transient private val parameters: CaseInsensitiveMap, 
defaultTimeZoneId: String)
--- End diff --

I originally set the `timeZone` option every time a `JSONOptions` (or 
`CSVOptions`) instance was created, but that repeated the same contains-key 
check logic in many places, as @HyukjinKwon mentioned.
So I changed it to pass the default timezone id into `JSONOptions` and 
`CSVOptions` instead.
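The fallback behavior described above can be sketched as follows. This is illustrative Python, not Spark's actual Scala code; the class and attribute names are simplified stand-ins:

```python
# Sketch: an options class that receives the session-default timezone id once,
# instead of every caller repeating the contains-key check.
class JSONOptions:
    def __init__(self, parameters, default_tz_id):
        # Case-insensitive lookup; fall back to the session-local default.
        params = {k.lower(): v for k, v in parameters.items()}
        self.time_zone_id = params.get("timezone", default_tz_id)

# The explicit option wins when present.
opts = JSONOptions({"timeZone": "UTC"}, default_tz_id="America/Los_Angeles")
print(opts.time_zone_id)  # UTC

# Otherwise the default passed in by the caller is used.
opts2 = JSONOptions({}, default_tz_id="America/Los_Angeles")
print(opts2.time_zone_id)  # America/Los_Angeles
```

The point of the refactoring is that the default is resolved in one place rather than at every construction site.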





[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16740
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72487/
Test PASSed.





[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16740
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16740
  
**[Test build #72487 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72487/testReport)**
 for PR 16740 at commit 
[`37c41aa`](https://github.com/apache/spark/commit/37c41aa598bd0c2e3cf9e42f217233498ffdac23).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16809: [SPARK-19463][SQL]refresh cache after the InsertIntoHado...

2017-02-06 Thread windpiger
Github user windpiger commented on the issue:

https://github.com/apache/spark/pull/16809
  
Thanks a lot! It seems that adding a REFRESH command avoids modifying the 
default behavior: if users want to refresh, they call the command manually.

@gatorsmile @sameeragarwal @hvanhovell @davies shall we rethink the default 
behavior? Is it reasonable to refresh automatically after an insert, or should users refresh manually?





[GitHub] spark pull request #16800: [SPARK-19456][SparkR]:Add LinearSVC R API

2017-02-06 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16800#discussion_r99755967
  
--- Diff: R/pkg/inst/tests/testthat/test_mllib_classification.R ---
@@ -27,6 +27,44 @@ absoluteSparkPath <- function(x) {
   file.path(sparkHome, x)
 }
 
+test_that("spark.linearSvc", {
+  df <- suppressWarnings(createDataFrame(iris))
+  training <- df[df$Species %in% c("versicolor", "virginica"), ]
+  model <- spark.linearSvc(training, Species ~ ., regParam = 0.01, maxIter = 10)
+  summary <- summary(model)
+  expect_equal(summary$coefficients, list(-0.1563083, -0.460648, 
0.2276626, 1.055085),
+   tolerance = 1e-2)
+  expect_equal(summary$intercept, -0.06004978, tolerance = 1e-2)
+
+  # Test prediction with string label
+  prediction <- predict(model, training)
+  expect_equal(typeof(take(select(prediction, "prediction"), 
1)$prediction), "character")
+  expected <- c("versicolor", "versicolor", "versicolor", "virginica",  
"virginica",
+"virginica",  "virginica",  "virginica",  "virginica",  
"virginica")
+  expect_equal(sort(as.list(take(select(prediction, "prediction"), 
10))[[1]]), expected)
--- End diff --

Add `sort` to make it stable. 





[GitHub] spark pull request #16800: [SPARK-19456][SparkR]:Add LinearSVC R API

2017-02-06 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16800#discussion_r99755912
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/r/LinearSVCWrapper.scala 
---
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.r
+
+import org.apache.hadoop.fs.Path
+import org.json4s._
+import org.json4s.JsonDSL._
+import org.json4s.jackson.JsonMethods._
+
+import org.apache.spark.ml.{Pipeline, PipelineModel}
+import org.apache.spark.ml.classification.{LinearSVC, LinearSVCModel}
+import org.apache.spark.ml.feature.{IndexToString, RFormula}
+import org.apache.spark.ml.r.RWrapperUtils._
+import org.apache.spark.ml.util._
+import org.apache.spark.sql.{DataFrame, Dataset}
+
+private[r] class LinearSVCWrapper private (
+val pipeline: PipelineModel,
+val features: Array[String],
+val labels: Array[String]) extends MLWritable {
+  import LinearSVCWrapper._
+
+  private val svcModel: LinearSVCModel =
+pipeline.stages(1).asInstanceOf[LinearSVCModel]
--- End diff --

The last stage is the `IndexToString` transformer, so I need to use `stages(1)` 
to get the fitted model.
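The stage indexing above can be illustrated with a plain list standing in for the fitted pipeline. This is a hypothetical sketch, not the Spark API; the stage names assume the wrapper's pipeline is (RFormula, LinearSVC, IndexToString):

```python
# After fitting, the pipeline's stages are ordered as they were declared, so
# the fitted classifier sits at index 1, between the formula transformer and
# the label-decoding transformer.
fitted_stages = ["RFormulaModel", "LinearSVCModel", "IndexToStringModel"]

svc_model = fitted_stages[1]  # the analogue of stages(1) in the Scala wrapper
assert svc_model == "LinearSVCModel"
assert fitted_stages[-1] == "IndexToStringModel"  # last stage decodes labels
```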





[GitHub] spark issue #16800: [SPARK-19456][SparkR][WIP]:Add LinearSVC R API

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16800
  
**[Test build #72492 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72492/testReport)**
 for PR 16800 at commit 
[`e6eea1d`](https://github.com/apache/spark/commit/e6eea1df1acd3d01ea3c989a6cfe0823a4f99fb2).





[GitHub] spark pull request #16800: [SPARK-19456][SparkR][WIP]:Add LinearSVC R API

2017-02-06 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16800#discussion_r99755781
  
--- Diff: R/pkg/R/generics.R ---
@@ -1376,6 +1376,10 @@ setGeneric("spark.kstest", function(data, ...) { 
standardGeneric("spark.kstest")
 #' @export
 setGeneric("spark.lda", function(data, ...) { standardGeneric("spark.lda") 
})
 
+#' @rdname spark.linearSvc
+#' @export
+setGeneric("spark.linearSvc", function(data, formula, ...) { 
standardGeneric("spark.linearSvc") })
--- End diff --

`svmLinear` looks fine. I will change the files tomorrow. Thanks!





[GitHub] spark issue #16797: [SPARK-19455][SQL] Add option for case-insensitive Parqu...

2017-02-06 Thread budde
Github user budde commented on the issue:

https://github.com/apache/spark/pull/16797
  
> BTW, what behavior do we expect if a parquet file has two columns whose 
lower-cased names are identical?

I can take a look at how Spark handled this prior to 2.1, although I'm not 
sure if the behavior we'll see there was the result of a conscious decision or 
"undefined" behavior.





[GitHub] spark issue #16797: [SPARK-19455][SQL] Add option for case-insensitive Parqu...

2017-02-06 Thread budde
Github user budde commented on the issue:

https://github.com/apache/spark/pull/16797
  
> how about we add a new SQL command to refresh the table schema in 
metastore by inferring schema with data files? This is a compatibility issue 
and we should have provided a way for users to migrate, before the 2.1 release. 
I think this approach is much simpler than adding a flag.

While I think introducing a command for inferring and storing the table's 
case-sensitive schema as a property would be a welcome addition, I think 
requiring this property to be there in order for Spark SQL to function properly 
with case-sensitive data files could really restrict the settings Spark SQL can 
be used in.

If a user wanted to use Spark SQL to query over an existing warehouse 
containing hundreds or even thousands of tables, under the suggested approach a 
Spark job would have to be run to infer the schema of each and every table. 
Even though file formats such as Parquet store their schemas as metadata, there 
could still be millions of files to inspect across the warehouse. A less amenable 
format like JSON might require scanning all the data in the warehouse.

This also doesn't cover the use case @viirya pointed out where the user may 
not have write access to the metastore they are querying against. In this case, 
the user would have to rely on the warehouse administrator to create the Spark 
schema property for every table they wish to query.

> For tables created by hive, as hive is a case-insensitive system, will 
the parquet files have mixed-case schema?

I think the Hive Metastore has become a bit of an open standard for 
maintaining a data warehouse catalog since so many tools integrate with it. I 
wouldn't assume that the underlying data pointed to by an external metastore 
was created or managed by Hive itself. For example, we maintain a Hive 
Metastore that catalogs case-sensitive files written by our Spark-based ETL 
pipeline, which parses case classes from string data and writes them as Parquet.





[GitHub] spark issue #16376: [SPARK-18967][SCHEDULER] compute locality levels even if...

2017-02-06 Thread kayousterhout
Github user kayousterhout commented on the issue:

https://github.com/apache/spark/pull/16376
  
Awesome, I'm always enthusiastic about fixing minor nits! I merged this into 
master. I didn't merge it into 2.1, but I don't feel strongly about it.





[GitHub] spark pull request #16376: [SPARK-18967][SCHEDULER] compute locality levels ...

2017-02-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16376





[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field

2017-02-06 Thread gczsjdy
Github user gczsjdy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16476#discussion_r99753353
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
 ---
@@ -20,8 +20,12 @@ package org.apache.spark.sql.catalyst.expressions
 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
 import org.apache.spark.sql.catalyst.expressions.codegen._
+import org.apache.spark.sql.catalyst.util.TypeUtils
 import org.apache.spark.sql.types._
 
+import scala.annotation.tailrec
--- End diff --

Sorry, it's my fault.
I will run dev/scalastyle first before I push my changes next time.





[GitHub] spark issue #16497: [SPARK-19118] [SQL] Percentile support for frequency dis...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16497
  
**[Test build #72491 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72491/testReport)**
 for PR 16497 at commit 
[`58e7a63`](https://github.com/apache/spark/commit/58e7a635d2d138fe6809f6bb6898322e1ea956e0).





[GitHub] spark pull request #16497: [SPARK-19118] [SQL] Percentile support for freque...

2017-02-06 Thread tanejagagan
Github user tanejagagan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16497#discussion_r99753046
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Percentile.scala
 ---
@@ -125,10 +139,17 @@ case class Percentile(
   buffer: OpenHashMap[Number, Long],
   input: InternalRow): OpenHashMap[Number, Long] = {
 val key = child.eval(input).asInstanceOf[Number]
+val frqValue = frequencyExpression.eval(input)
 
 // Null values are ignored in counts map.
-if (key != null) {
-  buffer.changeValue(key, 1L, _ + 1L)
+if (key != null && frqValue != null) {
+  val frqLong = frqValue.asInstanceOf[Number].longValue()
+  // add only when frequency is positive
+  if (frqLong > 0) {
+buffer.changeValue(key, frqLong, _ + frqLong)
+  } else if ( frqLong < 0 ) {
--- End diff --

Done
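The frequency-weighted update shown in the diff can be sketched standalone. This is illustrative Python mirroring the quoted logic, not Spark's actual implementation; the body of the negative-frequency branch is elided in the diff, so raising an error here is only one plausible choice:

```python
# Sketch of the Percentile.update logic with a frequency expression:
# null keys or frequencies are ignored, positive frequencies accumulate into
# the counts map, and a negative frequency is treated as invalid input.
def update_counts(buffer, key, freq):
    if key is None or freq is None:
        return buffer  # null values are ignored in the counts map
    freq = int(freq)
    if freq > 0:
        buffer[key] = buffer.get(key, 0) + freq
    elif freq < 0:
        # The quoted diff elides this branch; raising is one plausible handling.
        raise ValueError("negative frequency: %d" % freq)
    return buffer

counts = {}
update_counts(counts, 5, 3)
update_counts(counts, 5, 2)
update_counts(counts, None, 10)  # ignored: null key
print(counts)  # {5: 5}
```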





[GitHub] spark issue #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrary state...

2017-02-06 Thread tdas
Github user tdas commented on the issue:

https://github.com/apache/spark/pull/16758
  
I addressed all the comments. However, @zsxwing, our offline discussion of 
throwing an error on `.update(null)` ran into a problem. Since it's typed as S, 
the behavior is odd when S is a primitive type. See the failing test. When the 
type is Int, `get` returns 0 when the state does not exist. That's very 
non-intuitive.
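The pitfall can be illustrated outside of Scala. This is a hypothetical Python analogue, not the Spark API: when missing state is modeled as null and coerced to a primitive type, the result is indistinguishable from genuinely-zero state:

```python
# Sketch: a state getter that coerces missing (None) state to the primitive's
# zero value, the way Scala unboxes null to 0 for Int. A caller cannot tell
# "no state was ever set" apart from "state was explicitly set to 0".
def get_int_state(stored):
    # stored is None when no state exists for this key
    return int(stored) if stored is not None else 0

missing = get_int_state(None)  # no state was ever set
real = get_int_state(0)        # state genuinely set to 0
print(missing, real)  # 0 0 -- the two cases are indistinguishable
```

This ambiguity is why throwing on `.update(null)` alone does not solve the problem for primitive state types.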





[GitHub] spark issue #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrary state...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16758
  
**[Test build #72490 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72490/testReport)**
 for PR 16758 at commit 
[`b32dcd1`](https://github.com/apache/spark/commit/b32dcd170a55c655c292d2cc535893c02a86c296).





[GitHub] spark issue #16797: [SPARK-19455][SQL] Add option for case-insensitive Parqu...

2017-02-06 Thread mallman
Github user mallman commented on the issue:

https://github.com/apache/spark/pull/16797
  
The proposal to restore schema inference with finer grained control on when 
it is performed sounds reasonable to me. The case I'm most interested in is 
turning off schema inference entirely, because we do not use parquet files with 
upper-case characters in their column names.

BTW, what behavior do we expect if a parquet file has two columns whose 
lower-cased names are identical?
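One way a resolver might surface the ambiguity raised here is to detect lower-case collisions up front. This is an illustrative sketch only, with a hypothetical helper name, not Spark's actual resolution code:

```python
from collections import defaultdict

# Group column names by their lower-cased form; any group with more than one
# member is ambiguous under case-insensitive resolution.
def find_case_collisions(columns):
    groups = defaultdict(list)
    for c in columns:
        groups[c.lower()].append(c)
    return {k: v for k, v in groups.items() if len(v) > 1}

print(find_case_collisions(["userId", "userid", "name"]))
# {'userid': ['userId', 'userid']}
```

A case-insensitive lookup against such a file has no principled way to pick between `userId` and `userid`, so detecting the clash and failing fast is one defensible behavior.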





[GitHub] spark issue #16827: [SPARK-19482][CORE] Fail it if 'spark.master' is set wit...

2017-02-06 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16827
  
Working on the UT failure.





[GitHub] spark issue #16827: [SPARK-19482][CORE] Fail it if 'spark.master' is set wit...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16827
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72486/
Test FAILed.





[GitHub] spark issue #16827: [SPARK-19482][CORE] Fail it if 'spark.master' is set wit...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16827
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16827: [SPARK-19482][CORE] Fail it if 'spark.master' is set wit...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16827
  
**[Test build #72486 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72486/testReport)**
 for PR 16827 at commit 
[`8198db3`](https://github.com/apache/spark/commit/8198db36574a1eb122c2d2303058b9424006f62e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16828: [SPARK-19484][SQL]continue work to create hive table wit...

2017-02-06 Thread windpiger
Github user windpiger commented on the issue:

https://github.com/apache/spark/pull/16828
  
cc @gatorsmile @cloud-fan 





[GitHub] spark issue #16376: [SPARK-18967][SCHEDULER] compute locality levels even if...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16376
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72484/
Test PASSed.





[GitHub] spark issue #16476: [SPARK-19084][SQL] Implement expression field

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16476
  
**[Test build #72489 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72489/testReport)**
 for PR 16476 at commit 
[`a8d22e3`](https://github.com/apache/spark/commit/a8d22e382e07b1bc6acccded13648ab0d22ed255).





[GitHub] spark issue #16376: [SPARK-18967][SCHEDULER] compute locality levels even if...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16376
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16828: [SPARK-19484][SQL]continue work to create hive table wit...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16828
  
**[Test build #72488 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72488/testReport)** for PR 16828 at commit [`40f2896`](https://github.com/apache/spark/commit/40f28965f48c2de2cb02eb5a26a52a4ca27bda5b).





[GitHub] spark issue #16376: [SPARK-18967][SCHEDULER] compute locality levels even if...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16376
  
**[Test build #72484 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72484/testReport)** for PR 16376 at commit [`cf8271e`](https://github.com/apache/spark/commit/cf8271e4c69efa573aabf891d5465736c7c79184).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16828: [SPARK-19484][SQL]continue work to create hive ta...

2017-02-06 Thread windpiger
GitHub user windpiger opened a pull request:

https://github.com/apache/spark/pull/16828

[SPARK-19484][SQL]continue work to create hive table with an empty schema

## What changes were proposed in this pull request?

After [SPARK-19279](https://issues.apache.org/jira/browse/SPARK-19279), we can no longer create a Hive table with an empty schema; we should also tighten up the condition used when creating a Hive table in

https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L835

That is, if a CatalogTable `t` has an empty schema and there is no `spark.sql.schema.numParts` property (or its value is 0), we should not add a default `col` schema; if we did, a table with an empty schema would be created, which is not what we expect.
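
The condition described above can be sketched as a small standalone check. This is a hypothetical simplification for illustration only — the `CatalogTableLike` case class and the helper name are made up, and the real logic lives in `HiveClientImpl`:

```scala
// Hypothetical, simplified model of the tightened check: only fall back to
// the default `col` placeholder schema when Spark's table properties say a
// real schema is stored there; an empty schema with no schema properties
// must not be silently patched into a one-column table.
case class CatalogTableLike(schemaFields: Seq[String], properties: Map[String, String])

def shouldAddDefaultColSchema(t: CatalogTableLike): Boolean = {
  val numParts =
    t.properties.get("spark.sql.schema.numParts").map(_.toInt).getOrElse(0)
  t.schemaFields.isEmpty && numParts > 0
}
```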

## How was this patch tested?
N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/windpiger/spark improveCreateTableWithEmptySchema

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16828.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16828


commit 40f28965f48c2de2cb02eb5a26a52a4ca27bda5b
Author: windpiger 
Date:   2017-02-07T06:00:46Z

[SPARK-19484][SQL]continue work to create hive table with an empty schema







[GitHub] spark issue #16819: [SPARK-16441][YARN] Set maxNumExecutor depends on yarn c...

2017-02-06 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/16819
  
It reduces the number of calls to [CoarseGrainedSchedulerBackend.requestTotalExecutors()](https://github.com/apache/spark/blob/v2.1.0/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L491) after applying this PR:

[before](https://github.com/apache/spark/files/756685/doNotApply-PR-16819.txt) 
[after-apply-this-PR](https://github.com/apache/spark/files/756684/apply-PR-16819.txt)

Full log can be found [here](https://issues.apache.org/jira/secure/attachment/12851318/SPARK-16441-compare-apply-PR-16819.zip).







[GitHub] spark pull request #16762: [SPARK-19419] [SPARK-19420] Fix the cross join de...

2017-02-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16762#discussion_r99749089
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
@@ -213,7 +213,12 @@ abstract class SparkStrategies extends 
QueryPlanner[SparkPlan] {
   }
 // This join could be very slow or OOM
 joins.BroadcastNestedLoopJoinExec(
--- End diff --

when we broadcast, is it still a cross join?





[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-02-06 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/16758#discussion_r99749006
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/KeyedStateImpl.scala
 ---
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming
+
+import org.apache.spark.sql.KeyedState
+
+/** Internal implementation of the [[KeyedState]] interface */
--- End diff --

I have mentioned that the trait KeyedState is not thread-safe.





[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-02-06 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/16758#discussion_r99749020
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala
 ---
@@ -313,6 +313,56 @@ case class MapGroups(
 outputObjAttr: Attribute,
 child: LogicalPlan) extends UnaryNode with ObjectProducer
 
+/** Internal class representing State */
+trait LogicalKeyedState[S]
+
+/** Factory for constructing new `MapGroupsWithState` nodes. */
+object MapGroupsWithState {
+  def apply[K: Encoder, V: Encoder, S: Encoder, U: Encoder](
+  func: (Any, Iterator[Any], LogicalKeyedState[Any]) => Iterator[Any],
+  groupingAttributes: Seq[Attribute],
+  dataAttributes: Seq[Attribute],
+  child: LogicalPlan): LogicalPlan = {
+val mapped = new MapGroupsWithState(
+  func,
+  UnresolvedDeserializer(encoderFor[K].deserializer, 
groupingAttributes),
+  UnresolvedDeserializer(encoderFor[V].deserializer, dataAttributes),
+  groupingAttributes,
+  dataAttributes,
+  CatalystSerde.generateObjAttr[U],
+  encoderFor[S].resolveAndBind().deserializer,
+  encoderFor[S].namedExpressions,
+  child)
+CatalystSerde.serialize[U](mapped)
+  }
+}
+
+/**
+ * Applies func to each unique group in `child`, based on the evaluation 
of `groupingAttributes`,
+ * while using state data.
+ * Func is invoked with an object representation of the grouping key an 
iterator containing the
+ * object representation of all the rows with that key.
+ *
+ * @param keyDeserializer used to extract the key object for each group.
+ * @param valueDeserializer used to extract the items in the iterator from 
an input row.
+ * @param groupingAttributes used to group the data
+ * @param dataAttributes used to read the data
+ * @param outputObjAttr used to define the output object
+ * @param stateDeserializer used to deserialize state before calling `func`
+ * @param stateSerializer used to serialize updated state after calling 
`func`
+ */
+case class MapGroupsWithState(
+func: (Any, Iterator[Any], LogicalKeyedState[Any]) => Iterator[Any],
+keyDeserializer: Expression,
+valueDeserializer: Expression,
+groupingAttributes: Seq[Attribute],
+dataAttributes: Seq[Attribute],
+outputObjAttr: Attribute,
+stateDeserializer: Expression,
+stateSerializer: Seq[NamedExpression],
+child: LogicalPlan) extends UnaryNode with ObjectProducer
+
+
--- End diff --

removed





[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-02-06 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/16758#discussion_r99748948
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/KeyedState.scala ---
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.annotation.{Experimental, InterfaceStability}
+import org.apache.spark.sql.catalyst.plans.logical.LogicalKeyedState
+
+/**
+ * :: Experimental ::
+ *
+ * Wrapper class for interacting with keyed state data in 
`mapGroupsWithState` and
+ * `flatMapGroupsWithState` operations on
+ * [[KeyValueGroupedDataset]].
+ *
+ * Detail description on `[map/flatMap]GroupsWithState` operation
+ * 
+ * Both, `mapGroupsWithState` and `flatMapGroupsWithState` in 
[[KeyValueGroupedDataset]]
+ * will invoke the user-given function on each group (defined by the 
grouping function in
+ * `Dataset.groupByKey()`) while maintaining user-defined per-group state 
between invocations.
+ * For a static batch Dataset, the function will be invoked once per 
group. For a streaming
+ * Dataset, the function will be invoked for each group repeatedly in 
every trigger.
+ * That is, in every batch of the [[streaming.StreamingQuery 
StreamingQuery]],
+ * the function will be invoked once for each group that has data in the 
batch.
+ *
+ * The function is invoked with following parameters.
+ *  - The key of the group.
+ *  - An iterator containing all the values for this key.
+ *  - A user-defined state object set by previous invocations of the given 
function.
+ * In case of a batch Dataset, there is only invocation and state object 
will be empty as
+ * there is no prior state. Essentially, for batch Datasets, 
`[map/flatMap]GroupsWithState`
+ * is equivalent to `[map/flatMap]Groups`.
+ *
+ * Important points to note about the function.
+ *  - In a trigger, the function will be called only the groups present in 
the batch. So do not
+ *assume that the function will be called in every trigger for every 
group that has state.
+ *  - There is no guaranteed ordering of values in the iterator in the 
function, neither with
+ *batch, nor with streaming Datasets.
+ *  - All the data will be shuffled before applying the function.
+ *
+ * Important points to note about using KeyedState.
+ *  - The value of the state cannot be null. So updating state with null 
is same as removing it.
+ *  - Operations on `KeyedState` are not thread-safe. This is to avoid 
memory barriers.
+ *  - If the `remove()` is called, then `exists()` will return `false`, and
--- End diff --

done.





[GitHub] spark pull request #16762: [SPARK-19419] [SPARK-19420] Fix the cross join de...

2017-02-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16762#discussion_r99748912
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastNestedLoopJoinExec.scala
 ---
@@ -339,6 +340,33 @@ case class BroadcastNestedLoopJoinExec(
 )
   }
 
+  protected override def doPrepare(): Unit = {
+if (!sqlContext.conf.crossJoinEnabled) {
+  joinType match {
+case Cross => // Do nothing
+case Inner =>
--- End diff --

can you add comments to say when we will hit this branch?





[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-02-06 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/16758#discussion_r99748834
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala
 ---
@@ -46,8 +46,13 @@ object UnsupportedOperationChecker {
 "Queries without streaming sources cannot be executed with 
writeStream.start()")(plan)
 }
 
+/** Collect all the streaming aggregates in a sub plan */
+def collectStreamingAggregates(subplan: LogicalPlan): Seq[Aggregate] = 
{
+  subplan.collect { case a@Aggregate(_, _, _) if a.isStreaming => a }
--- End diff --

done.





[GitHub] spark pull request #16762: [SPARK-19419] [SPARK-19420] Fix the cross join de...

2017-02-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16762#discussion_r99748730
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastNestedLoopJoinExec.scala
 ---
@@ -339,6 +340,33 @@ case class BroadcastNestedLoopJoinExec(
 )
   }
 
+  protected override def doPrepare(): Unit = {
+if (!sqlContext.conf.crossJoinEnabled) {
+  joinType match {
+case Cross => // Do nothing
+case Inner =>
+  if (condition.isEmpty) {
+throw new AnalysisException(
+  s"""Detected cartesian product for INNER join between 
logical plans
+ |${left.treeString(false).trim}
+ |and
+ |${right.treeString(false).trim}
+ |Join condition is missing or trivial.
+ |Use the CROSS JOIN syntax to allow cartesian products 
between these relations.
+   """.stripMargin)
+  }
+case _ if !withinBroadcastThreshold || condition.isEmpty =>
+  throw new AnalysisException(
+s"""Both sides of this join are outside the broadcasting 
threshold and
+   |computing it could be prohibitively expensive. To 
explicitly enable it,
+   |Please set ${SQLConf.CROSS_JOINS_ENABLED.key} = true.
--- End diff --

shall we support `CROSS OUTER JOIN`, `CROSS LEFT JOIN`, etc?
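
The guard being reviewed above can be sketched independently of Spark. This is a hedged toy version under stated assumptions — the `JoinType` hierarchy and the `checkJoin` helper are simplified stand-ins, and the real check lives in `BroadcastNestedLoopJoinExec.doPrepare`:

```scala
// Toy stand-in for the cross-join guard: CROSS is explicitly allowed, an
// INNER join with no condition is rejected as a cartesian product, and other
// join types are (in the real code) also checked against the broadcast threshold.
sealed trait JoinType
case object Cross extends JoinType
case object Inner extends JoinType
case object FullOuter extends JoinType

def checkJoin(joinType: JoinType, hasCondition: Boolean, crossJoinEnabled: Boolean): Unit = {
  if (!crossJoinEnabled) {
    joinType match {
      case Cross => // explicitly requested via CROSS JOIN syntax, allow
      case Inner if !hasCondition =>
        throw new IllegalArgumentException(
          "Detected cartesian product for INNER join; use the CROSS JOIN syntax " +
            "or set spark.sql.crossJoin.enabled=true")
      case _ => // other types: the real code also requires withinBroadcastThreshold
    }
  }
}
```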





[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-02-06 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/16758#discussion_r99748711
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala
 ---
@@ -235,3 +234,79 @@ case class StateStoreSaveExec(
 
   override def outputPartitioning: Partitioning = child.outputPartitioning
 }
+
+
+/** Physical operator for executing streaming mapGroupsWithState. */
+case class MapGroupsWithStateExec(
+func: (Any, Iterator[Any], LogicalKeyedState[Any]) => Iterator[Any],
+keyDeserializer: Expression,
+valueDeserializer: Expression,
+groupingAttributes: Seq[Attribute],
+dataAttributes: Seq[Attribute],
+outputObjAttr: Attribute,
+stateId: Option[OperatorStateId],
+stateDeserializer: Expression,
+stateSerializer: Seq[NamedExpression],
+child: SparkPlan) extends UnaryExecNode with ObjectProducerExec with 
StateStoreWriter {
+
+  override def outputPartitioning: Partitioning = child.outputPartitioning
+
+  /** Distribute by grouping attributes */
+  override def requiredChildDistribution: Seq[Distribution] =
+ClusteredDistribution(groupingAttributes) :: Nil
+
+  /** Ordering needed for using GroupingIterator */
+  override def requiredChildOrdering: Seq[Seq[SortOrder]] =
+Seq(groupingAttributes.map(SortOrder(_, Ascending)))
+
+  override protected def doExecute(): RDD[InternalRow] = {
+child.execute().mapPartitionsWithStateStore[InternalRow](
+  getStateId.checkpointLocation,
+  operatorId = getStateId.operatorId,
+  storeVersion = getStateId.batchId,
+  groupingAttributes.toStructType,
+  child.output.toStructType,
+  sqlContext.sessionState,
+  Some(sqlContext.streams.stateStoreCoordinator)) { (store, iter) =>
+val numTotalStateRows = longMetric("numTotalStateRows")
+val numUpdatedStateRows = longMetric("numUpdatedStateRows")
+val numOutputRows = longMetric("numOutputRows")
+
+val groupedIter = GroupedIterator(iter, groupingAttributes, 
child.output)
+
+val getKeyObj = 
ObjectOperator.deserializeRowToObject(keyDeserializer, groupingAttributes)
+val getKey = GenerateUnsafeProjection.generate(groupingAttributes, 
child.output)
--- End diff --

done.
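
The requirement quoted in the diff above — sort input rows by the grouping attributes so a grouping iterator can consume each key as one consecutive run — can be illustrated with a toy sketch (hypothetical; not Spark's `GroupedIterator`):

```scala
// Toy illustration of grouping over sorted input: because rows arrive sorted
// by key, each key's values form one consecutive run and can be grouped in a
// single pass without a hash map.
def groupSorted[V](rows: Seq[(String, V)]): Seq[(String, Seq[V])] = {
  val sorted = rows.sortBy(_._1) // stand-in for requiredChildOrdering
  sorted.foldLeft(List.empty[(String, List[V])]) {
    case ((k, vs) :: rest, (key, v)) if k == key => (k, v :: vs) :: rest // same run
    case (acc, (key, v)) => (key, v :: Nil) :: acc // new key starts a new run
  }.map { case (k, vs) => (k, vs.reverse) }.reverse
}
```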





[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-02-06 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/16758#discussion_r99748668
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala
 ---
@@ -58,6 +58,8 @@ trait StateStore {
*/
   def remove(condition: UnsafeRow => Boolean): Unit
 
+  def remove(key: UnsafeRow): Unit
--- End diff --

done





[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-02-06 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/16758#discussion_r99748496
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/KeyedState.scala ---
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.annotation.{Experimental, InterfaceStability}
+import org.apache.spark.sql.catalyst.plans.logical.LogicalKeyedState
+
+/**
+ * :: Experimental ::
+ *
+ * Wrapper class for interacting with keyed state data in 
`mapGroupsWithState` and
+ * `flatMapGroupsWithState` operations on
+ * [[KeyValueGroupedDataset]].
+ *
+ * Detail description on `[map/flatMap]GroupsWithState` operation
+ * 
+ * Both, `mapGroupsWithState` and `flatMapGroupsWithState` in 
[[KeyValueGroupedDataset]]
+ * will invoke the user-given function on each group (defined by the 
grouping function in
+ * `Dataset.groupByKey()`) while maintaining user-defined per-group state 
between invocations.
+ * For a static batch Dataset, the function will be invoked once per 
group. For a streaming
+ * Dataset, the function will be invoked for each group repeatedly in 
every trigger.
+ * That is, in every batch of the [[streaming.StreamingQuery 
StreamingQuery]],
+ * the function will be invoked once for each group that has data in the 
batch.
+ *
+ * The function is invoked with following parameters.
+ *  - The key of the group.
+ *  - An iterator containing all the values for this key.
+ *  - A user-defined state object set by previous invocations of the given 
function.
+ * In case of a batch Dataset, there is only invocation and state object 
will be empty as
+ * there is no prior state. Essentially, for batch Datasets, 
`[map/flatMap]GroupsWithState`
+ * is equivalent to `[map/flatMap]Groups`.
+ *
+ * Important points to note about the function.
+ *  - In a trigger, the function will be called only the groups present in 
the batch. So do not
+ *assume that the function will be called in every trigger for every 
group that has state.
+ *  - There is no guaranteed ordering of values in the iterator in the 
function, neither with
+ *batch, nor with streaming Datasets.
+ *  - All the data will be shuffled before applying the function.
+ *
+ * Important points to note about using KeyedState.
+ *  - The value of the state cannot be null. So updating state with null 
is same as removing it.
+ *  - Operations on `KeyedState` are not thread-safe. This is to avoid 
memory barriers.
+ *  - If the `remove()` is called, then `exists()` will return `false`, and
+ *`getOption()` will return `None`.
+ *  - After that `update(newState)` is called, then `exists()` will return 
`true`,
+ *and `getOption()` will return `Some(...)`.
+ *
+ * Scala example of using `KeyedState` in `mapGroupsWithState`:
+ * {{{
+ * // A mapping function that maintains an integer state for string keys 
and returns a string.
+ * def mappingFunction(key: String, value: Iterable[Int], state: 
KeyedState[Int]): Option[String]= {
--- End diff --

done
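
The `KeyedState` contract spelled out in the quoted scaladoc — `remove()` makes `exists` return `false`, a subsequent `update(newState)` makes it return `true`, and the state value cannot be null — can be modeled with a minimal sketch (hypothetical; not the real Spark class):

```scala
// Minimal model of the documented KeyedState semantics.
class KeyedStateSketch[S] {
  private var state: Option[S] = None
  def exists: Boolean = state.isDefined
  def get: S = state.getOrElse(throw new NoSuchElementException("State is not set"))
  def getOption: Option[S] = state
  def update(newState: S): Unit = {
    // Per the docs, state cannot be null: updating with null removes it.
    if (newState == null) remove() else state = Some(newState)
  }
  def remove(): Unit = { state = None }
}
```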





[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-02-06 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/16758#discussion_r99748491
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/KeyedState.scala ---
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.annotation.{Experimental, InterfaceStability}
+import org.apache.spark.sql.catalyst.plans.logical.LogicalKeyedState
+
+/**
+ * :: Experimental ::
+ *
+ * Wrapper class for interacting with keyed state data in 
`mapGroupsWithState` and
+ * `flatMapGroupsWithState` operations on
+ * [[KeyValueGroupedDataset]].
+ *
+ * Detail description on `[map/flatMap]GroupsWithState` operation
+ * 
+ * Both, `mapGroupsWithState` and `flatMapGroupsWithState` in 
[[KeyValueGroupedDataset]]
+ * will invoke the user-given function on each group (defined by the 
grouping function in
+ * `Dataset.groupByKey()`) while maintaining user-defined per-group state 
between invocations.
+ * For a static batch Dataset, the function will be invoked once per 
group. For a streaming
+ * Dataset, the function will be invoked for each group repeatedly in 
every trigger.
+ * That is, in every batch of the [[streaming.StreamingQuery 
StreamingQuery]],
+ * the function will be invoked once for each group that has data in the 
batch.
+ *
+ * The function is invoked with following parameters.
+ *  - The key of the group.
+ *  - An iterator containing all the values for this key.
+ *  - A user-defined state object set by previous invocations of the given 
function.
+ * In case of a batch Dataset, there is only invocation and state object 
will be empty as
+ * there is no prior state. Essentially, for batch Datasets, 
`[map/flatMap]GroupsWithState`
+ * is equivalent to `[map/flatMap]Groups`.
+ *
+ * Important points to note about the function.
+ *  - In a trigger, the function will be called only the groups present in 
the batch. So do not
+ *assume that the function will be called in every trigger for every 
group that has state.
+ *  - There is no guaranteed ordering of values in the iterator in the 
function, neither with
+ *batch, nor with streaming Datasets.
+ *  - All the data will be shuffled before applying the function.
+ *
+ * Important points to note about using KeyedState.
+ *  - The value of the state cannot be null. So updating state with null 
is same as removing it.
+ *  - Operations on `KeyedState` are not thread-safe. This is to avoid 
memory barriers.
+ *  - If the `remove()` is called, then `exists()` will return `false`, and
+ *`getOption()` will return `None`.
+ *  - After that `update(newState)` is called, then `exists()` will return 
`true`,
+ *and `getOption()` will return `Some(...)`.
+ *
+ * Scala example of using `KeyedState` in `mapGroupsWithState`:
+ * {{{
+ * // A mapping function that maintains an integer state for string keys 
and returns a string.
+ * def mappingFunction(key: String, value: Iterable[Int], state: 
KeyedState[Int]): Option[String]= {
+ *   // Check if state exists
+ *   if (state.exists) {
+ * val existingState = state.get  // Get the existing state
+ * val shouldRemove = ... // Decide whether to remove the state
+ * if (shouldRemove) {
+ *   state.remove() // Remove the state
+ * } else {
+ *   val newState = ...
+ *   state.update(newState)// Set the new state
+ * }
+ *   } else {
+ * val initialState = ...
+ * state.update(initialState)  // Set the initial state
+ *   }
+ *   ... // return something
+ * }
+ *
+ * }}}
+ *
+ * Java example of using `KeyedState`:
+ * {{{
+ * // A mapping function that maintains an integer state for string keys 
and returns a string.
+ * MapGroupsWithStateFunction 

[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-02-06 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/16758#discussion_r99748329
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/KeyedState.scala ---
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.annotation.{Experimental, InterfaceStability}
+import org.apache.spark.sql.catalyst.plans.logical.LogicalKeyedState
+
+/**
+ * :: Experimental ::
+ *
+ * Wrapper class for interacting with keyed state data in 
`mapGroupsWithState` and
+ * `flatMapGroupsWithState` operations on
+ * [[KeyValueGroupedDataset]].
+ *
+ * Detailed description of the `[map/flatMap]GroupsWithState` operation
+ * 
+ * Both `mapGroupsWithState` and `flatMapGroupsWithState` in
[[KeyValueGroupedDataset]]
+ * will invoke the user-given function on each group (defined by the 
grouping function in
+ * `Dataset.groupByKey()`) while maintaining user-defined per-group state 
between invocations.
+ * For a static batch Dataset, the function will be invoked once per 
group. For a streaming
+ * Dataset, the function will be invoked for each group repeatedly in 
every trigger.
+ * That is, in every batch of the [[streaming.StreamingQuery 
StreamingQuery]],
+ * the function will be invoked once for each group that has data in the 
batch.
+ *
+ * The function is invoked with the following parameters.
+ *  - The key of the group.
+ *  - An iterator containing all the values for this key.
+ *  - A user-defined state object set by previous invocations of the given 
function.
+ * In case of a batch Dataset, there is only invocation and state object 
will be empty as
--- End diff --

done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-02-06 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/16758#discussion_r99748256
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationsSuite.scala
 ---
@@ -111,6 +111,25 @@ class UnsupportedOperationsSuite extends SparkFunSuite 
{
 outputMode = Complete,
 expectedMsgs = Seq("distinct aggregation"))
 
+  // MapGroupsWithState: Not supported after a streaming aggregation
+  val att = new AttributeReference(name = "a", dataType = LongType)()
+  assertSupportedInStreamingPlan(
--- End diff --

done





[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-02-06 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/16758#discussion_r99748308
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/KeyedState.scala ---
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.annotation.{Experimental, InterfaceStability}
+import org.apache.spark.sql.catalyst.plans.logical.LogicalKeyedState
+
+/**
+ * :: Experimental ::
+ *
+ * Wrapper class for interacting with keyed state data in 
`mapGroupsWithState` and
+ * `flatMapGroupsWithState` operations on
+ * [[KeyValueGroupedDataset]].
+ *
+ * Detailed description of the `[map/flatMap]GroupsWithState` operation
+ * 
+ * Both `mapGroupsWithState` and `flatMapGroupsWithState` in
[[KeyValueGroupedDataset]]
+ * will invoke the user-given function on each group (defined by the 
grouping function in
+ * `Dataset.groupByKey()`) while maintaining user-defined per-group state 
between invocations.
+ * For a static batch Dataset, the function will be invoked once per 
group. For a streaming
+ * Dataset, the function will be invoked for each group repeatedly in 
every trigger.
+ * That is, in every batch of the [[streaming.StreamingQuery 
StreamingQuery]],
+ * the function will be invoked once for each group that has data in the 
batch.
+ *
+ * The function is invoked with the following parameters.
+ *  - The key of the group.
+ *  - An iterator containing all the values for this key.
+ *  - A user-defined state object set by previous invocations of the given 
function.
+ * In case of a batch Dataset, there is only one invocation and the state object
will be empty as
+ * there is no prior state. Essentially, for batch Datasets, 
`[map/flatMap]GroupsWithState`
+ * is equivalent to `[map/flatMap]Groups`.
+ *
+ * Important points to note about the function.
+ *  - In a trigger, the function will be called only for the groups present in
the batch. So do not
+ *assume that the function will be called in every trigger for every 
group that has state.
+ *  - There is no guaranteed ordering of values in the iterator in the 
function, neither with
+ *batch, nor with streaming Datasets.
+ *  - All the data will be shuffled before applying the function.
+ *
+ * Important points to note about using KeyedState.
+ *  - The value of the state cannot be null. So updating state with null 
is the same as removing it.
+ *  - Operations on `KeyedState` are not thread-safe. This is to avoid 
memory barriers.
+ *  - If the `remove()` is called, then `exists()` will return `false`, and
+ *`getOption()` will return `None`.
--- End diff --

done
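
The exists/update/remove contract spelled out in the Scaladoc above can be sketched with a standalone state holder. `SketchState` below is a hypothetical simplification written for illustration only, not Spark's actual `KeyedStateImpl`:

```scala
// Minimal sketch of the KeyedState contract described above.
// SketchState is a hypothetical stand-in, not Spark's KeyedStateImpl.
case class SketchState[S](private var value: S) {
  def exists: Boolean = value != null
  def get: S = value
  def getOption: Option[S] = Option(value)

  def update(newValue: S): Unit = {
    // Updating with null is the same as removing the state
    if (newValue == null) remove() else value = newValue
  }

  def remove(): Unit = { value = null.asInstanceOf[S] }
}

object SketchStateDemo extends App {
  val state = SketchState[String](null)
  assert(!state.exists && state.getOption.isEmpty)            // no prior state
  state.update("running")
  assert(state.exists && state.getOption == Some("running"))  // update => exists
  state.remove()
  assert(!state.exists && state.getOption.isEmpty)            // remove => !exists
  println("ok")
}
```

This mirrors the documented behavior: after `remove()`, `exists` is `false` and `getOption` is `None`; after `update(newState)`, `exists` is `true` and `getOption` is `Some(...)`.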





[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-02-06 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/16758#discussion_r99748260
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/MapGroupsWithStateSuite.scala
 ---
@@ -0,0 +1,240 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.streaming
+
+import org.scalatest.BeforeAndAfterAll
+
+import org.apache.spark.SparkException
+import org.apache.spark.sql.KeyedState
+import org.apache.spark.sql.catalyst.streaming.InternalOutputModes._
+import org.apache.spark.sql.execution.streaming.{KeyedStateImpl, 
MemoryStream}
+import org.apache.spark.sql.execution.streaming.state.StateStore
+
+/** Class to check custom state types */
+case class RunningCount(count: Long)
+
+class MapGroupsWithStateSuite extends StreamTest with BeforeAndAfterAll {
+
+  import testImplicits._
+
+  override def afterAll(): Unit = {
+super.afterAll()
+StateStore.stop()
+  }
+
+  test("state - get, exists, update, remove") {
+var state: KeyedStateImpl[String] = null
+
+def testState(
+expectedData: Option[String],
+shouldBeUpdated: Boolean = false,
+shouldBeRemoved: Boolean = false
+  ): Unit = {
+  if (expectedData.isDefined) {
+assert(state.exists)
+assert(state.get === expectedData.get)
+  } else {
+assert(!state.exists)
+assert(state.get === null)
+  }
+  assert(state.isUpdated === shouldBeUpdated)
+  assert(state.isRemoved === shouldBeRemoved)
+}
+
+// Updating empty state
+state = KeyedStateImpl[String](null)
+testState(None)
+state.update("")
+testState(Some(""), shouldBeUpdated = true)
+
+// Updating existing state
+state = KeyedStateImpl[String]("2")
+testState(Some("2"))
+state.update("3")
+testState(Some("3"), shouldBeUpdated = true)
+
+// Removing state
+state.remove()
+testState(None, shouldBeRemoved = true, shouldBeUpdated = false)
+state.remove()  // should still be callable
+state.update("4")
+testState(Some("4"), shouldBeRemoved = false, shouldBeUpdated = true)
+
+// Updating with null is the same as remove
+state.update(null)
+testState(None, shouldBeRemoved = true, shouldBeUpdated = false)
+  }
+
+  test("flatMapGroupsWithState - streaming") {
+// Function to maintain running count up to 2, and then remove the 
count
+// Returns the data and the count if state is defined, otherwise does 
not return anything
+val stateFunc = (key: String, values: Iterator[String], state: 
KeyedState[RunningCount]) => {
+
+  var count = Option(state.get).map(_.count).getOrElse(0L) + 
values.size
--- End diff --

done





[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-02-06 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/16758#discussion_r99747470
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/MapGroupsWithStateSuite.scala
 ---
@@ -0,0 +1,240 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.streaming
+
+import org.scalatest.BeforeAndAfterAll
+
+import org.apache.spark.SparkException
+import org.apache.spark.sql.KeyedState
+import org.apache.spark.sql.catalyst.streaming.InternalOutputModes._
+import org.apache.spark.sql.execution.streaming.{KeyedStateImpl, 
MemoryStream}
+import org.apache.spark.sql.execution.streaming.state.StateStore
+
+/** Class to check custom state types */
+case class RunningCount(count: Long)
+
+class MapGroupsWithStateSuite extends StreamTest with BeforeAndAfterAll {
+
+  import testImplicits._
+
+  override def afterAll(): Unit = {
+super.afterAll()
+StateStore.stop()
+  }
+
+  test("state - get, exists, update, remove") {
+var state: KeyedStateImpl[String] = null
+
+def testState(
+expectedData: Option[String],
+shouldBeUpdated: Boolean = false,
+shouldBeRemoved: Boolean = false
+  ): Unit = {
+  if (expectedData.isDefined) {
+assert(state.exists)
+assert(state.get === expectedData.get)
+  } else {
+assert(!state.exists)
+assert(state.get === null)
+  }
+  assert(state.isUpdated === shouldBeUpdated)
+  assert(state.isRemoved === shouldBeRemoved)
+}
+
+// Updating empty state
+state = KeyedStateImpl[String](null)
+testState(None)
+state.update("")
+testState(Some(""), shouldBeUpdated = true)
+
+// Updating existing state
+state = KeyedStateImpl[String]("2")
+testState(Some("2"))
+state.update("3")
+testState(Some("3"), shouldBeUpdated = true)
+
+// Removing state
+state.remove()
+testState(None, shouldBeRemoved = true, shouldBeUpdated = false)
+state.remove()  // should still be callable
+state.update("4")
+testState(Some("4"), shouldBeRemoved = false, shouldBeUpdated = true)
+
+// Updating with null is the same as remove
+state.update(null)
+testState(None, shouldBeRemoved = true, shouldBeUpdated = false)
+  }
+
+  test("flatMapGroupsWithState - streaming") {
+// Function to maintain running count up to 2, and then remove the 
count
+// Returns the data and the count if state is defined, otherwise does 
not return anything
+val stateFunc = (key: String, values: Iterator[String], state: 
KeyedState[RunningCount]) => {
+
+  var count = Option(state.get).map(_.count).getOrElse(0L) + 
values.size
+  if (count == 3) {
+state.remove()
+Iterator.empty
+  } else {
+state.update(RunningCount(count))
+Iterator((key, count.toString))
+  }
+}
+
+val inputData = MemoryStream[String]
+val result =
+  inputData.toDS()
+.groupByKey(x => x)
+.flatMapGroupsWithState(stateFunc) // State: Int, Out: (Str, Str)
+
+testStream(result, Append)(
+  AddData(inputData, "a"),
+  CheckLastBatch(("a", "1")),
+  assertNumStateRows(total = 1, updated = 1),
+  AddData(inputData, "a", "b"),
+  CheckLastBatch(("a", "2"), ("b", "1")),
+  assertNumStateRows(total = 2, updated = 2),
+  StopStream,
+  StartStream(),
+  AddData(inputData, "a", "b"), // should remove state for "a" and not 
return anything for a
+  CheckLastBatch(("b", "2")),
+  assertNumStateRows(total = 1, updated = 2),
+  StopStream,
+  StartStream(),
+  AddData(inputData, "a", "c"), // should recreate state for "a" and 
return count as 1 and
+  

[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-02-06 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/16758#discussion_r99747455
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/MapGroupsWithStateSuite.scala
 ---
@@ -0,0 +1,240 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.streaming
+
+import org.scalatest.BeforeAndAfterAll
+
+import org.apache.spark.SparkException
+import org.apache.spark.sql.KeyedState
+import org.apache.spark.sql.catalyst.streaming.InternalOutputModes._
+import org.apache.spark.sql.execution.streaming.{KeyedStateImpl, 
MemoryStream}
+import org.apache.spark.sql.execution.streaming.state.StateStore
+
+/** Class to check custom state types */
+case class RunningCount(count: Long)
+
+class MapGroupsWithStateSuite extends StreamTest with BeforeAndAfterAll {
+
+  import testImplicits._
+
+  override def afterAll(): Unit = {
+super.afterAll()
+StateStore.stop()
+  }
+
+  test("state - get, exists, update, remove") {
+var state: KeyedStateImpl[String] = null
+
+def testState(
+expectedData: Option[String],
+shouldBeUpdated: Boolean = false,
+shouldBeRemoved: Boolean = false
+  ): Unit = {
+  if (expectedData.isDefined) {
+assert(state.exists)
+assert(state.get === expectedData.get)
+  } else {
+assert(!state.exists)
+assert(state.get === null)
+  }
+  assert(state.isUpdated === shouldBeUpdated)
+  assert(state.isRemoved === shouldBeRemoved)
+}
+
+// Updating empty state
+state = KeyedStateImpl[String](null)
+testState(None)
+state.update("")
+testState(Some(""), shouldBeUpdated = true)
+
+// Updating existing state
+state = KeyedStateImpl[String]("2")
+testState(Some("2"))
+state.update("3")
+testState(Some("3"), shouldBeUpdated = true)
+
+// Removing state
+state.remove()
+testState(None, shouldBeRemoved = true, shouldBeUpdated = false)
+state.remove()  // should still be callable
+state.update("4")
+testState(Some("4"), shouldBeRemoved = false, shouldBeUpdated = true)
+
+// Updating with null is the same as remove
+state.update(null)
+testState(None, shouldBeRemoved = true, shouldBeUpdated = false)
+  }
+
+  test("flatMapGroupsWithState - streaming") {
+// Function to maintain running count up to 2, and then remove the 
count
+// Returns the data and the count if state is defined, otherwise does 
not return anything
+val stateFunc = (key: String, values: Iterator[String], state: 
KeyedState[RunningCount]) => {
+
+  var count = Option(state.get).map(_.count).getOrElse(0L) + 
values.size
+  if (count == 3) {
+state.remove()
+Iterator.empty
+  } else {
+state.update(RunningCount(count))
+Iterator((key, count.toString))
+  }
+}
+
+val inputData = MemoryStream[String]
+val result =
+  inputData.toDS()
+.groupByKey(x => x)
+.flatMapGroupsWithState(stateFunc) // State: Int, Out: (Str, Str)
+
+testStream(result, Append)(
+  AddData(inputData, "a"),
+  CheckLastBatch(("a", "1")),
+  assertNumStateRows(total = 1, updated = 1),
+  AddData(inputData, "a", "b"),
+  CheckLastBatch(("a", "2"), ("b", "1")),
+  assertNumStateRows(total = 2, updated = 2),
+  StopStream,
+  StartStream(),
+  AddData(inputData, "a", "b"), // should remove state for "a" and not 
return anything for a
+  CheckLastBatch(("b", "2")),
+  assertNumStateRows(total = 1, updated = 2),
+  StopStream,
+  StartStream(),
+  AddData(inputData, "a", "c"), // should recreate state for "a" and 
return count as 1 and
+  

[GitHub] spark pull request #16758: [SPARK-19413][SS] MapGroupsWithState for arbitrar...

2017-02-06 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/16758#discussion_r99747354
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/KeyedStateImpl.scala
 ---
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming
+
+import org.apache.spark.sql.KeyedState
+
+/** Internal implementation of the [[KeyedState]] interface */
+private[sql] case class KeyedStateImpl[S](private var value: S) extends 
KeyedState[S] {
+  private var updated: Boolean = false  // whether value has been updated 
(but not removed)
+  private var removed: Boolean = false  // whether value has been removed
+
+  // = Public API =
+  override def exists: Boolean = { value != null }
+
+  override def get: S = value
+
+  override def update(newValue: S): Unit = {
+if (newValue == null) {
+  remove()
+} else {
+  value = newValue
+  updated = true
+  removed = false
+}
+  }
+
+  override def remove(): Unit = {
+value = null.asInstanceOf[S]
+updated = false
+removed = true
+  }
+
+  override def toString: String = "KeyedState($value)"
--- End diff --

done
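
The line flagged above is missing Scala's `s` string interpolator, so the literal text `$value` would be printed instead of the state's value. A minimal illustration of the difference:

```scala
// Without the `s` interpolator, Scala prints the literal characters "$value";
// with it, the variable is substituted into the string.
object InterpolatorDemo extends App {
  val value = 42
  println("KeyedState($value)")   // prints: KeyedState($value)
  println(s"KeyedState($value)")  // prints: KeyedState(42)
}
```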





[GitHub] spark issue #16686: [SPARK-18682][SS] Batch Source for Kafka

2017-02-06 Thread tdas
Github user tdas commented on the issue:

https://github.com/apache/spark/pull/16686
  
@zsxwing please merge if you think your concerns were addressed correctly.





[GitHub] spark issue #16686: [SPARK-18682][SS] Batch Source for Kafka

2017-02-06 Thread tdas
Github user tdas commented on the issue:

https://github.com/apache/spark/pull/16686
  
LGTM!





[GitHub] spark issue #16762: [SPARK-19419] [SPARK-19420] Fix the cross join detection

2017-02-06 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16762
  
Cross join is a physical concept. In Spark 2.0, we detected it the way this PR does. In Spark 2.1, we moved the detection into the Optimizer. Basically, this PR changes it back.





[GitHub] spark pull request #16815: [SPARK-19407][SS] defaultFS is used FileSystem.ge...

2017-02-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16815





[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-06 Thread actuaryzhang
Github user actuaryzhang commented on the issue:

https://github.com/apache/spark/pull/16740
  
@sethah Thanks for the comments. 

OK, I added more tests to cover all families. It's not possible to test every family and link combination, if that's what you mean: the Tweedie family supports a whole family of links. The links tested now include identity, log, inverse, logit, and mu^0.4. This should be enough to prevent any non-general changes from accidentally passing the tests.





[GitHub] spark issue #16740: [SPARK-19400][ML] Allow GLM to handle intercept only mod...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16740
  
**[Test build #72487 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72487/testReport)**
 for PR 16740 at commit 
[`37c41aa`](https://github.com/apache/spark/commit/37c41aa598bd0c2e3cf9e42f217233498ffdac23).





[GitHub] spark issue #16815: [SPARK-19407][SS] defaultFS is used FileSystem.get inste...

2017-02-06 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/16815
  
LGTM. Merging to master and 2.1.





[GitHub] spark issue #16827: [SPARK-19482][CORE] Fail it if 'spark.master' is set wit...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16827
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16827: [SPARK-19482][CORE] Fail it if 'spark.master' is set wit...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16827
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72485/
Test FAILed.





[GitHub] spark issue #16827: [SPARK-19482][CORE] Fail it if 'spark.master' is set wit...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16827
  
**[Test build #72485 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72485/testReport)**
 for PR 16827 at commit 
[`b50a0bd`](https://github.com/apache/spark/commit/b50a0bd4b6df3aae7a8226e216ab46c1ea393c99).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16827: [SPARK-19482][CORE] Fail it if 'spark.master' is set wit...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16827
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16827: [SPARK-19482][CORE] Fail it if 'spark.master' is set wit...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16827
  
**[Test build #72483 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72483/testReport)**
 for PR 16827 at commit 
[`0cc5997`](https://github.com/apache/spark/commit/0cc599708d213c6aeaf8ad1a748323980444eb15).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16827: [SPARK-19482][CORE] Fail it if 'spark.master' is set wit...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16827
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72483/
Test FAILed.





[GitHub] spark issue #16747: SPARK-16636 Add CalendarIntervalType to documentation

2017-02-06 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16747
  
CC @rxin, if we are going to expose `CalendarInterval` and 
`CalendarIntervalType` officially, shall we move `CalendarInterval` to the same 
package as `Decimal`, or create a new class as the external representation of 
`CalendarIntervalType`?





[GitHub] spark issue #16744: [SPARK-19405][STREAMING] Support for cross-account Kines...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16744
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16744: [SPARK-19405][STREAMING] Support for cross-account Kines...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16744
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72481/
Test PASSed.





[GitHub] spark issue #16744: [SPARK-19405][STREAMING] Support for cross-account Kines...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16744
  
**[Test build #72481 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72481/testReport)**
 for PR 16744 at commit 
[`5823740`](https://github.com/apache/spark/commit/58237404c1dc755e4085d2cfaff7e629165f70d7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16744: [SPARK-19405][STREAMING] Support for cross-account Kines...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16744
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72480/
Test PASSed.





[GitHub] spark issue #16744: [SPARK-19405][STREAMING] Support for cross-account Kines...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16744
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16744: [SPARK-19405][STREAMING] Support for cross-account Kines...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16744
  
**[Test build #72480 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72480/testReport)**
 for PR 16744 at commit 
[`d23365b`](https://github.com/apache/spark/commit/d23365bcb0cefba80bb6b6e56d55e3319f9cf432).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16747: SPARK-16636 Add CalendarIntervalType to documentation

2017-02-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16747
  
Then it looks okay to me as a description of the current state. I just 
checked it after building the doc with this change, and we can already use 
the type as below:

```scala
scala> sql("SELECT interval 1 second").schema(0).dataType.getClass
res0: Class[_ <: org.apache.spark.sql.types.DataType] = class 
org.apache.spark.sql.types.CalendarIntervalType$

scala> sql("SELECT interval 1 second").collect()(0).get(0).getClass
res1: Class[_] = class org.apache.spark.unsafe.types.CalendarInterval
```

```scala
scala> val rdd = spark.sparkContext.parallelize(Seq(Row(new CalendarInterval(0, 0))))
rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = ParallelCollectionRDD[0] at parallelize at <console>:32

scala> spark.createDataFrame(rdd, StructType(StructField("a", 
CalendarIntervalType) :: Nil))
res1: org.apache.spark.sql.DataFrame = [a: calendarinterval]
```

Another meta concern: `org.apache.spark.unsafe.types.CalendarInterval` 
seems undocumented in both scaladoc and javadoc (as is the entire `unsafe` 
module). Once we document it as even a weak promise for this API, we might 
have to keep it for backward compatibility.

Maybe we should just describe it as a SQL-dedicated type, or note it as not 
supported for now with a `Note:`, rather than documenting it fully?







[GitHub] spark issue #16827: [SPARK-19482][CORE] Fail it if 'spark.master' is set wit...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16827
  
**[Test build #72486 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72486/testReport)**
 for PR 16827 at commit 
[`8198db3`](https://github.com/apache/spark/commit/8198db36574a1eb122c2d2303058b9424006f62e).





[GitHub] spark issue #16827: [SPARK-19482][CORE] Fail it if 'spark.master' is set wit...

2017-02-06 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16827
  
retest this please.





[GitHub] spark pull request #16827: [SPARK-19482][CORE] Fail it if 'spark.master' is ...

2017-02-06 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16827#discussion_r99743099
  
--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -779,6 +781,30 @@ private[spark] object SparkConf extends Logging {
   }
 
   /**
+   * check if the given config key-value is valid.
+   */
+  def checkValidSetting(key: String, value: String, conf: SparkConf): Unit 
= {
+if (key.equals("spark.master")) {
+  // First, there is no need to set 'spark.master' multi-times with 
different values.
+  // Second, It is possible for users to set the different 
'spark.master' in code with
+  // `spark-submit` command, and will confuse users.
+  // So, we should do once check if the 'spark.master' already exists 
in settings and if
+  // the previous value is the same with current value. Throw a 
IllegalArgumentException when
+  // previous value is different with current value.
+  val previousOne = try {
+Some(conf.get(key))
+  } catch {
+case e: NoSuchElementException =>
+  None
+  }
+  if (previousOne.isDefined && !previousOne.get.equals(value)) {
+throw new IllegalArgumentException(s"'spark.master' should not be 
set with different " +
+  s"value, previous value is ${previousOne.get} and current value 
is $value")
+  }
--- End diff --

Here we choose to throw an IllegalArgumentException to fail the job. 
Alternatively, we could log a warning in a gentler way.
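The check being discussed can be sketched as a minimal, self-contained Scala 
snippet; the `settings` map below is an illustrative stand-in for 
`SparkConf`'s internal key-value store, and the object name is hypothetical, 
not Spark's actual code. It uses `Option` instead of catching 
`NoSuchElementException`:

```scala
object MasterSettingCheck {
  // Throws if 'spark.master' is being re-set to a different value;
  // re-setting it to the same value is allowed.
  def checkValidSetting(key: String, value: String, settings: Map[String, String]): Unit = {
    if (key == "spark.master") {
      settings.get(key).foreach { previous =>
        if (previous != value) {
          throw new IllegalArgumentException(
            s"'spark.master' should not be set with different values: " +
            s"previous value is $previous and current value is $value")
        }
      }
    }
  }
}
```

This keeps the fail-fast behavior the PR proposes; swapping the `throw` for a 
`logWarning` call would give the gentler alternative mentioned above.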





[GitHub] spark pull request #16737: [SPARK-19397] [SQL] Make option names of LIBSVM a...

2017-02-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16737#discussion_r99742193
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/source/libsvm/LibSVMOptions.scala ---
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.source.libsvm
+
+import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap
+
+/**
+ * Options for the LibSVM data source.
+ */
+private[libsvm] class LibSVMOptions(@transient private val parameters: 
CaseInsensitiveMap)
+  extends Serializable {
+
+  import LibSVMOptions._
+
+  def this(parameters: Map[String, String]) = this(new 
CaseInsensitiveMap(parameters))
+
+  /**
+   * Number of features. If unspecified or nonpositive, the number of 
features will be determined
+   * automatically at the cost of one additional pass.
+   */
+  val numFeatures = parameters.get(NUM_FEATURES).map(_.toInt)
--- End diff --

sure
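For context, the case-insensitive lookup that `LibSVMOptions` relies on can 
be sketched in plain Scala. This is a simplified, hypothetical stand-in, not 
Spark's actual `CaseInsensitiveMap` class:

```scala
// Simplified case-insensitive parameter map (illustrative only):
// keys are normalized to lower case once, and every lookup is
// normalized the same way before probing the map.
class SimpleCaseInsensitiveMap(params: Map[String, String]) extends Serializable {
  private val lowered = params.map { case (k, v) => (k.toLowerCase, v) }
  def get(key: String): Option[String] = lowered.get(key.toLowerCase)
}
```

With such a wrapper, "numFeatures", "NUMFEATURES", and "numfeatures" all 
resolve to the same option value, which is the behavior this PR gives the 
LibSVM data source options.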





[GitHub] spark pull request #16737: [SPARK-19397] [SQL] Make option names of LIBSVM a...

2017-02-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16737#discussion_r99742158
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/text/TextSuite.scala
 ---
@@ -125,6 +124,25 @@ class TextSuite extends QueryTest with 
SharedSQLContext {
 }
   }
 
+  test("case insensitive option") {
+val extraOptions = Map[String, String](
+  "mApReDuCe.output.fileoutputformat.compress" -> "true",
--- End diff --

The case becomes lower case when we build the DataSource.





[GitHub] spark issue #16795: [SPARK-19409][BUILD][test-maven] Fix ParquetAvroCompatib...

2017-02-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/16795
  
@srowen and @liancheng 
Could you review this PR again?





[GitHub] spark pull request #16795: [SPARK-19409][BUILD][test-maven] Fix ParquetAvroC...

2017-02-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/16795#discussion_r99740767
  
--- Diff: resource-managers/mesos/pom.xml ---
@@ -49,6 +49,13 @@
+    <dependency>
+      <groupId>org.apache.spark</groupId>
+      <artifactId>spark-tags_${scala.binary.version}</artifactId>
+      <type>test-jar</type>
+      <scope>test</scope>
+    </dependency>
--- End diff --

This is needed to fix the following error due to 
`org.apache.spark.tags.ExtendedYarnTest`.
```
[info] Running Spark tests using Maven with these arguments:  -Phadoop-2.3 
-Phive -Pyarn -Pmesos -Phive-thriftserver -Pkinesis-asl 
-Dtest.exclude.tags=org.apache.spark.tags.ExtendedHiveTest,org.apache.spark.tags.ExtendedYarnTest
 test --fail-at-end
...
[INFO] Spark Project Mesos  FAILURE [ 
10.687 s]
...
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:2.19.1:test (default-test) on 
project spark-mesos_2.11: Execution default-test of goal 
org.apache.maven.plugins:maven-surefire-plugin:2.19.1:test failed: There was an 
error in the forked process
[ERROR] java.lang.RuntimeException: Unable to load category: 
org.apache.spark.tags.ExtendedYarnTest
[ERROR] at 
org.apache.maven.surefire.group.match.SingleGroupMatcher.loadGroupClasses(SingleGroupMatcher.java:139)
[ERROR] at ...
[ERROR] Caused by: java.lang.ClassNotFoundException: 
org.apache.spark.tags.ExtendedYarnTest
...
```





[GitHub] spark issue #16815: [SPARK-19407][SS] defaultFS is used FileSystem.get inste...

2017-02-06 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/16815
  
yea! - I found this earlier but forgot to track it.






[GitHub] spark issue #16827: [SPARK-19482][CORE] Fail it if 'spark.master' is set wit...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16827
  
**[Test build #72485 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72485/testReport)**
 for PR 16827 at commit 
[`b50a0bd`](https://github.com/apache/spark/commit/b50a0bd4b6df3aae7a8226e216ab46c1ea393c99).





[GitHub] spark issue #16795: [SPARK-19409][BUILD][test-maven] Fix ParquetAvroCompatib...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16795
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16795: [SPARK-19409][BUILD][test-maven] Fix ParquetAvroCompatib...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16795
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72479/
Test PASSed.





[GitHub] spark pull request #16827: [SPARK-19482][CORE] Fail it if 'spark.master' is ...

2017-02-06 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16827#discussion_r99739390
  
--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -779,6 +781,31 @@ private[spark] object SparkConf extends Logging {
   }
 
   /**
+   * check if the given config key-value is valid.
+   */
+  private def checkValidSetting(key: String, value: String, conf: 
SparkConf): Unit = {
+key match {
+  case "spark.master" =>
+// First, there is no need to set 'spark.master' multi-times with 
different values.
+// Second, It is possible for users to set the different 
'spark.master' in code with
+// `spark-submit` command, and will confuse users.
+// So, we should do once check if the 'spark.master' already 
exists in settings and if
+// the previous value is the same with current value. Throw a 
IllegalArgumentException when
+// previous value is different with current value.
+val previousOne = try {
+  Some(conf.get(key))
+} catch {
+  case e: NoSuchElementException =>
+None
+}
+if (previousOne.isDefined && !previousOne.get.equals(value)) {
+  throw new IllegalArgumentException(s"'spark.master' should not 
be set with different " +
+s"value, previous value is ${previousOne.get} and current 
value is $value")
+}
--- End diff --

Here we choose to throw an IllegalArgumentException to fail the job. 
Alternatively, we could log a warning in a gentler way.





[GitHub] spark issue #16795: [SPARK-19409][BUILD][test-maven] Fix ParquetAvroCompatib...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16795
  
**[Test build #72479 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72479/testReport)**
 for PR 16795 at commit 
[`941bf00`](https://github.com/apache/spark/commit/941bf00e7a41c7e64c99c9499acfd5deafc04f39).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16376: [SPARK-18967][SCHEDULER] compute locality levels even if...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16376
  
**[Test build #72484 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72484/testReport)**
 for PR 16376 at commit 
[`cf8271e`](https://github.com/apache/spark/commit/cf8271e4c69efa573aabf891d5465736c7c79184).





[GitHub] spark issue #16827: [SPARK-19482][CORE] Fail it if 'spark.master' is set wit...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16827
  
**[Test build #72483 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72483/testReport)**
 for PR 16827 at commit 
[`0cc5997`](https://github.com/apache/spark/commit/0cc599708d213c6aeaf8ad1a748323980444eb15).





[GitHub] spark issue #16376: [SPARK-18967][SCHEDULER] compute locality levels even if...

2017-02-06 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/16376
  
yes, I think this is ready (I just noticed a couple of minor nits with a 
fresh read but no real changes)





[GitHub] spark issue #16656: [SPARK-18116][DStream] Report stream input information a...

2017-02-06 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16656
  
cc @tdas also.





[GitHub] spark pull request #16827: [SPARK-19482][CORE] Fail it if 'spark.master' is ...

2017-02-06 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16827#discussion_r99738940
  
--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -779,6 +781,30 @@ private[spark] object SparkConf extends Logging {
   }
 
   /**
+   * check if the given config key-value is valid.
+   */
+  private def checkValidSetting(key: String, value: String, conf: 
SparkConf): Unit = {
+key match {
+  case "spark.master" =>
+// First, there is no need to set 'spark.master' multi-times with 
different values.
+// Second, It is possible for users to set the different 
'spark.master' in code with
+// `spark-submit` command, and will confuse users.
+// So, we should do once check if the 'spark.master' already 
exists in settings and if
+// the previous value is the same with current value. Throw a 
IllegalArgumentException when
+// previous value is different with current value.
+val previousOne = try {
+  Some(conf.get(key))
+} catch {
+  case e: NoSuchElementException =>
+None
+}
+if (previousOne.isDefined && !previousOne.get.equals(value)) {
+  throw new IllegalArgumentException("'spark.master' should not be 
")
+}
--- End diff --

Here we choose to throw an IllegalArgumentException to fail the job. 
Alternatively, we could log a warning in a gentler way.





[GitHub] spark pull request #16827: [SPARK-19482][CORE] Fail it if 'spark.master' is ...

2017-02-06 Thread uncleGen
GitHub user uncleGen opened a pull request:

https://github.com/apache/spark/pull/16827

[SPARK-19482][CORE] Fail it if 'spark.master' is set with different value

## What changes were proposed in this pull request?

First, there is no need to set 'spark.master' multiple times with different 
values. Second, users may set a 'spark.master' in code that differs from the 
one passed to the `spark-submit` command, which will confuse them. So we 
should check whether 'spark.master' already exists in the settings and 
whether the previous value matches the current one, and throw an 
IllegalArgumentException when the previous value differs from the current 
value.

## How was this patch tested?

Added a new unit test.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uncleGen/spark SPARK-19482

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16827.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16827


commit 0cc599708d213c6aeaf8ad1a748323980444eb15
Author: uncleGen 
Date:   2017-02-07T03:26:56Z

Fail it if 'spark.master' is set with different value







[GitHub] spark issue #16797: [SPARK-19455][SQL] Add option for case-insensitive Parqu...

2017-02-06 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16797
  
If the use case suggested by @budde, where we want to infer the schema but not 
attempt to write it back as a table property, makes sense, then the new SQL 
command approach might not work for it. In that case, someone with the required 
permissions would need to be asked to migrate and update the table property.





[GitHub] spark issue #16747: SPARK-16636 Add CalendarIntervalType to documentation

2017-02-06 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16747
  
Actually `CalendarInterval` is already exposed to users, e.g. we can call 
`collect` on a DataFrame with `CalendarIntervalType` field, and get rows 
containing `CalendarInterval`. We don't support reading/writing 
`CalendarIntervalType` though.

So I'm OK with adding documentation for it.





[GitHub] spark issue #16171: [SPARK-18739][ML][PYSPARK] Classification and regression...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16171
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72482/
Test PASSed.





[GitHub] spark issue #16171: [SPARK-18739][ML][PYSPARK] Classification and regression...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16171
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16171: [SPARK-18739][ML][PYSPARK] Classification and regression...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16171
  
**[Test build #72482 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72482/testReport)**
 for PR 16171 at commit 
[`bc85645`](https://github.com/apache/spark/commit/bc856454c7826cec4e3761352cd68e058e6a90ec).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16762: [SPARK-19419] [SPARK-19420] Fix the cross join detection

2017-02-06 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16762
  
is CROSS JOIN a logical or physical concept?





[GitHub] spark issue #16795: [SPARK-19409][BUILD][test-maven] Fix ParquetAvroCompatib...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16795
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72473/
Test FAILed.




