[GitHub] spark issue #15457: [SPARK-17830][SQL] Annotate remaining SQL APIs with Inte...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15457

**[Test build #66869 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66869/consoleFull)** for PR 15457 at commit [`9f7db6f`](https://github.com/apache/spark/commit/9f7db6f0ea0831669e92ff2fe5231085e4e71895).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.

---

If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.

---

- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15416: [SPARK-17849] [SQL] Fix NPE problem when using grouping ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15416

**[Test build #3337 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3337/consoleFull)** for PR 15416 at commit [`69f6e4f`](https://github.com/apache/spark/commit/69f6e4f1bc37afd6b3ca529c8b0f0afec891459a).

* This patch **fails Scala style tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
[GitHub] spark issue #15285: [SPARK-17711] Compress rolled executor log
Github user loneknightpy commented on the issue: https://github.com/apache/spark/pull/15285 @tdas Based on our offline discussion, I added a file size cache for the compressed log files.
[GitHub] spark issue #15416: [SPARK-17849] [SQL] Fix NPE problem when using grouping ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15416

**[Test build #3337 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3337/consoleFull)** for PR 15416 at commit [`69f6e4f`](https://github.com/apache/spark/commit/69f6e4f1bc37afd6b3ca529c8b0f0afec891459a).
[GitHub] spark pull request #15452: minor doc fix for Row.scala
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15452
[GitHub] spark issue #15414: [SPARK-17848][ML] Move LabelCol datatype cast into Predi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15414

**[Test build #66872 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66872/consoleFull)** for PR 15414 at commit [`7e2d501`](https://github.com/apache/spark/commit/7e2d501c951d6a3f7250156619979d29c080dc4b).
[GitHub] spark issue #15452: minor doc fix for Row.scala
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15452 Merging in master/branch-2.0.
[GitHub] spark issue #15285: [SPARK-17711] Compress rolled executor log
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15285 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66865/
[GitHub] spark issue #15285: [SPARK-17711] Compress rolled executor log
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15285 Merged build finished. Test PASSed.
[GitHub] spark issue #15285: [SPARK-17711] Compress rolled executor log
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15285

**[Test build #66865 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66865/consoleFull)** for PR 15285 at commit [`bd47bd4`](https://github.com/apache/spark/commit/bd47bd46962f6e7ee0bdf1bdfa5e777a506dd506).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null/long as input ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15432 Given that the different databases diverge (they don't even have the same function names), I think it's fine to just have null be treated as 0, like Hive/MySQL.
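The convention endorsed here amounts to coalescing the seed before seeding the generator. A rough sketch with hypothetical helper names, in plain Python rather than Spark's Scala expressions:

```python
import random

def resolve_seed(seed):
    # Hive/MySQL convention from the discussion above: a NULL (None)
    # seed behaves exactly like an explicit seed of 0.
    return 0 if seed is None else int(seed)

def rand(seed=None):
    """Toy stand-in for rand(seed): first value from a seeded generator."""
    return random.Random(resolve_seed(seed)).random()
```

Under this sketch `rand(None)` and `rand(0)` return the same value, while `rand(1)` starts a different stream.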
[GitHub] spark pull request #15456: [SPARK-17686][Core] Support printing out scala an...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/15456#discussion_r83146972

--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -104,6 +104,8 @@ object SparkSubmit {
   /___/ .__/\_,_/_/ /_/\_\   version %s
      /_/
    """.format(SPARK_VERSION))
+    printStream.println("Using Scala %s (%s, Java %s)".format(
--- End diff --

Thanks Reynold for your comments. I will change it.
[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null/long as input ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15432

Oh, strictly speaking, PostgreSQL does not ignore the `NULL`; it unsets the seed.

```sql
postgres=# SELECT setseed(0);
 setseed
---------

(1 row)

postgres=# SELECT random();
      random
-------------------
 0.840187716763467
(1 row)

postgres=# SELECT random();
      random
-------------------
 0.394382926635444
(1 row)

postgres=# SELECT setseed(null);
 setseed
---------

(1 row)

postgres=# SELECT random();
      random
-------------------
 0.783099223393947
(1 row)

postgres=# SELECT random();
      random
-------------------
 0.798440033104271
(1 row)

postgres=# SELECT setseed(0);
 setseed
---------

(1 row)

postgres=# SELECT random();
      random
-------------------
 0.840187716763467
(1 row)

postgres=# SELECT random();
      random
-------------------
 0.394382926635444
(1 row)
```
[GitHub] spark issue #15230: [SPARK-17657] [SQL] Disallow Users to Change Table Type
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15230 LGTM except one minor comment
[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r83146607

--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -0,0 +1,343 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.feature
+
+import scala.util.Random
+
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.ml.{Estimator, Model}
+import org.apache.spark.ml.linalg.{Vector, VectorUDT}
+import org.apache.spark.ml.param.{IntParam, ParamMap, ParamValidators}
+import org.apache.spark.ml.param.shared.{HasInputCol, HasOutputCol}
+import org.apache.spark.ml.util.SchemaUtils
+import org.apache.spark.sql._
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.types._
+
+/**
+ * :: Experimental ::
+ * Params for [[LSH]].
+ */
+@Since("2.1.0")
+private[ml] trait LSHParams extends HasInputCol with HasOutputCol {
+  /**
+   * Param for the dimension of LSH OR-amplification.
+   *
+   * In this implementation, we use LSH OR-amplification to reduce the false negative rate. The
+   * higher the dimension is, the lower the false negative rate.
+   * @group param
+   */
+  @Since("2.1.0")
+  final val outputDim: IntParam = new IntParam(this, "outputDim", "output dimension, where" +
+    "increasing dimensionality lowers the false negative rate, and decreasing dimensionality" +
+    " improves the running performance", ParamValidators.gt(0))
+
+  /** @group getParam */
+  @Since("2.1.0")
+  final def getOutputDim: Int = $(outputDim)
+
+  /**
+   * Transform the Schema for LSH
+   * @param schema The schema of the input dataset without [[outputCol]]
+   * @return A derived schema with [[outputCol]] added
+   */
+  @Since("2.1.0")
+  protected[this] final def validateAndTransformSchema(schema: StructType): StructType = {
+    SchemaUtils.appendColumn(schema, $(outputCol), new VectorUDT)
+  }
+}
+
+/**
+ * :: Experimental ::
+ * Model produced by [[LSH]].
+ */
+@Experimental
+@Since("2.1.0")
+private[ml] abstract class LSHModel[T <: LSHModel[T]] extends Model[T] with LSHParams {
+  self: T =>
+
+  @Since("2.1.0")
+  override def copy(extra: ParamMap): T = defaultCopy(extra)
+
+  /**
+   * The hash function of LSH, mapping a predefined KeyType to a Vector
+   * @return The mapping of LSH function.
+   */
+  @Since("2.1.0")
+  protected[this] val hashFunction: Vector => Vector
+
+  /**
+   * Calculate the distance between two different keys using the distance metric corresponding
+   * to the hashFunction
+   * @param x One input vector in the metric space
+   * @param y One input vector in the metric space
+   * @return The distance between x and y
+   */
+  @Since("2.1.0")
+  protected[ml] def keyDistance(x: Vector, y: Vector): Double
+
+  /**
+   * Calculate the distance between two different hash Vectors.
+   *
+   * @param x One of the hash vectors
+   * @param y Another hash vector
+   * @return The distance between hash vectors x and y
+   */
+  @Since("2.1.0")
+  protected[ml] def hashDistance(x: Vector, y: Vector): Double
+
+  @Since("2.1.0")
+  override def transform(dataset: Dataset[_]): DataFrame = {
+    transformSchema(dataset.schema, logging = true)
+    val transformUDF = udf(hashFunction, new VectorUDT)
+    dataset.withColumn($(outputCol), transformUDF(dataset($(inputCol))))
+  }
+
+  @Since("2.1.0")
+  override def transformSchema(schema: StructType): StructType = {
+    validateAndTransformSchema(schema)
+  }
+
+  /**
+   * Given a large dataset and an item, approximately find at most k items which have the closest
+   * distance to the item. If the [[outputCol]] is missing, the method will transform the data; if
+   * the [[outputCol]] exists, it
[GitHub] spark issue #15427: [SPARK-17866][SPARK-17867][SQL] Fix Dataset.dropduplicat...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15427 Thanks for review! @rxin @cloud-fan
[GitHub] spark pull request #15427: [SPARK-17866][SPARK-17867][SQL] Fix Dataset.dropd...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15427
[GitHub] spark issue #15427: [SPARK-17866][SPARK-17867][SQL] Fix Dataset.dropduplicat...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15427 LGTM, merging to master!
[GitHub] spark issue #15272: [SPARK-17698] [SQL] Join predicates should not contain f...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15272 hm, looks like another legitimate failing test too
[GitHub] spark issue #14702: [SPARK-15694] Implement ScriptTransformation in sql/core...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14702 @tejasapatil looks like there is a legitimate failing test.
[GitHub] spark issue #15458: [SPARK-17899][SQL] add a debug mode to keep raw table pr...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15458 LGTM pending Jenkins.
[GitHub] spark issue #15458: [SPARK-17899][SQL] add a debug mode to keep raw table pr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15458

**[Test build #66871 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66871/consoleFull)** for PR 15458 at commit [`e821f1a`](https://github.com/apache/spark/commit/e821f1a9d19215fe180ffcbd8183aabd1185a316).
[GitHub] spark pull request #15458: [SPARK-17899][SQL] add a debug mode to keep raw t...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/15458

[SPARK-17899][SQL] add a debug mode to keep raw table properties in HiveExternalCatalog

## What changes were proposed in this pull request?

Currently `HiveExternalCatalog` will filter out the Spark SQL internal table properties, e.g. `spark.sql.sources.provider`, `spark.sql.sources.schema`, etc. This is reasonable for external users, as they don't want to see these internal properties in `DESC TABLE`. However, as Spark developers, sometimes we do want to see the raw table properties. This PR adds a new internal SQL conf, `spark.sql.debug`, to enable debug mode and keep these raw table properties. This config can also be used in similar places where we want to retain debug information in the future.

## How was this patch tested?

new test in MetastoreDataSourcesSuite

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark debug

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15458.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15458

commit e821f1a9d19215fe180ffcbd8183aabd1185a316
Author: Wenchen Fan
Date: 2016-10-13T04:21:01Z

    add a debug mode to keep raw table properties in HiveExternalCatalog
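The filtering this conf toggles can be sketched in a few lines. The dictionary, prefix constant, and function below are hypothetical illustrations in plain Python, not Spark's actual implementation:

```python
# Internal table properties are hidden from users unless a debug flag
# (modelled on the new spark.sql.debug conf) asks to keep them raw.
INTERNAL_PREFIX = "spark.sql."

def visible_properties(props, debug=False):
    """Return the table properties a user should see."""
    if debug:
        # Debug mode: keep the raw table properties untouched.
        return dict(props)
    # Normal mode: drop internal "spark.sql." properties before display.
    return {k: v for k, v in props.items()
            if not k.startswith(INTERNAL_PREFIX)}
```

For example, `visible_properties({"spark.sql.sources.provider": "parquet", "owner": "alice"})` keeps only `owner`, while passing `debug=True` returns everything.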
[GitHub] spark issue #15458: [SPARK-17899][SQL] add a debug mode to keep raw table pr...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15458 cc @yhuai
[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null/long as input ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15432

Now we have at least four options for when a user sets `NULL` as the seed for `rand`:

1. Hive/MySQL - `NULL` is equivalent to `0`
2. DB2 - when the seed is `NULL`, `rand` returns `NULL`
3. PostgreSQL - when the seed is `NULL`, ignore it.
4. SparkSQL - does not allow it.

I do not have a strong opinion. Maybe @rxin needs to make a decision.
[GitHub] spark pull request #15437: [SPARK-17876] Write StructuredStreaming WAL to a ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15437
[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null/long as input ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15432

Ah, PostgreSQL seems to unset the seed.

```
postgres=# SELECT setseed(0), random(), random();
 setseed |      random       |      random
---------+-------------------+-------------------
         | 0.840187716763467 | 0.394382926635444
(1 row)

postgres=# SELECT setseed(0), random(), random();
 setseed |      random       |      random
---------+-------------------+-------------------
         | 0.840187716763467 | 0.394382926635444
(1 row)

postgres=# SELECT setseed(null), random(), random();
 setseed |      random       |      random
---------+-------------------+-------------------
         | 0.783099223393947 | 0.798440033104271
(1 row)

postgres=# SELECT setseed(null), random(), random();
 setseed |      random       |      random
---------+-------------------+-------------------
         | 0.911647357512265 | 0.197551369201392
(1 row)
```
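The behaviour in the session above can be mimicked with any seedable PRNG. A small sketch using Python's standard `random` module (a hypothetical helper, not PostgreSQL internals):

```python
import random

def stream(seed=None, n=2):
    """First n values from a generator, seeded like setseed(seed).

    Passing None mimics setseed(null) as observed above: the seed is
    effectively unset, so the values are not reproducible across calls.
    """
    rng = random.Random(seed)  # Random(None) seeds from OS entropy
    return [rng.random() for _ in range(n)]
```

With a fixed seed, repeated calls replay the same pair of values, matching the repeated 0.840187716763467 / 0.394382926635444 rows in the psql session; with `None`, each call produces a fresh pair.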
[GitHub] spark issue #15437: [SPARK-17876] Write StructuredStreaming WAL to a stream ...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/15437 LGTM. Thanks! Merging to master and 2.0.
[GitHub] spark issue #15456: [SPARK-17686][Core] Support printing out scala and java ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15456 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66862/
[GitHub] spark issue #15456: [SPARK-17686][Core] Support printing out scala and java ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15456 Merged build finished. Test PASSed.
[GitHub] spark issue #15456: [SPARK-17686][Core] Support printing out scala and java ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15456

**[Test build #66862 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66862/consoleFull)** for PR 15456 at commit [`98e7015`](https://github.com/apache/spark/commit/98e70150f26ee6d1fd0e587b59ba7467d70dcfe3).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #15365: [SPARK-17157][SPARKR]: Add multiclass logistic re...
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/15365#discussion_r83143464 --- Diff: R/pkg/R/mllib.R --- @@ -647,6 +654,195 @@ setMethod("predict", signature(object = "KMeansModel"), predict_internal(object, newData) }) +#' Logistic Regression Model +#' +#' Fits an logistic regression model against a Spark DataFrame. It supports "binomial": Binary logistic regression +#' with pivoting; "multinomial": Multinomial logistic (softmax) regression without pivoting, similar to glmnet. +#' Users can print, make predictions on the produced model and save the model to the input path. +#' +#' @param data SparkDataFrame for training +#' @param formula A symbolic description of the model to be fitted. Currently only a few formula +#'operators are supported, including '~', '.', ':', '+', and '-'. +#' @param regParam the regularization parameter. Default is 0.0. +#' @param elasticNetParam the ElasticNet mixing parameter. For alpha = 0, the penalty is an L2 penalty. +#'For alpha = 1, it is an L1 penalty. For 0 < alpha < 1, the penalty is a combination +#'of L1 and L2. Default is 0.0 which is an L2 penalty. +#' @param maxIter maximum iteration number. +#' @param tol convergence tolerance of iterations. +#' @param fitIntercept whether to fit an intercept term. Default is TRUE. +#' @param family the name of family which is a description of the label distribution to be used in the model. +#' Supported options: +#' - "auto": Automatically select the family based on the number of classes: +#' If numClasses == 1 || numClasses == 2, set to "binomial". +#' Else, set to "multinomial". +#' - "binomial": Binary logistic regression with pivoting. +#' - "multinomial": Multinomial logistic (softmax) regression without pivoting. +#' Default is "auto". +#' @param standardization whether to standardize the training features before fitting the model. 
The coefficients +#'of models will be always returned on the original scale, so it will be transparent for +#'users. Note that with/without standardization, the models should be always converged +#'to the same solution when no regularization is applied. Default is TRUE, same as glmnet. +#' @param threshold in binary classification, in range [0, 1]. If the estimated probability of class label 1 +#' is > threshold, then predict 1, else 0. A high threshold encourages the model to predict 0 +#' more often; a low threshold encourages the model to predict 1 more often. Note: Setting this with +#' threshold p is equivalent to setting thresholds (Array(1-p, p)). When threshold is set, any user-set +#' value for thresholds will be cleared. If both threshold and thresholds are set, then they must be +#' equivalent. Default is 0.5. +#' @param thresholds in multiclass (or binary) classification to adjust the probability of predicting each class. +#' Array must have length equal to the number of classes, with values > 0, excepting that at most one +#' value may be 0. The class with largest value p/t is predicted, where p is the original probability +#' of that class and t is the class's threshold. Note: When thresholds is set, any user-set +#' value for threshold will be cleared. If both threshold and thresholds are set, then they must be +#' equivalent. Default is NULL. +#' @param weightCol The weight column name. +#' @param aggregationDepth depth for treeAggregate (>= 2). If the dimensions of features or the number of partitions +#' are large, this param could be adjusted to a larger size. Default is 2. +#' @param ... additional arguments passed to the method. 
+#' @return \code{spark.logit} returns a fitted logistic regression model +#' @rdname spark.logit +#' @aliases spark.logit,SparkDataFrame,formula-method +#' @name spark.logit +#' @export +#' @examples +#' \dontrun{ +#' sparkR.session() +#' # binary logistic regression +#' label <- c(1.0, 1.0, 1.0, 0.0, 0.0) +#' feature <- c(1.1419053, 0.9194079, -0.9498666, -1.1069903, 0.2809776) +#' binary_data <- as.data.frame(cbind(label, feature)) +#' binary_df <- suppressWarnings(createDataFrame(binary_data)) +#' blr_model <- spark.logit(binary_df, label ~ feature, threshold = 1.0) +#' blr_predict <- collect(select(predict(blr_model,
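The threshold/thresholds semantics documented above can be sketched in plain Python (an illustrative sketch, not Spark's implementation; the function names are hypothetical):

```python
def predict_class(probabilities, thresholds):
    """Pick the class maximizing p/t, as described for `thresholds`."""
    # Scale each class probability by the inverse of its threshold.
    ratios = [p / t for p, t in zip(probabilities, thresholds)]
    return max(range(len(ratios)), key=lambda i: ratios[i])

def binary_thresholds(threshold):
    """A binary `threshold` of p is equivalent to `thresholds` = [1 - p, p]."""
    return [1.0 - threshold, threshold]

# With threshold 0.5, class 1 is predicted iff P(1) > 0.5.
print(predict_class([0.4, 0.6], binary_thresholds(0.5)))  # 1
# A high threshold (0.8) makes the model predict class 0 more often.
print(predict_class([0.4, 0.6], binary_thresholds(0.8)))  # 0
```

The p/t rule reduces exactly to "predict 1 if P(1) > threshold" in the binary case, which is why setting threshold p and thresholds [1-p, p] are documented as equivalent.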
[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null/long as input ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15432 What is the behavior of `PostgreSQL`? Treating `NULL` as zero?
[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null/long as input ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15432 Not urgent, but in my experience such PRs tend to be put on hold. So, I am trying to fix only the problem specified in the JIRA rather than fixing others together. @srowen said "I'm not even sure that's a bug.." but "... reasonable to try to follow it.". At least, none of the implementations in DB2, MySQL and PostgreSQL throw an exception; each defines its own behaviour.
[GitHub] spark issue #12933: [Spark-15155][Mesos] Optionally ignore default role reso...
Github user tnachen commented on the issue: https://github.com/apache/spark/pull/12933 I just tried running it locally and I'm getting the same error. It seems like with your change that test is simply declining the offer.
[GitHub] spark issue #15449: [SPARK-17884][SQL] To resolve Null pointer exception whe...
Github user priyankagargnitk commented on the issue: https://github.com/apache/spark/pull/15449 Thanks rxin
[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null/long as input ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15432 First, we do not strictly follow Hive; you can easily find many such differences in Spark. I do not think this is an urgent JIRA, right? Like what @srowen replied in the JIRA, he does not think this is a bug. The existing output message looks reasonable to me too. ``` Input argument to rand must be an integer literal.;; line 1 pos 0 ``` Setting the seed as `null` also looks weird to me. DB2 and Oracle have free versions to download. You can easily install the docker versions. You can also google their documentation. What we need to do first is an investigation, to save the time of all the other reviewers; otherwise, they have to do it too.
[GitHub] spark issue #15408: [SPARK-17839][CORE] Use Nio's directbuffer instead of Bu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15408 Merged build finished. Test PASSed.
[GitHub] spark issue #15408: [SPARK-17839][CORE] Use Nio's directbuffer instead of Bu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15408 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66860/ Test PASSed.
[GitHub] spark issue #15408: [SPARK-17839][CORE] Use Nio's directbuffer instead of Bu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15408 **[Test build #66860 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66860/consoleFull)** for PR 15408 at commit [`b74fb36`](https://github.com/apache/spark/commit/b74fb36de321fd03b48f0a6b9b772589df3d84b9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null/long as input ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15432 Strictly, the JIRA describes handling `null`, and we might not have to generalize the cases further. > it will failed when do select rand(null) Also, I would like to add the edge cases here, but I'd like to avoid the PR being put on hold. As not all the things have a standard to follow, we can define the behaviour here. I don't have access to Oracle and DB2. Do you think the Hive, PostgreSQL and MySQL examples are not enough?
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9766 **[Test build #66870 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66870/consoleFull)** for PR 9766 at commit [`8171b85`](https://github.com/apache/spark/commit/8171b8515107ea66fa277c52823167d206b4756a).
[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null/long as input ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15432 Unfortunately, not all the things have a standard to follow. That is why I suggested you do some research about it. Oracle, for example, does not have such a function in its SQL function list: https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions001.htm Since you are doing the change in `rand`, I think you can check whether the existing `rand` behaves as expected and add the missing test cases if needed. This JIRA is just trying to cover an edge case of a seed number. Why not check whether we appropriately handle all the cases? Then we do not need to submit more small fixes for `rand`, right?
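As a hedged sketch of the behaviour under discussion (illustrative Python, not Spark's implementation; the function name is hypothetical): databases that accept a NULL seed typically fall back to a default, e.g. treating NULL as seed 0, rather than raising an error.

```python
import random

def rand_with_seed(seed=None):
    """Illustrative: treat a missing/NULL seed as 0, mirroring the
    fallback behaviour being discussed for rand(null)."""
    effective_seed = 0 if seed is None else seed
    if not isinstance(effective_seed, int):
        # Keep the existing error for genuinely invalid inputs.
        raise TypeError("Input argument to rand must be an integer literal.")
    return random.Random(effective_seed).random()

# Under this convention, rand(null) and rand(0) agree deterministically.
assert rand_with_seed(None) == rand_with_seed(0)
```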
[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/14690#discussion_r83141382 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -616,6 +617,44 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat client.getPartition(db, table, spec) } + override def listPartitionsByFilter( + db: String, + table: String, + predicates: Seq[Expression]): Seq[CatalogTablePartition] = withClient { +val catalogTable = client.getTable(db, table) +val partitionColumnNames = catalogTable.partitionColumnNames.toSet +val nonPartitionPruningPredicates = predicates.filterNot { + _.references.map(_.name).toSet.subsetOf(partitionColumnNames) +} + +if (nonPartitionPruningPredicates.nonEmpty) { +sys.error("Expected only partition pruning predicates: " + + predicates.reduceLeft(And)) +} + +val partitionSchema = catalogTable.partitionSchema + +if (predicates.nonEmpty) { + val clientPrunedPartitions = +client.getPartitionsByFilter(catalogTable, predicates) + val boundPredicate = +InterpretedPredicate.create(predicates.reduce(And).transform { + case att: AttributeReference => +val index = partitionSchema.indexWhere(_.name == att.name) --- End diff -- I tested this with unit tests from two test suites on two branches. The first test suite was `SQLQuerySuite` from the Hive codebase, specifically the test "SPARK-10562: partition by column with mixed case name". The second test suite was (a modified) `ParquetMetastoreSuite`. I modified the name of the partition column in the partitioned tables in the latter suite from `p` to `pQ`. The two branches on which I tested were this PR and commit 8d33e1e from the master branch. The first test suite passed on both branches. I guess that's to be expected since our Jenkins bot has been reporting it as passed. The second suite failed (as modified) on both branches. In both branches, Spark SQL failed to find the partitions on-disk. 
This makes me wonder: 1. Is this a known/accepted limitation? 1. If unknown, is this an acceptable limitation or a bug to be fixed? The best I found regarding support for mixed-case partition columns was in https://issues.apache.org/jira/browse/SPARK-10562. Unlike in the first test (which uses the `saveAsTable` method), the tables in `ParquetMetastoreSuite` are built with SQL DDL.
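The failure mode described (partitions not found on-disk for a mixed-case column name like `pQ`) is consistent with a case-sensitive name lookup, as in the diff's `indexWhere(_.name == att.name)`. A hedged, illustrative Python sketch of the difference (not Spark's code; the helper names are hypothetical, and the assumption is that the metastore reports the column lower-cased):

```python
def index_where_case_sensitive(schema, name):
    """Mirror of indexWhere(_.name == att.name): exact match only."""
    for i, field in enumerate(schema):
        if field == name:
            return i
    return -1

def index_where_case_insensitive(schema, name):
    """A case-insensitive resolver, as an analyzer typically uses."""
    for i, field in enumerate(schema):
        if field.lower() == name.lower():
            return i
    return -1

schema = ["pQ"]  # partition column declared with mixed case in DDL
# Assumption: the metastore may hand back the column as "pq":
print(index_where_case_sensitive(schema, "pq"))    # -1: no match, partition lost
print(index_where_case_insensitive(schema, "pq"))  # 0: partition found
```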
[GitHub] spark issue #15457: [SPARK-17830][SQL] Annotate remaining SQL APIs with Inte...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15457 **[Test build #66869 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66869/consoleFull)** for PR 15457 at commit [`9f7db6f`](https://github.com/apache/spark/commit/9f7db6f0ea0831669e92ff2fe5231085e4e71895).
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9766 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66868/ Test FAILed.
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9766 **[Test build #66868 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66868/consoleFull)** for PR 9766 at commit [`00f65cd`](https://github.com/apache/spark/commit/00f65cde80b15c174183a52707643642a2bcf7b8). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9766 Merged build finished. Test FAILed.
[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...
Github user sethah commented on the issue: https://github.com/apache/spark/pull/15435 So, based on my interpretation of this and how this can actually work, we need to have:

```scala
sealed trait LogisticRegressionSummary
sealed trait LogisticRegressionTrainingSummary
class MulticlassLogisticRegressionSummary extends LogisticRegressionSummary
class MulticlassLogisticRegressionTrainingSummary extends MulticlassLogisticRegressionSummary
  with LogisticRegressionTrainingSummary
class BinaryLogisticRegressionSummary extends MulticlassLogisticRegressionSummary
class BinaryLogisticRegressionTrainingSummary extends BinaryLogisticRegressionSummary
  with LogisticRegressionTrainingSummary
```

Then, in `LogisticRegressionModel` we have:

```scala
def summary: LogisticRegressionTrainingSummary
def binarySummary: BinaryLogisticRegressionTrainingSummary = summary match {
  case b: BinaryLogisticRegressionTrainingSummary => b
  case _ => throw new Exception()
}
```

And we avoid downcasting in the summary case since `MulticlassLogisticRegressionSummary` only implements the methods defined in the trait. Otherwise, we would have to downcast to get access to those methods. Then if the summary is binary, you can just call `binarySummary`. Anyway, I got this to compile, and if there is some other way, I'm not seeing it. Would really like to get some clarification from @jkbradley. Not sure if @feynmanliang is still involved with Spark.
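The accessor pattern sethah describes (a general `summary` plus a `binarySummary` that fails fast on multiclass models instead of forcing users to downcast) can be sketched language-neutrally; here is an illustrative Python analogue of that design, not Spark's actual classes:

```python
class LogisticRegressionSummary:
    """Base summary type (analogous to the sealed trait)."""

class MulticlassSummary(LogisticRegressionSummary):
    """Summary for multiclass models."""

class BinarySummary(MulticlassSummary):
    """Binary summary specializes the multiclass one."""

class LogisticRegressionModel:
    def __init__(self, summary):
        self._summary = summary

    @property
    def summary(self):
        # Always available, typed as the base summary.
        return self._summary

    @property
    def binary_summary(self):
        # Narrow the type here, so users never downcast themselves;
        # error out when the model is multiclass.
        if isinstance(self._summary, BinarySummary):
            return self._summary
        raise TypeError("binary_summary is only available for binary models")

model = LogisticRegressionModel(BinarySummary())
print(type(model.binary_summary).__name__)  # BinarySummary
```

The design choice mirrors the discussion: the narrowing happens once, inside the model, rather than at every call site.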
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9766 **[Test build #66868 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66868/consoleFull)** for PR 9766 at commit [`00f65cd`](https://github.com/apache/spark/commit/00f65cde80b15c174183a52707643642a2bcf7b8).
[GitHub] spark pull request #15457: [SPARK-17830][SQL] Annotate remaining SQL APIs wi...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/15457 [SPARK-17830][SQL] Annotate remaining SQL APIs with InterfaceStability ## What changes were proposed in this pull request? This patch annotates all the remaining APIs in SQL (excluding streaming) with InterfaceStability. ## How was this patch tested? N/A - just annotation change. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark SPARK-17830-2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15457.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15457 commit 5f51cbb02d90f16477802601fa93b18664a57dfa Author: Reynold Xin / Date: 2016-10-13T03:45:24Z [SPARK-17830][SQL] Annotate remaining SQL APIs with InterfaceStability
[GitHub] spark issue #15457: [SPARK-17830][SQL] Annotate remaining SQL APIs with Inte...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15457 **[Test build #66867 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66867/consoleFull)** for PR 15457 at commit [`5f51cbb`](https://github.com/apache/spark/commit/5f51cbb02d90f16477802601fa93b18664a57dfa).
[GitHub] spark pull request #15457: [SPARK-17830][SQL] Annotate remaining SQL APIs wi...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/15457#discussion_r83140489 --- Diff: sql/core/src/main/java/org/apache/spark/sql/api/java/UDF1.java --- @@ -19,14 +19,12 @@ import java.io.Serializable; -// ** --- End diff -- I can't find FunctionRegistration anymore, so deleting this.
[GitHub] spark pull request #15457: [SPARK-17830][SQL] Annotate remaining SQL APIs wi...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/15457#discussion_r83140501 --- Diff: sql/core/src/main/java/org/apache/spark/sql/api/java/UDF1.java --- @@ -19,14 +19,12 @@ import java.io.Serializable; -// ** -// THIS FILE IS AUTOGENERATED BY CODE IN -// org.apache.spark.sql.api.java.FunctionRegistration -// ** +import org.apache.spark.annotation.InterfaceStability; /** * A Spark SQL UDF that has 1 arguments. */ +@InterfaceStability.Stable public interface UDF1<T1, R> extends Serializable { - public R call(T1 t1) throws Exception; + R call(T1 t1) throws Exception; --- End diff -- Methods in a Java interface are public by default.
[GitHub] spark pull request #15455: [SPARK-16827] [Branch-2.0] Avoid reporting spill ...
Github user dafrista closed the pull request at: https://github.com/apache/spark/pull/15455
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9766 Merged build finished. Test FAILed.
[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...
Github user sethah commented on the issue: https://github.com/apache/spark/pull/15435 So I've been reading through some of the history with logistic regression summaries. There was a lot of discussion on how to design the abstractions for this, [here](https://github.com/apache/spark/pull/7538) and [here](https://github.com/apache/spark/pull/8197). I'm reposting some of the relevant snippets (I will comment on them in a follow up): "We'll need to use traits to fix the multiple inheritance issue:"

```scala
sealed trait LogisticRegressionSummary
sealed trait LogisticRegressionTrainingSummary
class BinaryLogisticRegressionSummary extends LogisticRegressionSummary
class BinaryLogisticRegressionTrainingSummary extends BinaryLogisticRegressionSummary
  with LogisticRegressionTrainingSummary
```

"Are we planning to have a MulticlassLogisticRegressionSummary inheriting from LogisticRegressionSummary in the future? Because without that I'm unable to understand how using a trait would help, since there is no access to the predictions dataframe." "Yes, MulticlassLogisticRegressionSummary should be analogous to the binary version, with both inheriting from LogisticRegressionSummary." ... "Synced with @jkbradley offline. Summary: We should not require end users to perform any sort of downcasting in the stabilized API. This is OK for now since the API is still experimental. Eventually we could provide two methods, a `summary : LogisticRegressionSummary` and a `binarySummary : BinaryLogisticRegressionSummary` which errors when called on a multiclass LRModel. This will be easy to implement because `summary` is returning the base LogisticRegressionSummary class, so it will not require any public API change."
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9766 **[Test build #66866 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66866/consoleFull)** for PR 9766 at commit [`18fa6e3`](https://github.com/apache/spark/commit/18fa6e3bb00c5a81a3d44364b3644e35263bedbd). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9766 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66866/ Test FAILed.
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9766 **[Test build #66866 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66866/consoleFull)** for PR 9766 at commit [`18fa6e3`](https://github.com/apache/spark/commit/18fa6e3bb00c5a81a3d44364b3644e35263bedbd).
[GitHub] spark issue #15455: [SPARK-16827] [Branch-2.0] Avoid reporting spill metrics...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15455 Merging in. Thanks. Can you also close the PR? GitHub won't close it automatically because it is not merged into the master branch.
[GitHub] spark pull request #15427: [SPARK-17866][SPARK-17867][SQL] Fix Dataset.dropd...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15427#discussion_r83140093 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1878,17 +1878,25 @@ class Dataset[T] private[sql]( def dropDuplicates(colNames: Seq[String]): Dataset[T] = withTypedPlan { val resolver = sparkSession.sessionState.analyzer.resolver val allColumns = queryExecution.analyzed.output -val groupCols = colNames.map { colName => - allColumns.find(col => resolver(col.name, colName)).getOrElse( +val groupCols = colNames.flatMap { colName => + // It is possible there are multiple columns with the same name, + // so we call filter instead of find. + val cols = allColumns.filter(col => resolver(col.name, colName)) + if (cols.isEmpty) { throw new AnalysisException( --- End diff -- My thought is: When a user mistakenly gives a wrong column to `Dataset.drop`, it can be easily found out. But for `Dataset.dropDuplicates`, it might be harder to figure out that duplicate rows are still there. So throwing an explicit exception seems more appropriate to me.
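The `find` vs `filter` distinction in the diff can be sketched in illustrative Python (not Spark's code; names are hypothetical): `find` keeps at most one match and silently drops further columns of the same name, while `filter` keeps every matching column and still lets the caller raise on a truly missing name:

```python
def resolve_columns(all_columns, requested, resolver=str.__eq__):
    """Collect every column matching each requested name; raise when a name
    matches nothing (analogous to the AnalysisException in the diff)."""
    resolved = []
    for name in requested:
        matches = [c for c in all_columns if resolver(c, name)]
        if not matches:
            raise ValueError(f'Cannot resolve column name "{name}"')
        resolved.extend(matches)  # filter semantics: keep all matches
    return resolved

# Two columns named "a" (e.g. after a self-join): both are kept,
# so dropDuplicates can group on both rather than ambiguously on one.
print(resolve_columns(["a", "a", "b"], ["a"]))  # ['a', 'a']
```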
[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14690 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66861/ Test FAILed.
[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14690 Merged build finished. Test FAILed.
[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14690 **[Test build #66861 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66861/consoleFull)** for PR 14690 at commit [`59fecdf`](https://github.com/apache/spark/commit/59fecdf1e889c218ac81cdf73ba3e46142d052e6). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null/long as input ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15432 Initially, this JIRA was only handling `null` as a seed. If you are both worried about the change here, I would like to make the PR smaller, as suggested initially.
[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null/long as input ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15432 That is a great reference. However, is this function described in a standard? I guess it is different for each database implementation. For example, > The result can be null; if the argument is null, the result is the null value. MySQL treats it as 0 rather than returning a `null` value. Also, I gave references to both MySQL and Hive in the PR description. Can we define the behaviour here? Do we have a target DBMS to follow? I guess it is usually Hive, PostgreSQL and MySQL, as I recall. In the case of PostgreSQL, there seem to be two functions for this, `random()` and `setseed()`. This works differently from MySQL and also from DB2 (assuming from the comment you left). So, I got rid of this. I think I have checked enough other examples. Do we usually have such explanations and tests for all the DBMSs, Oracle, MySQL, SQL Server, Hive, DB2, Informix and PostgreSQL, plus mentions in the ANSI standard? It could be problematic if we don't comply with a standard that all other implementations follow, but I think it'd be fine if other databases have different implementations. I do look at other PRs from time to time and try to make mine sensible, but I don't think we always have references from all the other DBMSs and explanations from the ANSI standard. It is hard to change this again, and that is why I am asking for review.
[GitHub] spark issue #15285: [SPARK-17711] Compress rolled executor log
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15285 **[Test build #66865 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66865/consoleFull)** for PR 15285 at commit [`bd47bd4`](https://github.com/apache/spark/commit/bd47bd46962f6e7ee0bdf1bdfa5e777a506dd506).
[GitHub] spark issue #15414: [SPARK-17848][ML] Move LabelCol datatype cast into Predi...
Github user sethah commented on the issue: https://github.com/apache/spark/pull/15414 Thanks, I'll take a more detailed look in the next couple of days. Let's also wait and see if we can get @yanboliang or @jkbradley to give an opinion.
[GitHub] spark pull request #15230: [SPARK-17657] [SQL] Disallow Users to Change Tabl...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15230#discussion_r83138864 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -111,6 +111,10 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat s"as table property keys may not start with '$DATASOURCE_PREFIX' or '$STATISTICS_PREFIX':" + s" ${invalidKeys.mkString("[", ", ", "]")}") } +// External users are not allowed to set/switch the table type. +if (table.properties.contains("EXTERNAL")) { --- End diff -- I tried Hive. Hive only accepts `EXTERNAL` if users want to change the table type. That means, if users write it as `external` or `ExterRnal`, Hive just treats it as a regular property key.
[GitHub] spark pull request #15414: [SPARK-17848][ML] Move LabelCol datatype cast int...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15414#discussion_r83057213 --- Diff: mllib/src/test/scala/org/apache/spark/ml/PredictorSuite.scala --- @@ -0,0 +1,80 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.ml + +import org.apache.spark.SparkFunSuite +import org.apache.spark.ml.linalg._ +import org.apache.spark.ml.param.ParamMap +import org.apache.spark.ml.util._ +import org.apache.spark.mllib.util.MLlibTestSparkContext +import org.apache.spark.sql.Dataset +import org.apache.spark.sql.functions._ +import org.apache.spark.sql.types._ + +class PredictorSuite extends SparkFunSuite with MLlibTestSparkContext with DefaultReadWriteTest { + + import PredictorSuite._ + + test("should support all NumericType labels and not support other types") { +val df = spark.createDataFrame(Seq( + (0, Vectors.dense(0, 2, 3)), + (1, Vectors.dense(0, 3, 9)), + (0, Vectors.dense(0, 2, 6)) +)).toDF("label", "features") + +val types = + Seq(ShortType, LongType, IntegerType, FloatType, ByteType, DoubleType, DecimalType(10, 0)) + +val predictor = new MockPredictor() + +types.foreach { t => + predictor.fit(df.select(col("label").cast(t), col("features"))) +} + +intercept[IllegalArgumentException] { + predictor.fit(df.select(col("label").cast(StringType), col("features"))) +} + } +} + +object PredictorSuite { + + class MockPredictor(override val uid: String) +extends Predictor[Vector, MockPredictor, MockPredictionModel] { + +def this() = this(Identifiable.randomUID("mockpredictor")) + +override def train(dataset: Dataset[_]): MockPredictionModel = { + require(dataset.schema("label").dataType == DoubleType) + new MockPredictionModel(uid) +} + +override def copy(extra: ParamMap): MockPredictor = defaultCopy(extra) --- End diff -- change the copy methods to throw NotImplementedError
[GitHub] spark pull request #15230: [SPARK-17657] [SQL] Disallow Users to Change Tabl...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15230#discussion_r83138671 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -111,6 +111,10 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat s"as table property keys may not start with '$DATASOURCE_PREFIX' or '$STATISTICS_PREFIX':" + s" ${invalidKeys.mkString("[", ", ", "]")}") } +// External users are not allowed to set/switch the table type. +if (table.properties.contains("EXTERNAL")) { --- End diff -- should we be case-insensitive here? e.g. `external`, `ExteRNal`, etc. are all not allowed
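The case-insensitive variant being suggested in this comment could look like the following sketch; `hasExternalProperty` is a made-up helper name for illustration, not code from the PR:

```scala
// Hedged sketch of a case-insensitive check on the "EXTERNAL" property key,
// in contrast to the case-sensitive `table.properties.contains("EXTERNAL")`
// shown in the diff above.
def hasExternalProperty(properties: Map[String, String]): Boolean =
  properties.keys.exists(_.equalsIgnoreCase("EXTERNAL"))
```

Note that, per the follow-up in this thread, Hive itself only recognizes the exact key `EXTERNAL`, so the case-sensitive check actually matches Hive's behavior.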
[GitHub] spark issue #15285: [SPARK-17711] Compress rolled executor log
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15285 Merged build finished. Test FAILed.
[GitHub] spark issue #15285: [SPARK-17711] Compress rolled executor log
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15285 **[Test build #66864 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66864/consoleFull)** for PR 15285 at commit [`60cc130`](https://github.com/apache/spark/commit/60cc130790d8b9f5531bd7290b5c40e419e3016f). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15285: [SPARK-17711] Compress rolled executor log
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15285 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66864/ Test FAILed.
[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15307 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66859/ Test PASSed.
[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15307 Merged build finished. Test PASSed.
[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15307 **[Test build #66859 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66859/consoleFull)** for PR 15307 at commit [`cafbeb7`](https://github.com/apache/spark/commit/cafbeb72f064295a6d9b07c31515e59f14c17305). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` case class AssertOnLastQueryStatus(condition: StreamingQueryStatus => Unit)`
[GitHub] spark issue #15285: [SPARK-17711] Compress rolled executor log
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15285 **[Test build #66864 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66864/consoleFull)** for PR 15285 at commit [`60cc130`](https://github.com/apache/spark/commit/60cc130790d8b9f5531bd7290b5c40e419e3016f).
[GitHub] spark issue #15285: [SPARK-17711] Compress rolled executor log
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15285 Merged build finished. Test FAILed.
[GitHub] spark issue #15285: [SPARK-17711] Compress rolled executor log
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15285 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66863/ Test FAILed.
[GitHub] spark issue #15285: [SPARK-17711] Compress rolled executor log
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15285 **[Test build #66863 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66863/consoleFull)** for PR 15285 at commit [`89b9acd`](https://github.com/apache/spark/commit/89b9acd8642640f987837caa3df23b68a043b43f). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15414: [SPARK-17848][ML] Move LabelCol datatype cast into Predi...
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/15414 @sethah I have made some modifications according to the comments
[GitHub] spark issue #15285: [SPARK-17711] Compress rolled executor log
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15285 **[Test build #66863 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66863/consoleFull)** for PR 15285 at commit [`89b9acd`](https://github.com/apache/spark/commit/89b9acd8642640f987837caa3df23b68a043b43f).
[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null/long as input ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15432 Let me show you an example: https://www.ibm.com/support/knowledgecenter/SSEPEK_11.0.0/sqlref/src/tpc/db2z_bif_rand.html This is the official document of `rand` in DB2 z/OS. Below is what it says about the input parameter: 1. If numeric-expression is specified, it is used as the seed value. The argument must be an expression that returns a value of a built-in integer data type (SMALLINT or INTEGER), and the value must be between 0 and 2,147,483,646. 2. The result can be null; if the argument is null, the result is the null value. 3. RAND(0) is processed the same as RAND().
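The two null-seed behaviors being debated in this thread (DB2-style null propagation vs MySQL-style null-as-zero) can be contrasted in a small sketch; both function names are invented for illustration and are not Spark APIs:

```scala
// DB2 z/OS semantics per the quoted document: a null seed yields a null result.
def randDb2Style(seed: Option[Long]): Option[Double] =
  seed.map(s => new scala.util.Random(s).nextDouble())

// MySQL semantics per the comment above: a null seed is treated as 0.
def randMySqlStyle(seed: Option[Long]): Double =
  new scala.util.Random(seed.getOrElse(0L)).nextDouble()
```

This sketch only models the null handling; it does not model DB2's further rule that RAND(0) behaves like RAND().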
[GitHub] spark pull request #15402: [SPARK-17835][ML][MLlib] Optimize NaiveBayes mlli...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15402
[GitHub] spark issue #15402: [SPARK-17835][ML][MLlib] Optimize NaiveBayes mllib wrapp...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15402 Merged into master. Thanks for review. @zhengruifeng
[GitHub] spark pull request #15406: [Spark-17745][ml][PySpark] update NB python api -...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15406
[GitHub] spark issue #15406: [Spark-17745][ml][PySpark] update NB python api - add we...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15406 LGTM2, merged into master. Thanks!
[GitHub] spark pull request #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to regis...
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/9766#discussion_r83134970 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -414,6 +418,84 @@ class UDFRegistration private[sql] (functionRegistry: FunctionRegistry) extends // /** --- End diff -- I can turn it on, but it would make the function less readable, especially for the following statements, where it goes beyond the line length limit. ``` case 14 => register(name, udf.asInstanceOf[UDF13[_, _, _, _, _, _, _, _, _, _, _, _, _, _]], returnType) ```
[GitHub] spark issue #15408: [SPARK-17839][CORE] Use Nio's directbuffer instead of Bu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15408 **[Test build #66860 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66860/consoleFull)** for PR 15408 at commit [`b74fb36`](https://github.com/apache/spark/commit/b74fb36de321fd03b48f0a6b9b772589df3d84b9).
[GitHub] spark issue #15456: [SPARK-17686][Core] Support printing out scala and java ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15456 **[Test build #66862 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66862/consoleFull)** for PR 15456 at commit [`98e7015`](https://github.com/apache/spark/commit/98e70150f26ee6d1fd0e587b59ba7467d70dcfe3).
[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15307 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66858/ Test PASSed.
[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15307 Merged build finished. Test PASSed.
[GitHub] spark issue #15253: [SPARK-17678][REPL][Branch-1.6] Honor spark.replClassSer...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/15253 @zsxwing, would you mind taking a look at this fix for the 1.6 branch? Thanks a lot.
[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15307 **[Test build #66858 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66858/consoleFull)** for PR 15307 at commit [`00a7415`](https://github.com/apache/spark/commit/00a741519e07fdda6dc2e4161e0f0d4382ef7c0a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #15456: [SPARK-17686][Core] Support printing out scala an...
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/15456

[SPARK-17686][Core] Support printing out scala and java version with spark-submit --version command

## What changes were proposed in this pull request?

In our universal gateway service we need to pass different jars to Spark according to the Scala version. Currently we can only find out which Scala version a Spark build depends on after launching the application, which makes it hard for us to support different Scala + Spark combinations and pick the right jars. So here I propose to print out the Scala version along with the Spark version in `spark-submit --version`, so that users can leverage this output to make the choice without needing to launch an application.

## How was this patch tested?

Manually verified in a local environment.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jerryshao/apache-spark SPARK-17686

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15456.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15456

commit 98e70150f26ee6d1fd0e587b59ba7467d70dcfe3
Author: jerryshao
Date: 2016-10-13T02:07:46Z

    print out scala and java version with --version command
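The gateway-service workflow described above (read the Scala version from the `spark-submit --version` banner, then pick matching jars) can be sketched in Python. The banner line format and the jar directory layout below are assumptions for illustration, not the exact output this PR produces:

```python
import re
from typing import Optional

def scala_version_from_banner(banner: str) -> Optional[str]:
    """Extract the Scala binary version (e.g. '2.11') from a version banner.

    Assumes the banner contains a line like 'Using Scala version 2.11.8'
    (a hypothetical format; the PR's actual output may differ)."""
    match = re.search(r"Using Scala version (\d+\.\d+)(?:\.\d+)?", banner)
    return match.group(1) if match else None

def pick_jar_dir(banner: str) -> str:
    """Choose a jar directory keyed by Scala binary version.

    The /opt/gateway/jars layout is made up for this sketch."""
    version = scala_version_from_banner(banner)
    if version is None:
        raise ValueError("could not determine Scala version from banner")
    return f"/opt/gateway/jars/scala-{version}"

banner = (
    "Welcome to Spark version 2.1.0\n"
    "Using Scala version 2.11.8, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_102\n"
)
print(pick_jar_dir(banner))  # /opt/gateway/jars/scala-2.11
```

The point is only that, once the version is printed before any application launches, jar selection becomes a cheap string match instead of a trial launch.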
[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14690 **[Test build #66861 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66861/consoleFull)** for PR 14690 at commit [`59fecdf`](https://github.com/apache/spark/commit/59fecdf1e889c218ac81cdf73ba3e46142d052e6).
[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null/long as input ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15432 (Oh, I am making a comment via my phone. Sorry for occasionally closing and reopening here..)
[GitHub] spark pull request #15432: [SPARK-17854][SQL] rand/randn allows null/long as...
GitHub user HyukjinKwon reopened a pull request: https://github.com/apache/spark/pull/15432

[SPARK-17854][SQL] rand/randn allows null/long as input seed

## What changes were proposed in this pull request?

This PR proposes that `rand`/`randn` accept `null` as input in Scala/SQL and `LongType` as input in SQL, treating such values as `0`. So, this PR includes both changes below:

- `null` support

  It seems MySQL also accepts this:

  ```sql
  mysql> select rand(0);
  +---------------------+
  | rand(0)             |
  +---------------------+
  | 0.15522042769493574 |
  +---------------------+
  1 row in set (0.00 sec)

  mysql> select rand(NULL);
  +---------------------+
  | rand(NULL)          |
  +---------------------+
  | 0.15522042769493574 |
  +---------------------+
  1 row in set (0.00 sec)
  ```

  and so does Hive, according to [HIVE-14694](https://issues.apache.org/jira/browse/HIVE-14694). So the code below:

  ```scala
  spark.range(1).selectExpr("rand(null)").show()
  ```

  prints..

  **Before**

  ```
  Input argument to rand must be an integer literal.;; line 1 pos 0
  org.apache.spark.sql.AnalysisException: Input argument to rand must be an integer literal.;; line 1 pos 0
  at org.apache.spark.sql.catalyst.analysis.FunctionRegistry$$anonfun$5.apply(FunctionRegistry.scala:465)
  at org.apache.spark.sql.catalyst.analysis.FunctionRegistry$$anonfun$5.apply(FunctionRegistry.scala:444)
  ```

  **After**

  ```
  +-----------------------+
  |rand(CAST(NULL AS INT))|
  +-----------------------+
  |    0.13385709732307427|
  +-----------------------+
  ```

- `LongType` support in SQL

  In addition, this makes the function consistently accept `LongType` in both Scala and SQL. In more detail, the code below:

  ```scala
  spark.range(1).select(rand(1), rand(1L)).show()
  spark.range(1).selectExpr("rand(1)", "rand(1L)").show()
  ```

  prints..
  **Before**

  ```
  +------------------+------------------+
  |           rand(1)|           rand(1)|
  +------------------+------------------+
  |0.2630967864682161|0.2630967864682161|
  +------------------+------------------+

  Input argument to rand must be an integer literal.;; line 1 pos 0
  org.apache.spark.sql.AnalysisException: Input argument to rand must be an integer literal.;; line 1 pos 0
  at org.apache.spark.sql.catalyst.analysis.FunctionRegistry$$anonfun$5.apply(FunctionRegistry.scala:465)
  at
  ```

  **After**

  ```
  +------------------+------------------+
  |           rand(1)|           rand(1)|
  +------------------+------------------+
  |0.2630967864682161|0.2630967864682161|
  +------------------+------------------+

  +------------------+------------------+
  |           rand(1)|           rand(1)|
  +------------------+------------------+
  |0.2630967864682161|0.2630967864682161|
  +------------------+------------------+
  ```

## How was this patch tested?

Unit tests in `DataFrameSuite.scala` and `RandomSuite.scala`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-17854

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15432.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15432

commit 7fa7db22dd4f2ba88ab1f09e4b776003b3f62fdb
Author: hyukjinkwon
Date: 2016-10-11T09:21:18Z

    rand/randn allows null as input seed

commit 6f8f3f33f9b67d77285048bfd7d794990e072b8a
Author: hyukjinkwon
Date: 2016-10-12T12:23:56Z

    Use ExpectsInputTypes and allow LongType and IntegerType

commit a99f674ff9b9cebb730a1e290c0fa05af8627f1d
Author: hyukjinkwon
Date: 2016-10-12T14:31:11Z

    Override constructor
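The seed-handling rule the PR describes (a `null` seed is treated as `0`; both int and long seeds are accepted) can be modeled in a few lines of plain Python. This is an illustrative sketch of the coercion rule only, not Spark's actual Catalyst implementation:

```python
def normalize_seed(seed):
    """Model the rand/randn seed coercion described in SPARK-17854:
    a null (None) seed falls back to 0, and integer seeds of any width
    (IntegerType or LongType) are accepted as-is."""
    if seed is None:
        return 0
    # Python 3 ints are unbounded, so one check covers both int and long seeds.
    if isinstance(seed, int):
        return seed
    raise TypeError(
        f"rand/randn seed must be an integer or null, got {type(seed).__name__}"
    )

print(normalize_seed(None))   # 0 -- null seed falls back to 0
print(normalize_seed(1))      # 1
print(normalize_seed(2**40))  # 1099511627776 -- long-range seed accepted
```

Before this PR only an integer literal was accepted, which is why both `rand(null)` and `rand(1L)` raised the `AnalysisException` shown above.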