date:20180810

[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type

2018-08-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22037
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type

2018-08-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22037
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94546/
Test FAILed.


---


[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type

2018-08-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22037
  
**[Test build #94546 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94546/testReport)**
 for PR 22037 at commit 
[`9eefbe5`](https://github.com/apache/spark/commit/9eefbe5dc58bba272dedce7ae0174be89a0a9b28).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #22067: [SPARK-25084][SQL] distribute by on multiple columns may...

2018-08-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22067
  
**[Test build #94556 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94556/testReport)**
 for PR 22067 at commit 
[`0a6bccc`](https://github.com/apache/spark/commit/0a6bccc9e6a308d0b064bc0f2f37f7b19294df20).


---


[GitHub] spark issue #22070: Fix typos detected by github.com/client9/misspell

2018-08-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22070
  
Can one of the admins verify this patch?


---


[GitHub] spark issue #22070: Fix typos detected by github.com/client9/misspell

2018-08-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22070
  
Can one of the admins verify this patch?


---


[GitHub] spark issue #22067: [SPARK-25084][SQL] distribute by on multiple columns may...

2018-08-10 Thread LantaoJin

Github user LantaoJin commented on the issue:

https://github.com/apache/spark/pull/22067
  
@jerryshao Could you help trigger a test build, please?


---


[GitHub] spark issue #22067: [SPARK-25084][SQL] distribute by on multiple columns may...

2018-08-10 Thread LantaoJin

Github user LantaoJin commented on the issue:

https://github.com/apache/spark/pull/22067
  
ok to test.


---


[GitHub] spark issue #22070: Fix typos detected by github.com/client9/misspell

2018-08-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22070
  
Can one of the admins verify this patch?


---


[GitHub] spark pull request #22070: Fix typos detected by github.com/client9/misspell

2018-08-10 Thread seratch

GitHub user seratch opened a pull request:

https://github.com/apache/spark/pull/22070

Fix typos detected by github.com/client9/misspell

## What changes were proposed in this pull request?

Fixing typos is sometimes very hard. It's not so easy to visually review 
them. Recently, I discovered a very useful tool for it, 
[misspell](https://github.com/client9/misspell). 

This pull request fixes minor typos detected by 
[misspell](https://github.com/client9/misspell) except for the false positives. 
If you would like me to work on other files as well, let me know. 
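
For reviewing a report like the one below, a minimal sketch of parsing `misspell` output lines of the form `path:line:col: "wrong" is a misspelling of "right"` and dropping findings from paths treated as false positives. The names `Finding`, `MisspellReport`, and the skip set are hypothetical; the pull request does not say which findings were considered false positives.

```scala
// Hypothetical helper, not part of the pull request.
final case class Finding(path: String, line: Int, column: Int, wrong: String, right: String)

object MisspellReport {
  // One report line: <path>:<line>:<col>: "<wrong>" is a misspelling of "<right>"
  private val Line = """^(.+?):(\d+):(\d+): "(.+)" is a misspelling of "(.+)"$""".r

  /** Parses a single report line; returns None for lines in any other shape. */
  def parse(raw: String): Option[Finding] = raw.trim match {
    case Line(path, line, col, wrong, right) =>
      Some(Finding(path, line.toInt, col.toInt, wrong, right))
    case _ => None
  }

  /** Keeps only findings whose path is not in the (assumed) skip set. */
  def actionable(lines: Seq[String], skipPaths: Set[String]): Seq[Finding] =
    lines.flatMap(parse).filterNot(f => skipPaths.contains(f.path))
}
```

A caller might pass a skip set containing data files such as word lists, where an unusual spelling can be intentional; that choice is an assumption for illustration.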

## How was this patch tested?

### before

```
$ misspell . | grep -v '.js'
R/pkg/R/SQLContext.R:354:43: "definiton" is a misspelling of "definition"
R/pkg/R/SQLContext.R:424:43: "definiton" is a misspelling of "definition"
R/pkg/R/SQLContext.R:445:43: "definiton" is a misspelling of "definition"
R/pkg/R/SQLContext.R:495:43: "definiton" is a misspelling of "definition"
NOTICE-binary:454:16: "containd" is a misspelling of "contained"
R/pkg/R/context.R:46:43: "definiton" is a misspelling of "definition"
R/pkg/R/context.R:74:43: "definiton" is a misspelling of "definition"
R/pkg/R/DataFrame.R:591:48: "persistance" is a misspelling of "persistence"
R/pkg/R/streaming.R:166:44: "occured" is a misspelling of "occurred"
R/pkg/inst/worker/worker.R:65:22: "ouput" is a misspelling of "output"
R/pkg/tests/fulltests/test_utils.R:106:25: "environemnt" is a misspelling of "environment"
common/kvstore/src/test/java/org/apache/spark/util/kvstore/InMemoryStoreSuite.java:38:39: "existant" is a misspelling of "existent"
common/kvstore/src/test/java/org/apache/spark/util/kvstore/LevelDBSuite.java:83:39: "existant" is a misspelling of "existent"
common/network-common/src/main/java/org/apache/spark/network/crypto/TransportCipher.java:243:46: "transfered" is a misspelling of "transferred"
common/network-common/src/main/java/org/apache/spark/network/sasl/SaslEncryption.java:234:19: "transfered" is a misspelling of "transferred"
common/network-common/src/main/java/org/apache/spark/network/sasl/SaslEncryption.java:238:63: "transfered" is a misspelling of "transferred"
common/network-common/src/main/java/org/apache/spark/network/sasl/SaslEncryption.java:244:46: "transfered" is a misspelling of "transferred"
common/network-common/src/main/java/org/apache/spark/network/sasl/SaslEncryption.java:276:39: "transfered" is a misspelling of "transferred"
common/network-common/src/main/java/org/apache/spark/network/util/AbstractFileRegion.java:27:20: "transfered" is a misspelling of "transferred"
common/unsafe/src/test/scala/org/apache/spark/unsafe/types/UTF8StringPropertyCheckSuite.scala:195:15: "orgin" is a misspelling of "origin"
core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala:621:39: "gauranteed" is a misspelling of "guaranteed"
core/src/main/scala/org/apache/spark/status/storeTypes.scala:113:29: "ect" is a misspelling of "etc"
core/src/main/scala/org/apache/spark/storage/DiskStore.scala:282:18: "transfered" is a misspelling of "transferred"
core/src/main/scala/org/apache/spark/util/ListenerBus.scala:64:17: "overriden" is a misspelling of "overridden"
core/src/test/scala/org/apache/spark/ShuffleSuite.scala:211:7: "substracted" is a misspelling of "subtracted"
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala:1922:49: "agriculteur" is a misspelling of "agriculture"
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala:2468:84: "truely" is a misspelling of "truly"
core/src/test/scala/org/apache/spark/storage/FlatmapIteratorSuite.scala:25:18: "persistance" is a misspelling of "persistence"
core/src/test/scala/org/apache/spark/storage/FlatmapIteratorSuite.scala:26:69: "persistance" is a misspelling of "persistence"
data/streaming/AFINN-111.txt:1219:0: "humerous" is a misspelling of "humorous"
dev/run-pip-tests:55:28: "enviroments" is a misspelling of "environments"
dev/run-pip-tests:91:37: "virutal" is a misspelling of "virtual"
dev/merge_spark_pr.py:377:72: "accross" is a misspelling of "across"
dev/merge_spark_pr.py:378:66: "accross" is a misspelling of "across"
dev/run-pip-tests:126:25: "enviroments" is a misspelling of "environments"
docs/configuration.md:1830:82: "overriden" is a misspelling of "overridden"
docs/structured-streaming-programming-guide.md:525:45: "processs" is a misspelling of "processes"
docs/structured-streaming-programming-guide.md:1165:61: "BETWEN" is a misspelling of "BETWEEN"
docs/sql-programming-guide.md:1891:810: "behaivor" is a misspelling of "behavior"
examples/src/main/python/sql/arrow.py:98:8: "substract" is a misspelling of "subtract"
examples/src/main/python/sql/arrow.py:103:27: "substract" is a misspelling of "subtract"
```

[GitHub] spark issue #22069: [MINOR][DOC] Fix Java example code in Column's comments

2018-08-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22069
  
**[Test build #94555 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94555/testReport)**
 for PR 22069 at commit 
[`8520df8`](https://github.com/apache/spark/commit/8520df899a3364f2bb41d4155d2bed9e68772a07).


---


[GitHub] spark issue #22069: [MINOR][DOC] Fix Java example code in Column's comments

2018-08-10 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22069
  
ok to test


---


[GitHub] spark issue #22008: [SPARK-24928][SQL] Optimize cross join according to stat...

2018-08-10 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22008
  
cc @wzhfy 


---


[GitHub] spark pull request #21977: SPARK-25004: Add spark.executor.pyspark.memory li...

2018-08-10 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21977#discussion_r209209021
  
--- Diff: 
core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -60,14 +61,20 @@ private[spark] object PythonEvalType {
  */
 private[spark] abstract class BasePythonRunner[IN, OUT](
 funcs: Seq[ChainedPythonFunctions],
-bufferSize: Int,
-reuseWorker: Boolean,
 evalType: Int,
-argOffsets: Array[Array[Int]])
+argOffsets: Array[Array[Int]],
+conf: SparkConf)
   extends Logging {
 
   require(funcs.length == argOffsets.length, "argOffsets should have the same length as funcs")
 
+  private val bufferSize = conf.getInt("spark.buffer.size", 65536)
+  private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true)
+  // each python worker gets an equal part of the allocation. the worker pool will grow to the
+  // number of concurrent tasks, which is determined by the number of cores in this executor.
+  private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY)
+  .map(_ / conf.getInt("spark.executor.cores", 1))
--- End diff --

tiny nit: indentation
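
For readers following the diff above: the `memoryMb` expression divides the executor-wide PySpark memory setting evenly by the number of executor cores. A minimal standalone sketch of that arithmetic, assuming the setting is an optional value in MiB (the diff does not show its type; `PySparkMemory` and `perWorkerMb` are hypothetical names):

```scala
// Hypothetical standalone version of the per-worker split shown in the diff.
object PySparkMemory {
  /** Splits an optional executor-wide limit (MiB) evenly across cores, using integer division. */
  def perWorkerMb(executorPysparkMemoryMb: Option[Long], executorCores: Int): Option[Long] = {
    require(executorCores > 0, "executorCores must be positive")
    // No limit configured means no per-worker limit either.
    executorPysparkMemoryMb.map(_ / executorCores)
  }
}
```

With a 2048 MiB limit and 4 cores each worker would get 512 MiB; any remainder from the integer division is simply dropped in this sketch.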


---


[GitHub] spark pull request #21977: SPARK-25004: Add spark.executor.pyspark.memory li...

2018-08-10 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21977#discussion_r209209726
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/python/AggregateInPandasExec.scala
 ---
@@ -137,13 +135,12 @@ case class AggregateInPandasExec(
 
   val columnarBatchIter = new ArrowPythonRunner(
 pyFuncs,
-bufferSize,
-reuseWorker,
 PythonEvalType.SQL_GROUPED_AGG_PANDAS_UDF,
 argOffsets,
 aggInputSchema,
 sessionLocalTimeZone,
-pythonRunnerConf).compute(projectedRowIter, context.partitionId(), context)
+pythonRunnerConf,
+sparkContext.conf).compute(projectedRowIter, context.partitionId(), context)
--- End diff --

Yea, same question.


---


[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-08-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21732
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2039/
Test PASSed.


---


[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-08-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21732
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-08-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21732
  
**[Test build #94554 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94554/testReport)**
 for PR 21732 at commit 
[`80506f4`](https://github.com/apache/spark/commit/80506f4e98184ccd66dbaac14ec52d69c358020d).


---


[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-08-10 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21732
  
retest this please


---


[GitHub] spark issue #22007: [SPARK-25033] Bump Apache commons.{httpclient, httpcore}

2018-08-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22007
  
**[Test build #94553 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94553/testReport)**
 for PR 22007 at commit 
[`618de1e`](https://github.com/apache/spark/commit/618de1e71e5ce38b6f9a640a538bdfbf95b3ae7e).


---


[GitHub] spark issue #21868: [SPARK-24906][SQL] Adaptively enlarge split / partition ...

2018-08-10 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21868
  
??? why does this still target branch-2.3? is this a backport?


---


[GitHub] spark issue #22007: [SPARK-25033] Bump Apache commons.{httpclient, httpcore}

2018-08-10 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22007
  
ok to test


---


[GitHub] spark pull request #16677: [SPARK-19355][SQL] Use map output statistics to i...

2018-08-10 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16677


---


[GitHub] spark issue #22048: Fix the show method to display the wide character alignm...

2018-08-10 Thread xuejianbest

Github user xuejianbest commented on the issue:

https://github.com/apache/spark/pull/22048
  
After testing, I found that the regular expression should be changed to the 
following:
`val regex = """[^\x00-\u2e39]""".r`
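
To illustrate how a pattern like this could feed column alignment, here is a minimal sketch that counts each matching character as two display columns. The two-column assumption and the names `DisplayWidth`, `of`, and `padRight` are hypothetical; the actual change in the pull request may differ.

```scala
// Hypothetical helper: estimate display width by counting "wide" characters twice.
object DisplayWidth {
  // Pattern quoted from the comment: any character outside the range U+0000..U+2E39.
  private val wide = """[^\x00-\u2e39]""".r

  /** Every character counts once; each character matched by the pattern counts one extra. */
  def of(s: String): Int = s.length + wide.findAllIn(s).size

  /** Pads with spaces on the right until the estimated display width reaches `width`. */
  def padRight(s: String, width: Int): String = s + " " * math.max(0, width - of(s))
}
```

Under these assumptions an ASCII string keeps its length as its width, while a string of two CJK ideographs is treated as four columns wide.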


---


[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-08-10 Thread hvanhovell

Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/16677
  
Merging to master. Thanks!


---


[GitHub] spark issue #22069: [MINOR][DOC] Fix Java example code in Column's comments

2018-08-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22069
  
Can one of the admins verify this patch?


---


[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type

2018-08-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22037
  
**[Test build #94552 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94552/testReport)**
 for PR 22037 at commit 
[`24dbada`](https://github.com/apache/spark/commit/24dbada0823e47b50892a34d19e1b8e2a63af7c3).


---


[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type

2018-08-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22037
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type

2018-08-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22037
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2038/
Test PASSed.


---


[GitHub] spark issue #22069: [MINOR][DOC] Fix Java example code in Column's comments

2018-08-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22069
  
Can one of the admins verify this patch?


---


[GitHub] spark issue #22069: [MINOR][DOC] Fix Java example code in Column's comments

2018-08-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22069
  
Can one of the admins verify this patch?


---


[GitHub] spark pull request #22069: [MINOR][DOC] Fix Java example code in Column's co...

2018-08-10 Thread sadhen

GitHub user sadhen opened a pull request:

https://github.com/apache/spark/pull/22069

[MINOR][DOC] Fix Java example code in Column's comments

## What changes were proposed in this pull request?
Fix scaladoc in Column

## How was this patch tested?
None

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sadhen/spark fix_doc_minor

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22069.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22069


commit 8520df899a3364f2bb41d4155d2bed9e68772a07
Author: å¿å¬ 
Date:   2018-08-10T09:24:08Z

Fix Java example code in Column's comments




---


[GitHub] spark pull request #22017: [SPARK-23938][SQL] Add map_zip_with function

2018-08-10 Thread mn-mikke

Github user mn-mikke commented on a diff in the pull request:

https://github.com/apache/spark/pull/22017#discussion_r209188342
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala
 ---
@@ -442,3 +442,186 @@ case class ArrayAggregate(
 
   override def prettyName: String = "aggregate"
 }
+
+/**
+ * Merges two given maps into a single map by applying function to the pair of values with
+ * the same key.
+ */
+@ExpressionDescription(
+  usage =
+"""
+  _FUNC_(map1, map2, function) - Merges two given maps into a single map by applying
+  function to the pair of values with the same key. For keys only presented in one map,
+  NULL will be passed as the value for the missing key. If an input map contains duplicated
+  keys, only the first entry of the duplicated key is passed into the lambda function.
+""",
+  examples = """
+Examples:
+  > SELECT _FUNC_(map(1, 'a', 2, 'b'), map(1, 'x', 2, 'y'), (k, v1, v2) -> concat(v1, v2));
+   {1:"ax",2:"by"}
+  """,
+  since = "2.4.0")
+case class MapZipWith(left: Expression, right: Expression, function: Expression)
+  extends HigherOrderFunction with CodegenFallback {
+
+  @transient lazy val functionForEval: Expression = functionsForEval.head
+
+  @transient lazy val (leftKeyType, leftValueType, leftValueContainsNull) =
+HigherOrderFunction.mapKeyValueArgumentType(left.dataType)
+
+  @transient lazy val (rightKeyType, rightValueType, rightValueContainsNull) =
+HigherOrderFunction.mapKeyValueArgumentType(right.dataType)
+
+  @transient lazy val keyType =
+TypeCoercion.findTightestCommonType(leftKeyType, rightKeyType).getOrElse(NullType)
--- End diff --

Even though there is a coercion rule for unifying key types, the key types 
may still differ in nullability flags if they are complex. In theory, we could 
use `==` and `findTightestCommonType` in the coercion rule, since there is no 
codegen to be optimized for `null` checks. Unfortunately, `bind` gets called 
once before the coercion rules are executed, so `findTightestCommonType` is 
important for setting up a correct input type for the lambda function.

Maybe we could play with the order of the analysis rules, but I'm not sure 
about all the consequences. @ueshin, could you shed some light on the ordering 
of analysis rules?
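
As a plain-Scala illustration of the semantics in the `usage` text quoted above (not of the Catalyst expression itself), the sketch below merges two maps key by key and passes `None` where SQL would pass NULL for a key present in only one map. The name `mapZipWith` and the use of `Option` are assumptions for illustration; duplicated keys and type coercion are out of scope here.

```scala
// Hypothetical plain-collections analogue of the described behaviour.
object MapZipWithSketch {
  /** Applies `f` to every key in either map; a missing side is passed as None. */
  def mapZipWith[K, V1, V2, R](m1: Map[K, V1], m2: Map[K, V2])(
      f: (K, Option[V1], Option[V2]) => R): Map[K, R] =
    (m1.keySet ++ m2.keySet).iterator.map(k => k -> f(k, m1.get(k), m2.get(k))).toMap
}
```

Called with `Map(1 -> "a", 2 -> "b")`, `Map(1 -> "x", 2 -> "y")`, and a function that concatenates both values, this yields the same pairs as the SQL example in the diff.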


---


[GitHub] spark issue #22065: [SPARK-23992][CORE] ShuffleDependency does not need to b...

2018-08-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22065
  
**[Test build #94551 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94551/testReport)**
 for PR 22065 at commit 
[`a99769d`](https://github.com/apache/spark/commit/a99769dd1aac779e972ed2e23aa7598e6d7c7105).


---


[GitHub] spark issue #22065: [SPARK-23992][CORE] ShuffleDependency does not need to b...

2018-08-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22065
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2037/
Test PASSed.


---


[GitHub] spark issue #22065: [SPARK-23992][CORE] ShuffleDependency does not need to b...

2018-08-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22065
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #22065: [SPARK-23992][CORE] ShuffleDependency does not need to b...

2018-08-10 Thread 10110346

Github user 10110346 commented on the issue:

https://github.com/apache/spark/pull/22065
  
retest this please


---


[GitHub] spark issue #22067: [SPARK-25084][SQL] distribute by on multiple columns may...

2018-08-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22067
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #22067: [SPARK-25084][SQL] distribute by on multiple columns may...

2018-08-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22067
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94547/
Test FAILed.


---


[GitHub] spark issue #22067: [SPARK-25084][SQL] distribute by on multiple columns may...

2018-08-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22067
  
**[Test build #94547 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94547/testReport)**
 for PR 22067 at commit 
[`9e6941c`](https://github.com/apache/spark/commit/9e6941cfc89b16980bd5d4470baf21550ffd0877).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #22068: [MINOR][DOC]Add missing compression codec .

2018-08-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22068
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2036/
Test PASSed.


---


[GitHub] spark issue #22068: [MINOR][DOC]Add missing compression codec .

2018-08-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22068
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #22068: [MINOR][DOC]Add missing compression codec .

2018-08-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22068
  
**[Test build #94550 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94550/testReport)**
 for PR 22068 at commit 
[`74aa80c`](https://github.com/apache/spark/commit/74aa80cb63c6ea98f0b9106f0724748931317c05).


---


[GitHub] spark pull request #22068: [MINOR][DOC]Add missing compression codec .

2018-08-10 Thread 10110346

GitHub user 10110346 opened a pull request:

https://github.com/apache/spark/pull/22068

[MINOR][DOC]Add missing compression codec .

## What changes were proposed in this pull request?

Parquet provides six codecs: "snappy", "gzip", "lzo", "lz4", "brotli", and 
"zstd". 
This PR adds the missing compression codecs to the documentation: "lz4", "brotli", and "zstd".
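
A minimal sketch of checking a user-supplied codec name against the six names listed in this description. The object and method names are hypothetical, and the set only mirrors the list above; it is not taken from Spark's own validation code.

```scala
// Hypothetical validator based solely on the codec names listed in the PR description.
object ParquetCodecNames {
  val fromPrDescription: Set[String] = Set("snappy", "gzip", "lzo", "lz4", "brotli", "zstd")

  /** Returns the lower-cased codec name if it is one of the six listed, else None. */
  def normalize(name: String): Option[String] = {
    val lower = name.trim.toLowerCase(java.util.Locale.ROOT)
    if (fromPrDescription.contains(lower)) Some(lower) else None
  }
}
```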
## How was this patch tested?
N/A


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/10110346/spark nosupportlz4

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22068.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22068


commit 74aa80cb63c6ea98f0b9106f0724748931317c05
Author: liuxian 
Date:   2018-08-09T07:22:01Z

fix




---


[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

2018-08-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22011
  
**[Test build #94549 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94549/testReport)**
 for PR 22011 at commit 
[`ea2330b`](https://github.com/apache/spark/commit/ea2330baa61e427665ba824c3c42d1e4ec1a7934).


---


[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

2018-08-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22011
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

2018-08-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22011
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2035/
Test PASSed.


---


[GitHub] spark pull request #20637: [SPARK-23466][SQL] Remove redundant null checks i...

2018-08-10 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20637#discussion_r209180525
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala
 ---
@@ -43,25 +43,29 @@ object GenerateUnsafeProjection extends CodeGenerator[Seq[Expression], UnsafePro
 case _ => false
   }
 
-  // TODO: if the nullability of field is correct, we can use it to save null check.
   private def writeStructToBuffer(
   ctx: CodegenContext,
   input: String,
   index: String,
-  fieldTypes: Seq[DataType],
+  fieldTypeAndNullables: Seq[(DataType, Boolean)],
--- End diff --

I think that it would be good since it is used in `JavaTypeInference` and 
`higherOrderFunctions`.
cc @ueshin
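
To make the idea behind the `fieldTypeAndNullables` parameter concrete: once each field carries its nullability flag, a generator can leave out the null branch for fields that cannot be null, which is what the removed TODO in the diff hints at. The sketch below emits illustrative Java snippets as strings; `NullCheckSketch`, `writer`, `isNullN`, and `valueN` are hypothetical names and do not reflect the real generated code.

```scala
// Hypothetical string-based generator; it only shows where the null check is skipped.
object NullCheckSketch {
  /** Emits Java-like source for writing field `index`, with a null branch only if nullable. */
  def writeField(index: Int, nullable: Boolean): String = {
    val write = s"writer.write($index, value$index);"
    if (nullable) s"if (isNull$index) { writer.setNullAt($index); } else { $write }"
    else write
  }

  /** Emits one statement per field, in order. */
  def writeStruct(fieldNullables: Seq[Boolean]): String =
    fieldNullables.zipWithIndex.map { case (n, i) => writeField(i, n) }.mkString("\n")
}
```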


---


[GitHub] spark pull request #20637: [SPARK-23466][SQL] Remove redundant null checks i...

2018-08-10 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20637#discussion_r209178573
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala ---
@@ -170,6 +174,23 @@ object GenerateUnsafeProjection extends CodeGenerator[Seq[Expression], UnsafePro
 
 val element = CodeGenerator.getValue(tmpInput, et, index)
 
+val primitiveTypeName = if (CodeGenerator.isPrimitiveType(jt)) {
--- End diff --

good catch


---


[GitHub] spark issue #22066: [WIP][SPARK-25084][SQL] "distribute by" on multiple colu...

2018-08-10 Thread yucai

Github user yucai commented on the issue:

https://github.com/apache/spark/pull/22066
  
@cloud-fan I am refining and adding tests.


---


[GitHub] spark pull request #21199: [SPARK-24127][SS] Continuous text socket source

2018-08-10 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21199


---


[GitHub] spark issue #22067: [SPARK-25084][SQL] distribute by on multiple columns may...

2018-08-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22067
  
**[Test build #94547 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94547/testReport)**
 for PR 22067 at commit 
[`9e6941c`](https://github.com/apache/spark/commit/9e6941cfc89b16980bd5d4470baf21550ffd0877).


---


[GitHub] spark issue #21199: [SPARK-24127][SS] Continuous text socket source

2018-08-10 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21199
  
Merged to master.


---


[GitHub] spark issue #22053: [SPARK-25069][CORE]Using UnsafeAlignedOffset to make the...

2018-08-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22053
  
**[Test build #94548 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94548/testReport)**
 for PR 22053 at commit 
[`d95d357`](https://github.com/apache/spark/commit/d95d35794528702a2de5523ca00334d479598c57).


---


[GitHub] spark issue #22053: [SPARK-25069][CORE]Using UnsafeAlignedOffset to make the...

2018-08-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22053
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #22053: [SPARK-25069][CORE]Using UnsafeAlignedOffset to make the...

2018-08-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22053
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2034/
Test PASSed.


---


[GitHub] spark issue #22053: [SPARK-25069][CORE]Using UnsafeAlignedOffset to make the...

2018-08-10 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22053
  
retest this please


---


[GitHub] spark issue #22067: [SPARK-25084][SQL] distribute by on multiple columns may...

2018-08-10 Thread jerryshao

Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/22067
  
ok to test.


---


[GitHub] spark issue #22067: [SPARK-25084][SQL] distribute by on multiple columns may...

2018-08-10 Thread LantaoJin

Github user LantaoJin commented on the issue:

https://github.com/apache/spark/pull/22067
  
@cloud-fan @jerryshao 


---


[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type

2018-08-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22037
  
**[Test build #94546 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94546/testReport)**
 for PR 22037 at commit 
[`9eefbe5`](https://github.com/apache/spark/commit/9eefbe5dc58bba272dedce7ae0174be89a0a9b28).


---


[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type

2018-08-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22037
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type

2018-08-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22037
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2033/
Test PASSed.


---


[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...

2018-08-10 Thread mgaido91

Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/22036
  
cc @cloud-fan @gatorsmile 


---


[GitHub] spark issue #22067: [SPARK-25084][SQL] distribute by on multiple columns may...

2018-08-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22067
  
Can one of the admins verify this patch?


---


[GitHub] spark issue #22066: [WIP][SPARK-25084][SQL] "distribute by" on multiple colu...

2018-08-10 Thread LantaoJin

Github user LantaoJin commented on the issue:

https://github.com/apache/spark/pull/22066
  
I propose another fix in #22067.
It doesn't need "input" as a global variable (if distributing by random).


---


[GitHub] spark issue #22067: [SPARK-25084][SQL] distribute by on multiple columns may...

2018-08-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22067
  
Can one of the admins verify this patch?


---


[GitHub] spark issue #22067: [SPARK-25084][SQL] distribute by on multiple columns may...

2018-08-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22067
  
Can one of the admins verify this patch?


---


[GitHub] spark issue #21439: [SPARK-24391][SQL] Support arrays of any types by from_j...

2018-08-10 Thread MaxGekk

Github user MaxGekk commented on the issue:

https://github.com/apache/spark/pull/21439
  
@gatorsmile Could you look at the PR, please.


---


[GitHub] spark pull request #22067: [SPARK-25084][SQL] distribute by on multiple colu...

2018-08-10 Thread LantaoJin

GitHub user LantaoJin opened a pull request:

https://github.com/apache/spark/pull/22067

[SPARK-25084][SQL] distribute by on multiple columns may lead to code…

…gen issue

## What changes were proposed in this pull request?

"distribute by" on multiple columns may lead to codegen issue

## How was this patch tested?

manual test


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/LantaoJin/spark SPARK-25084

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22067.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22067


commit 9e6941cfc89b16980bd5d4470baf21550ffd0877
Author: LantaoJin 
Date:   2018-08-10T07:12:32Z

[SPARK-25084][SQL] distribute by on multiple columns may lead to codegen 
issue




---


[GitHub] spark pull request #22038: [SPARK-25056][SQL] Unify the InConversion and Bin...

2018-08-10 Thread mgaido91

Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22038#discussion_r209163143
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala ---
@@ -1378,8 +1378,8 @@ class TypeCoercionSuite extends AnalysisTest {
 )
 ruleTest(inConversion,
   In(Literal("a"), Seq(Literal(1), Literal("b"))),
-  In(Cast(Literal("a"), StringType),
-Seq(Cast(Literal(1), StringType), Cast(Literal("b"), StringType)))
+  In(Cast(Literal("a"), IntegerType),
--- End diff --

Hmm, honestly, in this case I'd say that string is a better type 
for the cast than int. I am not sure what the result of casting "a" and "b" 
to int would be...
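
To make the concern above concrete, here is a minimal plain-Scala sketch (not Spark code). It assumes the non-ANSI behaviour of Spark SQL, where casting a non-numeric string to an integer yields NULL rather than an error. `castToInt`, `castToString`, and `in` are hypothetical helpers, `None` stands in for NULL, and SQL's three-valued logic is simplified to a plain Boolean.

```scala
import scala.util.Try

object CastSketch {
  // Hypothetical stand-in for casting a string to int: None models SQL NULL.
  def castToInt(s: String): Option[Int] = Try(s.toInt).toOption

  // Hypothetical stand-in for casting a literal to string: no information is lost.
  def castToString(v: Any): Option[String] = Option(v).map(_.toString)

  // Simplified IN: a NULL (None) left-hand side never matches anything.
  def in[A](value: Option[A], list: Seq[Option[A]]): Boolean =
    value.exists(v => list.contains(Some(v)))
}
```

Under these assumptions, casting every operand to string keeps "a" and "b" comparable, while casting them to int turns both into `None`, so the comparison can no longer match.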


---


[GitHub] spark issue #22064: [MINOR][BUILD] Add ECCN notice required by http://www.ap...

2018-08-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22064
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #22064: [MINOR][BUILD] Add ECCN notice required by http://www.ap...

2018-08-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22064
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94537/
Test FAILed.


---


[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-08-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21732
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94542/
Test FAILed.


---


[GitHub] spark issue #22065: [SPARK-23992][CORE] ShuffleDependency does not need to b...

2018-08-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22065
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #22065: [SPARK-23992][CORE] ShuffleDependency does not need to b...

2018-08-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22065
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94541/
Test FAILed.


---


[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-08-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21732
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-08-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21732
  
**[Test build #94542 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94542/testReport)**
 for PR 21732 at commit 
[`80506f4`](https://github.com/apache/spark/commit/80506f4e98184ccd66dbaac14ec52d69c358020d).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `   * For example, we build an encoder for `case class Data(a: Int, b: 
String)` and the real type`


---


[GitHub] spark issue #22064: [MINOR][BUILD] Add ECCN notice required by http://www.ap...

2018-08-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22064
  
**[Test build #94537 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94537/testReport)**
 for PR 22064 at commit 
[`878e5ca`](https://github.com/apache/spark/commit/878e5ca274a3b9e5fe37f4e0c2ed4b499bc81676).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #22053: [SPARK-25069][CORE]Using UnsafeAlignedOffset to make the...

2018-08-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22053
  
Merged build finished. Test FAILed.


---


[GitHub] spark pull request #22037: [SPARK-24774][SQL] Avro: Support logical decimal ...

2018-08-10 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22037#discussion_r209162410
  
--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala ---
@@ -138,10 +142,21 @@ class AvroDeserializer(rootAvroType: Schema, rootCatalystType: DataType) {
 bytes
   case b: Array[Byte] => b
   case other => throw new RuntimeException(s"$other is not a valid avro binary.")
-
 }
 updater.set(ordinal, bytes)
 
+  case (FIXED, d: DecimalType) => (updater, ordinal, value) =>
+val bigDecimal = decimalConversions.fromFixed(value.asInstanceOf[GenericFixed], avroType,
+  LogicalTypes.decimal(d.precision, d.scale))
--- End diff --

ok let's leave it. We can always add later.


---


[GitHub] spark issue #22065: [SPARK-23992][CORE] ShuffleDependency does not need to b...

2018-08-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22065
  
**[Test build #94541 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94541/testReport)**
 for PR 22065 at commit 
[`a99769d`](https://github.com/apache/spark/commit/a99769dd1aac779e972ed2e23aa7598e6d7c7105).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #22053: [SPARK-25069][CORE]Using UnsafeAlignedOffset to make the...

2018-08-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22053
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94545/
Test FAILed.


---


[GitHub] spark issue #22038: [SPARK-25056][SQL] Unify the InConversion and BinaryComp...

2018-08-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22038
  
**[Test build #94544 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94544/testReport)**
 for PR 22038 at commit 
[`cb25b78`](https://github.com/apache/spark/commit/cb25b788cfc3cd7799a6671713558a32969f6dff).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #22038: [SPARK-25056][SQL] Unify the InConversion and BinaryComp...

2018-08-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22038
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94544/
Test FAILed.


---


[GitHub] spark issue #22038: [SPARK-25056][SQL] Unify the InConversion and BinaryComp...

2018-08-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22038
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #22053: [SPARK-25069][CORE]Using UnsafeAlignedOffset to make the...

2018-08-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22053
  
**[Test build #94545 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94545/testReport)**
 for PR 22053 at commit 
[`d95d357`](https://github.com/apache/spark/commit/d95d35794528702a2de5523ca00334d479598c57).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark pull request #22017: [SPARK-23938][SQL] Add map_zip_with function

2018-08-10 Thread mgaido91

Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22017#discussion_r209160027
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ---
@@ -442,3 +442,186 @@ case class ArrayAggregate(
 
   override def prettyName: String = "aggregate"
 }
+
+/**
+ * Merges two given maps into a single map by applying function to the pair of values with
+ * the same key.
+ */
+@ExpressionDescription(
+  usage =
+"""
+  _FUNC_(map1, map2, function) - Merges two given maps into a single map by applying
+  function to the pair of values with the same key. For keys only presented in one map,
+  NULL will be passed as the value for the missing key. If an input map contains duplicated
+  keys, only the first entry of the duplicated key is passed into the lambda function.
+""",
+  examples = """
+Examples:
+  > SELECT _FUNC_(map(1, 'a', 2, 'b'), map(1, 'x', 2, 'y'), (k, v1, v2) -> concat(v1, v2));
+   {1:"ax",2:"by"}
+  """,
+  since = "2.4.0")
+case class MapZipWith(left: Expression, right: Expression, function: Expression)
+  extends HigherOrderFunction with CodegenFallback {
+
+  @transient lazy val functionForEval: Expression = functionsForEval.head
+
+  @transient lazy val (leftKeyType, leftValueType, leftValueContainsNull) =
+HigherOrderFunction.mapKeyValueArgumentType(left.dataType)
+
+  @transient lazy val (rightKeyType, rightValueType, rightValueContainsNull) =
+HigherOrderFunction.mapKeyValueArgumentType(right.dataType)
+
+  @transient lazy val keyType =
+TypeCoercion.findTightestCommonType(leftKeyType, rightKeyType).getOrElse(NullType)
--- End diff --

Why do we need this? We are enforcing that the two maps have the same key 
type, so can't we just take one?
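
For readers following the thread, the semantics described in the quoted usage text can be sketched in plain Scala. This is only an illustration, not the Spark implementation: `mapZipWith` is a hypothetical helper, `None` stands in for NULL, and because a Scala `Map` cannot hold duplicate keys, the duplicate-key rule is not modelled. Both inputs share a single key type `K`, which mirrors the point of the question above.

```scala
object MapZipWithSketch {
  // Merge two maps key by key. A key missing from one side is passed as None,
  // standing in for the NULL mentioned in the usage text.
  def mapZipWith[K, V1, V2, R](left: Map[K, V1], right: Map[K, V2])(
      f: (K, Option[V1], Option[V2]) => R): Map[K, R] = {
    val keys = left.keySet ++ right.keySet
    keys.iterator.map(k => k -> f(k, left.get(k), right.get(k))).toMap
  }
}
```

Applied to `Map(1 -> "a", 2 -> "b")` and `Map(1 -> "x", 2 -> "y")` with a function that concatenates the two values, the sketch produces the same pairs as the example in the usage text.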


---


[GitHub] spark issue #20637: [SPARK-23466][SQL] Remove redundant null checks in gener...

2018-08-10 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20637
  
> When spark.sql.fromJsonForceNullableSchema=false, I think that a test is 
invalid to pass nullable=false in the corresponding schema to the missing field.

+1. 


---


[GitHub] spark pull request #20637: [SPARK-23466][SQL] Remove redundant null checks i...

2018-08-10 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20637#discussion_r209161074
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala ---
@@ -170,6 +174,23 @@ object GenerateUnsafeProjection extends CodeGenerator[Seq[Expression], UnsafePro
 
 val element = CodeGenerator.getValue(tmpInput, et, index)
 
+val primitiveTypeName = if (CodeGenerator.isPrimitiveType(jt)) {
--- End diff --

where do we use it?


---


[GitHub] spark pull request #20637: [SPARK-23466][SQL] Remove redundant null checks i...

2018-08-10 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20637#discussion_r209160237
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala ---
@@ -43,25 +43,29 @@ object GenerateUnsafeProjection extends CodeGenerator[Seq[Expression], UnsafePro
 case _ => false
   }
 
-  // TODO: if the nullability of field is correct, we can use it to save null check.
   private def writeStructToBuffer(
   ctx: CodegenContext,
   input: String,
   index: String,
-  fieldTypes: Seq[DataType],
+  fieldTypeAndNullables: Seq[(DataType, Boolean)],
--- End diff --

shall we create a class for `(DataType, Boolean)`? it can also be used in 
https://github.com/apache/spark/pull/22063
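
As a rough illustration of the suggestion above, a small case class could replace the `(DataType, Boolean)` tuple. The sketch below is self-contained and uses stand-in types rather than Spark's own; the name `TypeAndNullable` and the helper `needsNullCheck` are hypothetical and are not taken from either pull request.

```scala
object NullabilitySketch {
  // Minimal stand-ins for Spark's DataType hierarchy, for illustration only.
  sealed trait DataType
  case object IntegerType extends DataType
  case object StringType extends DataType

  // Hypothetical name: a small class replacing the (DataType, Boolean) tuple.
  final case class TypeAndNullable(dataType: DataType, nullable: Boolean)

  // Callers read named fields instead of the positional _1 and _2.
  def needsNullCheck(fields: Seq[TypeAndNullable]): Seq[Boolean] =
    fields.map(_.nullable)
}
```

Named fields make call sites easier to read than tuple accessors, which is presumably part of the motivation for sharing one class across both changes.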


---


[GitHub] spark issue #22019: [WIP][SPARK-25040][SQL] Empty string for double and floa...

2018-08-10 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22019
  
SGTM


---


[GitHub] spark issue #22019: [WIP][SPARK-25040][SQL] Empty string for double and floa...

2018-08-10 Thread mgaido91

Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/22019
  
I agree with this proposal, @HyukjinKwon. I think it is wrong to treat an 
empty string as a null. An empty string is not a valid value for an 
int/double/... so if we encounter one, I think we should fail.
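
The position argued above can be sketched in plain Scala as follows. This is only an illustration of the "fail instead of returning null" stance, not the parser code under review; `parseDouble` is a hypothetical helper and the error messages are invented for the example.

```scala
import scala.util.{Failure, Success, Try}

object EmptyStringSketch {
  // Hypothetical parser: an empty string is rejected rather than read as NULL.
  def parseDouble(raw: String): Either[String, Double] =
    if (raw.isEmpty) Left("empty string is not a valid double")
    else Try(raw.toDouble) match {
      case Success(d) => Right(d)
      case Failure(e) => Left(s"cannot parse '$raw': ${e.getMessage}")
    }
}
```

With this shape, the caller has to handle the failure explicitly, whereas silently mapping `""` to null would hide malformed input.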


---


[GitHub] spark issue #22066: [WIP][SPARK-25084][SQL] "distribute by" on multiple colu...

2018-08-10 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22066
  
can you add a test first?


---


[GitHub] spark pull request #22044: [SPARK-23912][SQL][Followup] Refactor ArrayDistin...

2018-08-10 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22044


---


[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21889
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94536/
Test FAILed.


---


[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21889
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-10 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21889
  
**[Test build #94536 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94536/testReport)**
 for PR 21889 at commit 
[`51f0dc5`](https://github.com/apache/spark/commit/51f0dc59c6403aa862e18ff0192fc37b87d22320).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21933: [SPARK-24917][CORE] make chunk size configurable

2018-08-10 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21933
  
cc @squito too.


---


[GitHub] spark issue #22044: [SPARK-23912][SQL][Followup] Refactor ArrayDistinct

2018-08-10 Thread ueshin

Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/22044
  
Thanks! merging to master.


---


[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type

2018-08-10 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22037
  
Merged build finished. Test FAILed.


---

