[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-08-17 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22112
  
@mridulm A shuffled RDD will never be deterministic unless the shuffle key is 
the entire record and key ordering is specified. The reduce task fetches 
multiple remote shuffle blocks at the same time, so the order in which records 
arrive is effectively random. In addition, Spark SQL never specifies key ordering.
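
The point about fetch order can be illustrated with a minimal plain-Scala simulation. It uses no Spark APIs; `ShuffleOrderSketch`, `blocks`, `fetch` and `fetchSorted` are hypothetical names introduced only for this sketch, and the "arrival order" stands in for whatever order remote blocks happen to be fetched in.

```scala
// Plain-Scala simulation; nothing here is a Spark API.
object ShuffleOrderSketch {
  // Records produced by three map tasks for a single reduce partition.
  val blocks: Seq[Seq[(Int, String)]] = Seq(
    Seq(1 -> "a", 2 -> "b"),
    Seq(1 -> "c"),
    Seq(2 -> "d", 1 -> "e"))

  // A reduce task concatenates blocks in whatever order they arrive.
  def fetch(arrivalOrder: Seq[Int]): Seq[(Int, String)] =
    arrivalOrder.flatMap(i => blocks(i))

  // Sorting on the entire record removes the dependence on arrival order.
  def fetchSorted(arrivalOrder: Seq[Int]): Seq[(Int, String)] =
    fetch(arrivalOrder).sorted
}
```

Two arrival orders yield the same set of records but different sequences from `fetch`, while `fetchSorted` returns the same sequence for both.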

Checkpointing truncates the RDD lineage and changes the RDD dependency 
to a `OneToOneDependency` on a `CheckpointRDD`, so we don't need to worry about it.

@tgravescs I forgot to mention that failing the job on a result task is a 
temporary workaround. I looked into it, and we need to change the semantics of 
`FileCommitProtocol` to fix it. Maybe it's better to do it in Spark 3.0?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22112: [SPARK-23243][Core] Fix RDD.repartition() data co...

2018-08-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22112#discussion_r211065925
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -1864,6 +1877,22 @@ abstract class RDD[T: ClassTag](
   // From performance concern, cache the value to avoid repeatedly compute `isBarrier()` on a long
   // RDD chain.
   @transient protected lazy val isBarrier_ : Boolean = dependencies.exists(_.rdd.isBarrier())
+
+  /**
+   * Whether the RDD's computing function is idempotent. Idempotent means the computing function
+   * not only satisfies the requirement, but also produce the same output sequence(the output order
+   * can't vary) given the same input sequence. Spark assumes all the RDDs are idempotent, except
+   * for the shuffle RDD and RDDs derived from non-idempotent RDD.
+   */
--- End diff --

Yes, that is expected, unless the computing function sorts the input data. 
In that case, we can override `isIdempotent`.
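
A rough sketch of how such a flag could propagate through a lineage, and where an override would apply. This is a toy model: `Node`, `Source`, `Mapped`, `Shuffled` and `SortedWithinPartition` are hypothetical classes, not Spark's RDD API, and the override assumes the sort is a total order over whole records.

```scala
// Toy model only: these classes are hypothetical and are not Spark's RDD API.
sealed trait Node {
  def parents: Seq[Node]
  // By default a node is idempotent only if all of its parents are.
  def isIdempotent: Boolean = parents.forall(_.isIdempotent)
}

case class Source() extends Node {
  val parents: Seq[Node] = Nil
}

case class Mapped(parent: Node) extends Node {
  val parents: Seq[Node] = Seq(parent)
}

// Output order depends on block arrival order, so never idempotent.
case class Shuffled(parent: Node) extends Node {
  val parents: Seq[Node] = Seq(parent)
  override def isIdempotent: Boolean = false
}

// Assumed to sort on the entire record, which restores a fixed output order.
case class SortedWithinPartition(parent: Node) extends Node {
  val parents: Seq[Node] = Seq(parent)
  override def isIdempotent: Boolean = true
}
```

In this model, `Mapped(Shuffled(Source()))` inherits non-idempotence from the shuffle, while `SortedWithinPartition(Shuffled(Source()))` reports itself idempotent again.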


---




[GitHub] spark issue #20725: [SPARK-23555][PYTHON] Add BinaryType support for Arrow i...

2018-08-17 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/20725
  
merged to master, thanks @shaneknapp and @HyukjinKwon !


---




[GitHub] spark pull request #20725: [SPARK-23555][PYTHON] Add BinaryType support for ...

2018-08-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20725


---




[GitHub] spark issue #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaData...

2018-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22138
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaData...

2018-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22138
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94913/
Test PASSed.


---




[GitHub] spark issue #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaData...

2018-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22138
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94914/
Test PASSed.


---




[GitHub] spark issue #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaData...

2018-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22138
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaData...

2018-08-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22138
  
**[Test build #94913 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94913/testReport)**
 for PR 22138 at commit 
[`94231fe`](https://github.com/apache/spark/commit/94231fef1f2f59cea1625fd1f71bd99372a8e800).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaData...

2018-08-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22138
  
**[Test build #94914 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94914/testReport)**
 for PR 22138 at commit 
[`fd728ef`](https://github.com/apache/spark/commit/fd728ef8c99ebb33d6dba5466e6a8dba8984248d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20637: [SPARK-23466][SQL] Remove redundant null checks in gener...

2018-08-17 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20637
  
cc @ueshin @cloud-fan 


---




[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21320
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21320
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94915/
Test PASSed.


---




[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21320
  
**[Test build #94915 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94915/testReport)**
 for PR 21320 at commit 
[`1573ae8`](https://github.com/apache/spark/commit/1573ae888d651a51e0d60683117714fba7c55fb0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22137: [MINOR][DOC][SQL] use one line for annotation arg value

2018-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22137
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94911/
Test PASSed.


---




[GitHub] spark issue #22137: [MINOR][DOC][SQL] use one line for annotation arg value

2018-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22137
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22137: [MINOR][DOC][SQL] use one line for annotation arg value

2018-08-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22137
  
**[Test build #94911 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94911/testReport)**
 for PR 22137 at commit 
[`e9a9376`](https://github.com/apache/spark/commit/e9a93762aeeb219cf9ab600da248a0d1f295d09f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22131: [SPARK-25141][SQL][TEST] Modify tests for higher-order f...

2018-08-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22131
  
**[Test build #94919 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94919/testReport)**
 for PR 22131 at commit 
[`6f9660d`](https://github.com/apache/spark/commit/6f9660d79e2ae8b7c64dbfea850c514ad3404f37).


---




[GitHub] spark issue #22131: [SPARK-25141][SQL][TEST] Modify tests for higher-order f...

2018-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22131
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2295/
Test PASSed.


---




[GitHub] spark issue #22131: [SPARK-25141][SQL][TEST] Modify tests for higher-order f...

2018-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22131
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22131: [SPARK-25141][SQL][TEST] Modify tests for higher-order f...

2018-08-17 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/22131
  
@mgaido91 @mn-mikke On second thought, how about this?
If you don't like it, I'll revert it soon.


---




[GitHub] spark issue #22130: [SPARK-25137][Spark Shell] NumberFormatException` when s...

2018-08-17 Thread vinodkc
Github user vinodkc commented on the issue:

https://github.com/apache/spark/pull/22130
  
@dongjoon-hyun, thanks for taking a look at this PR. I've added the macOS 
version to the PR description.
IMO, an update of ncurses is causing this issue. 
Reference: https://github.com/jline/jline2/issues/281


---




[GitHub] spark issue #21909: [SPARK-24959][SQL] Speed up count() for JSON and CSV

2018-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21909
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21909: [SPARK-24959][SQL] Speed up count() for JSON and CSV

2018-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21909
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94909/
Test PASSed.


---




[GitHub] spark issue #21909: [SPARK-24959][SQL] Speed up count() for JSON and CSV

2018-08-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21909
  
**[Test build #94909 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94909/testReport)**
 for PR 21909 at commit 
[`96a94cc`](https://github.com/apache/spark/commit/96a94ccaed1f68fa7eaf3fc286540e531d9a9506).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20226: [SPARK-23034][SQL] Override `nodeName` for all *ScanExec...

2018-08-17 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/20226
  
sure, will do, too.


---




[GitHub] spark issue #21909: [SPARK-24959][SQL] Speed up count() for JSON and CSV

2018-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21909
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94908/
Test FAILed.


---




[GitHub] spark issue #21909: [SPARK-24959][SQL] Speed up count() for JSON and CSV

2018-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21909
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21909: [SPARK-24959][SQL] Speed up count() for JSON and CSV

2018-08-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21909
  
**[Test build #94908 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94908/testReport)**
 for PR 21909 at commit 
[`2d8e754`](https://github.com/apache/spark/commit/2d8e754e699076c8a5915e7faf971e4bd2a5c1fd).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21306
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-08-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21306
  
**[Test build #94918 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94918/testReport)**
 for PR 21306 at commit 
[`dca4bf8`](https://github.com/apache/spark/commit/dca4bf8176eaa92de295de54488c3398256e0f7a).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class V1TableCatalog(sessionState: SessionState) extends TableCatalog `


---




[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21306
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94918/
Test FAILed.


---




[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-08-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21306
  
**[Test build #94918 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94918/testReport)**
 for PR 21306 at commit 
[`dca4bf8`](https://github.com/apache/spark/commit/dca4bf8176eaa92de295de54488c3398256e0f7a).


---




[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21306
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-08-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21306
  
**[Test build #94917 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94917/testReport)**
 for PR 21306 at commit 
[`fa0edeb`](https://github.com/apache/spark/commit/fa0edeb1570485cc7d6cd0f848caaaf20480f384).
 * This patch **fails to generate documentation**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class V1TableCatalog(sessionState: SessionState) extends TableCatalog `


---




[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21306
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94917/
Test FAILed.


---




[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21306
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21306
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2294/
Test PASSed.


---




[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

2018-08-17 Thread rdblue
Github user rdblue commented on a diff in the pull request:

https://github.com/apache/spark/pull/21306#discussion_r211057651
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalog/v2/V1MetadataTable.scala ---
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalog.v2
+
+import java.util
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.sql.SaveMode
+import org.apache.spark.sql.catalog.v2.PartitionTransforms.{bucket, identity}
+import org.apache.spark.sql.catalyst.catalog.CatalogTable
+import org.apache.spark.sql.sources.v2.{DataSourceOptions, DataSourceV2, ReadSupport, WriteSupport}
+import org.apache.spark.sql.sources.v2.reader.DataSourceReader
+import org.apache.spark.sql.sources.v2.writer.DataSourceWriter
+import org.apache.spark.sql.types.StructType
+
+/**
+ * An implementation of catalog v2 [[Table]] to expose v1 table metadata.
+ */
+private[sql] class V1MetadataTable(
--- End diff --

@cloud-fan, I updated this PR that adds the `TableCatalog` API to include 
an implementation that uses the existing `SessionCatalog`. This `Table` class 
demonstrates how `Table` would implement `ReadSupport` and `WriteSupport`.

The catalog returns these tables, which have `ReadSupport` and 
`WriteSupport` mixed in depending on whether the underlying `DataSourceV2` also 
supports them. In your updated API, it would use the `ReadSupportProvider` 
instead of the `DataSourceV2` directly, but the difference isn't very large.
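
The "mixed in depending on the underlying source" idea can be sketched in plain Scala. This is a toy model under stated assumptions: `CanRead`, `CanWrite`, `Source`, `Table`, `MemorySource`, `ReadOnlySource` and `TableFactory` are hypothetical stand-ins, not the `ReadSupport`/`WriteSupport`/`DataSourceV2` interfaces from the PR.

```scala
// Toy model: every name below is hypothetical, not a Spark interface.
trait CanRead { def read(): Seq[String] }
trait CanWrite { def write(rows: Seq[String]): Unit }
trait Source

class MemorySource extends Source with CanRead with CanWrite {
  private var rows = Vector.empty[String]
  def read(): Seq[String] = rows
  def write(newRows: Seq[String]): Unit = { rows = rows ++ newRows }
}

class ReadOnlySource extends Source with CanRead {
  def read(): Seq[String] = Seq("fixed")
}

class Table(val name: String)

object TableFactory {
  // The returned table exposes only the capabilities the source has.
  def load(name: String, source: Source): Table = (source, source) match {
    case (r: CanRead, w: CanWrite) =>
      new Table(name) with CanRead with CanWrite {
        def read(): Seq[String] = r.read()
        def write(rows: Seq[String]): Unit = w.write(rows)
      }
    case (r: CanRead, _) =>
      new Table(name) with CanRead {
        def read(): Seq[String] = r.read()
      }
    case (_, w: CanWrite) =>
      new Table(name) with CanWrite {
        def write(rows: Seq[String]): Unit = w.write(rows)
      }
    case _ => new Table(name)
  }
}
```

A caller can then pattern-match on the returned `Table` to decide whether a read or write plan is possible, rather than inspecting the source directly.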

The follow-up PR for CTAS and RTAS, #21877, demonstrates how this would be 
used in the new logical plans.


---




[GitHub] spark issue #22139: [SPARK-25149][GraphX] Update Parallel Personalized Page ...

2018-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22139
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94916/
Test PASSed.


---




[GitHub] spark issue #22139: [SPARK-25149][GraphX] Update Parallel Personalized Page ...

2018-08-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22139
  
**[Test build #94916 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94916/testReport)**
 for PR 22139 at commit 
[`25dc63a`](https://github.com/apache/spark/commit/25dc63a0ac09ef900770c31e817f230ec98f658f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22139: [SPARK-25149][GraphX] Update Parallel Personalized Page ...

2018-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22139
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20226: [SPARK-23034][SQL] Override `nodeName` for all *ScanExec...

2018-08-17 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20226
  
@maropu Could you take this over?


---




[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-08-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21306
  
**[Test build #94917 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94917/testReport)**
 for PR 21306 at commit 
[`fa0edeb`](https://github.com/apache/spark/commit/fa0edeb1570485cc7d6cd0f848caaaf20480f384).


---




[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21306
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21306
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2293/
Test PASSed.


---




[GitHub] spark issue #22139: [SPARK-25149][GraphX] Update Parallel Personalized Page ...

2018-08-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22139
  
**[Test build #94916 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94916/testReport)**
 for PR 22139 at commit 
[`25dc63a`](https://github.com/apache/spark/commit/25dc63a0ac09ef900770c31e817f230ec98f658f).


---




[GitHub] spark issue #22139: [SPARK-25149][GraphX] Update Parallel Personalized Page ...

2018-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22139
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22139: [SPARK-25149][GraphX] Update Parallel Personalized Page ...

2018-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22139
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2292/
Test PASSed.


---




[GitHub] spark pull request #22139: [SPARK-25149][GraphX] Update Parallel Personalize...

2018-08-17 Thread MrBago
GitHub user MrBago opened a pull request:

https://github.com/apache/spark/pull/22139

[SPARK-25149][GraphX] Update Parallel Personalized Page Rank to test with 
large vertexIds

## What changes were proposed in this pull request?

`runParallelPersonalizedPageRank` in GraphX checks that `sources` are <= 
Int.MaxValue.toLong, but this is not actually required. The check seems to 
have been added because the implementation uses sparse vectors, and sparse 
vectors cannot be indexed by values > MAX_INT. However, we never index the 
sparse vector by the source vertexIds, so this isn't an issue. I've added a test 
with large vertexIds to confirm this works as expected.
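
To make the argument concrete, here is a minimal sketch in which each source gets a one-hot sparse vector indexed by its position in `sources` rather than by its vertex id. `SparseVec`, `SourceIndexSketch` and `initialRanks` are hypothetical names for illustration; the sketch does not reproduce the GraphX implementation or the vector classes it uses.

```scala
// Illustrative only: SparseVec and the helper names are hypothetical.
object SourceIndexSketch {
  type VertexId = Long

  // Minimal sparse vector: a fixed size plus the non-zero entries.
  final case class SparseVec(size: Int, entries: Map[Int, Double]) {
    def apply(i: Int): Double = {
      require(i >= 0 && i < size, s"index $i out of range")
      entries.getOrElse(i, 0.0)
    }
  }

  // Each source gets a one-hot vector indexed by its position in `sources`,
  // so the vector size is sources.length no matter how large the ids are.
  def initialRanks(sources: Array[VertexId]): Map[VertexId, SparseVec] =
    sources.zipWithIndex.map { case (vid, i) =>
      vid -> SparseVec(sources.length, Map(i -> 1.0))
    }.toMap
}
```

With this layout, a source id larger than `Int.MaxValue` is only ever used as a map key, and the vector indices stay in the range `0 until sources.length`.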

## How was this patch tested?

Unit tests.

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/MrBago/spark remove-veretexId-check-pppr

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22139.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22139


commit e720eab9a435a738be9f08ccaefba2f4eb7dc867
Author: Bago Amirbekian 
Date:   2018-08-17T23:43:25Z

Update Parallel Personalized Page Rank to test with large vertexIds




---




[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21320
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21320
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2291/
Test PASSed.


---




[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21320
  
**[Test build #94915 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94915/testReport)**
 for PR 21320 at commit 
[`1573ae8`](https://github.com/apache/spark/commit/1573ae888d651a51e0d60683117714fba7c55fb0).


---




[GitHub] spark issue #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaData...

2018-08-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22138
  
**[Test build #94914 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94914/testReport)**
 for PR 22138 at commit 
[`fd728ef`](https://github.com/apache/spark/commit/fd728ef8c99ebb33d6dba5466e6a8dba8984248d).


---




[GitHub] spark issue #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaData...

2018-08-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22138
  
**[Test build #94913 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94913/testReport)**
 for PR 22138 at commit 
[`94231fe`](https://github.com/apache/spark/commit/94231fef1f2f59cea1625fd1f71bd99372a8e800).


---




[GitHub] spark issue #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaData...

2018-08-17 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/22138
  
cc. @tdas @zsxwing @koeninger @arunmahadevan 


---




[GitHub] spark pull request #22138: [SPARK-25151][SS] Apply Apache Commons Pool to Ka...

2018-08-17 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request:

https://github.com/apache/spark/pull/22138#discussion_r211053868
  
--- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala ---
@@ -425,70 +381,36 @@ private[kafka010] object KafkaDataConsumer extends Logging {
   def acquire(
   topicPartition: TopicPartition,
   kafkaParams: ju.Map[String, Object],
-  useCache: Boolean): KafkaDataConsumer = synchronized {
-val key = new CacheKey(topicPartition, kafkaParams)
-val existingInternalConsumer = cache.get(key)
+  useCache: Boolean): KafkaDataConsumer = {
 
-lazy val newInternalConsumer = new InternalKafkaConsumer(topicPartition, kafkaParams)
+if (!useCache) {
+  return NonCachedKafkaDataConsumer(new InternalKafkaConsumer(topicPartition, kafkaParams))
+}
 
-if (TaskContext.get != null && TaskContext.get.attemptNumber >= 1) {
-  // If this is reattempt at running the task, then invalidate cached consumer if any and
-  // start with a new one.
-  if (existingInternalConsumer != null) {
-// Consumer exists in cache. If its in use, mark it for closing later, or close it now.
-if (existingInternalConsumer.inUse) {
-  existingInternalConsumer.markedForClose = true
-} else {
-  existingInternalConsumer.close()
-}
-  }
-  cache.remove(key)  // Invalidate the cache in any case
-  NonCachedKafkaDataConsumer(newInternalConsumer)
+val key = new CacheKey(topicPartition, kafkaParams)
 
-} else if (!useCache) {
-  // If planner asks to not reuse consumers, then do not use it, return a new consumer
-  NonCachedKafkaDataConsumer(newInternalConsumer)
+if (TaskContext.get != null && TaskContext.get.attemptNumber >= 1) {
+  // If this is reattempt at running the task, then invalidate cached consumer if any.
 
-} else if (existingInternalConsumer == null) {
-  // If consumer is not already cached, then put a new in the cache and return it
-  cache.put(key, newInternalConsumer)
-  newInternalConsumer.inUse = true
-  CachedKafkaDataConsumer(newInternalConsumer)
+  // invalidate all idle consumers for the key
+  pool.invalidateKey(key)
 
-} else if (existingInternalConsumer.inUse) {
-  // If consumer is already cached but is currently in use, then return a new consumer
-  NonCachedKafkaDataConsumer(newInternalConsumer)
+  // borrow a consumer from pool even in this case
+}
 
-} else {
-  // If consumer is already cached and is currently not in use, then return that consumer
-  existingInternalConsumer.inUse = true
-  CachedKafkaDataConsumer(existingInternalConsumer)
+try {
+  CachedKafkaDataConsumer(pool.borrowObject(key, kafkaParams))
+} catch { case _: NoSuchElementException =>
+  // There's neither idle object to clean up nor available space in pool:
+  // fail back to create non-cached consumer
--- End diff --

This approach introduces a behavior change. Even though the old `cache` had a
capacity, it acted as a soft limit: an item could still be added to the cache
when there was neither an idle object nor free space.

With this patch, KafkaDataConsumer creates non-cached consumers whenever the
pool is exhausted and there is no idle object to free up.

I think this is not a big deal as long as
"spark.sql.kafkaConsumerCache.capacity" is configured properly, and a hard
capacity makes it easier to determine what's going on.

However, we can still mimic the current behavior by giving the pool infinite
capacity, so we can go back to it if we feel it makes more sense.
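
To make the soft versus hard capacity distinction concrete, here is a minimal
sketch of a keyed pool with a hard cap that falls back to a non-pooled object
when it is exhausted. It is not the code from this pull request and it does not
use Apache Commons Pool: `Conn`, `Lease`, `HardCapPool` and `pooledCount` are
hypothetical names, and plain `synchronized` is used only to keep the sketch
short.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a pooled resource such as a Kafka consumer.
final class Conn(val key: String)

// Result of acquire: a pooled object (to be released later) or a throwaway one.
sealed trait Lease { def conn: Conn }
final case class Pooled(conn: Conn) extends Lease
final case class NonPooled(conn: Conn) extends Lease

// A keyed pool with a hard cap on the number of pooled objects (idle or borrowed).
final class HardCapPool(capacity: Int) {
  private val idle = mutable.Map.empty[String, mutable.Queue[Conn]]
  private var total = 0

  def acquire(key: String): Lease = synchronized {
    val queue = idle.getOrElseUpdate(key, mutable.Queue.empty[Conn])
    if (queue.nonEmpty) {
      Pooled(queue.dequeue()) // reuse an idle object for this key
    } else if (total < capacity || evictOneIdle()) {
      total += 1
      Pooled(new Conn(key)) // free space, possibly after evicting an idle object
    } else {
      NonPooled(new Conn(key)) // exhausted and nothing idle: fall back
    }
  }

  def release(lease: Lease): Unit = synchronized {
    lease match {
      case Pooled(c) =>
        idle.getOrElseUpdate(c.key, mutable.Queue.empty[Conn]).enqueue(c)
      case NonPooled(_) =>
        () // a real implementation would close the object here
    }
  }

  // Drops one idle object of any key to make room; false if nothing is idle.
  private def evictOneIdle(): Boolean =
    idle.values.find(_.nonEmpty) match {
      case Some(queue) =>
        queue.dequeue()
        total -= 1
        true
      case None =>
        false
    }

  def pooledCount: Int = synchronized(total)
}
```

A soft-capacity variant would replace the last branch of `acquire` with another
pooled object, which is the behavior the old `cache` is described as having.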


---




[GitHub] spark issue #21899: [SPARK-24912][SQL] Don't obscure source of OOM during br...

2018-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21899
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21899: [SPARK-24912][SQL] Don't obscure source of OOM during br...

2018-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21899
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94910/
Test FAILed.


---




[GitHub] spark issue #21899: [SPARK-24912][SQL] Don't obscure source of OOM during br...

2018-08-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21899
  
**[Test build #94910 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94910/testReport)**
 for PR 21899 at commit 
[`829a333`](https://github.com/apache/spark/commit/829a333ad3dc152b90e5257cf67e2134c31e839e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaData...

2018-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22138
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaData...

2018-08-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22138
  
**[Test build #94912 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94912/testReport)**
 for PR 22138 at commit 
[`c82f306`](https://github.com/apache/spark/commit/c82f3064fa8744f91b5c8a92645588dc9d53ba35).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  case class PooledObjectInvalidated(key: CacheKey, 
lastInvalidatedTimestamp: Long,`
  * `  class PoolConfig extends 
GenericKeyedObjectPoolConfig[InternalKafkaConsumer] `
  * `  case class CacheKey(groupId: String, topicPartition: TopicPartition) 
`


---




[GitHub] spark issue #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaData...

2018-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22138
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94912/
Test FAILed.


---




[GitHub] spark issue #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaData...

2018-08-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22138
  
**[Test build #94912 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94912/testReport)**
 for PR 22138 at commit 
[`c82f306`](https://github.com/apache/spark/commit/c82f3064fa8744f91b5c8a92645588dc9d53ba35).


---




[GitHub] spark issue #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaData...

2018-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22138
  
Can one of the admins verify this patch?


---







[GitHub] spark pull request #22138: [SPARK-25151][SS] Apply Apache Commons Pool to Ka...

2018-08-17 Thread HeartSaVioR
GitHub user HeartSaVioR opened a pull request:

https://github.com/apache/spark/pull/22138

[SPARK-25151][SS] Apply Apache Commons Pool to KafkaDataConsumer

## What changes were proposed in this pull request?

KafkaDataConsumer contains its own logic for caching InternalKafkaConsumer,
which looks like it can be simplified by applying Apache Commons Pool. The
benefits of applying Apache Commons Pool are as follows:

* We can get rid of synchronization on the KafkaDataConsumer object while
acquiring and returning InternalKafkaConsumer.
* We can extract the object pool feature out of the class, so that the
behavior of the pool can be tested easily.
* We can get various statistics for the object pool, and can also enable JMX
for the pool.

This patch brings an additional dependency, Apache Commons Pool 2.6.0, into the
`spark-sql-kafka-0-10` module.

## How was this patch tested?

Existing unit tests as well as new tests for object pool.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HeartSaVioR/spark SPARK-25151

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22138.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22138


commit c82f3064fa8744f91b5c8a92645588dc9d53ba35
Author: Jungtaek Lim 
Date:   2018-08-17T09:56:31Z

[SPARK-25151][SS] Apply Apache Commons Pool to KafkaDataConsumer




---




[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21306
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21306
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94906/
Test PASSed.


---




[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-08-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21306
  
**[Test build #94906 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94906/testReport)**
 for PR 21306 at commit 
[`622180a`](https://github.com/apache/spark/commit/622180a50e05b4d968380824f5dbbe5f89e42422).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `public class Transforms `
  * `  public static final class Identity extends SingleColumnTransform `
  * `  public static final class Bucket extends SingleColumnTransform `
  * `  public static final class Year extends SingleColumnTransform `
  * `  public static final class Month extends SingleColumnTransform `
  * `  public static final class Date extends SingleColumnTransform `
  * `  public static final class DateAndHour extends SingleColumnTransform `
  * `  public static final class Apply extends SingleColumnTransform `


---




[GitHub] spark issue #22134: [SPARK-25143][SQL] Support data source name mapping conf...

2018-08-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22134
  
I got it. I'll close this approach.


---




[GitHub] spark pull request #22134: [SPARK-25143][SQL] Support data source name mappi...

2018-08-17 Thread dongjoon-hyun
Github user dongjoon-hyun closed the pull request at:

https://github.com/apache/spark/pull/22134


---




[GitHub] spark issue #19041: [SPARK-21097][CORE] Add option to recover cached data

2018-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19041
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #21584: [SPARK-24433][K8S] Initial R Bindings for SparkR on K8s

2018-08-17 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/21584
  
I filed https://issues.apache.org/jira/browse/SPARK-25152 for the 
integration tests.


---




[GitHub] spark pull request #21584: [SPARK-24433][K8S] Initial R Bindings for SparkR ...

2018-08-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21584


---




[GitHub] spark issue #21584: [SPARK-24433][K8S] Initial R Bindings for SparkR on K8s

2018-08-17 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/21584
  
Ok I am merging this to master now. Thanks for the work on this!


---




[GitHub] spark issue #22137: [MINOR][DOC][SQL] use one line for annotation arg value

2018-08-17 Thread mengxr
Github user mengxr commented on the issue:

https://github.com/apache/spark/pull/22137
  
cc: @gatorsmile 


---




[GitHub] spark issue #22137: [MINOR][DOC][SQL] use one line for annotation arg value

2018-08-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22137
  
**[Test build #94911 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94911/testReport)**
 for PR 22137 at commit 
[`e9a9376`](https://github.com/apache/spark/commit/e9a93762aeeb219cf9ab600da248a0d1f295d09f).


---




[GitHub] spark issue #22137: [MINOR][DOC][SQL] use one line for annotation arg value

2018-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22137
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2290/
Test PASSed.


---




[GitHub] spark issue #22137: [MINOR][DOC][SQL] use one line for annotation arg value

2018-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22137
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #22137: [MINOR][DOC][SQL] use one line for annotation arg...

2018-08-17 Thread mengxr
GitHub user mengxr opened a pull request:

https://github.com/apache/spark/pull/22137

[MINOR][DOC][SQL] use one line for annotation arg value

## What changes were proposed in this pull request?

Put annotation args in one line, or API doc generation will fail.

~~~
[error] /Users/meng/src/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala:1559: annotation argument needs to be a constant; found: "_FUNC_(expr) - Returns the character length of string data or number of bytes of ".+("binary data. The length of string data includes the trailing spaces. The length of binary ").+("data includes binary zeros.")
[error] "binary data. The length of string data includes the trailing spaces. The length of binary " +
[error]  ^
[info] No documentation generated with unsuccessful compiler run
[error] one error found
[error] (catalyst/compile:doc) Scaladoc generation failed
[error] Total time: 27 s, completed Aug 17, 2018 3:20:08 PM
~~~

## How was this patch tested?

sbt catalyst/compile:doc passed



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mengxr/spark minor-doc-fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22137.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22137


commit e9a93762aeeb219cf9ab600da248a0d1f295d09f
Author: Xiangrui Meng 
Date:   2018-08-17T22:47:04Z

fix a minor issue to generate API docs




---




[GitHub] spark issue #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-08-17 Thread cclauss
Github user cclauss commented on the issue:

https://github.com/apache/spark/pull/20838
  
It was reverted because [__slice_test()__](#22128) was causing the build to 
fail.


---




[GitHub] spark issue #20637: [SPARK-23466][SQL] Remove redundant null checks in gener...

2018-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20637
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20637: [SPARK-23466][SQL] Remove redundant null checks in gener...

2018-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20637
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94904/
Test PASSed.


---




[GitHub] spark issue #20637: [SPARK-23466][SQL] Remove redundant null checks in gener...

2018-08-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20637
  
**[Test build #94904 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94904/testReport)**
 for PR 20637 at commit 
[`84961b4`](https://github.com/apache/spark/commit/84961b44d0f846e241c322f0f80d8dc032f6008d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #21899: [SPARK-24912][SQL] Don't obscure source of OOM du...

2018-08-17 Thread bersprockets
Github user bersprockets commented on a diff in the pull request:

https://github.com/apache/spark/pull/21899#discussion_r211047556
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala ---
@@ -118,12 +119,20 @@ case class BroadcastExchangeExec(
   // SparkFatalException, which is a subclass of Exception. ThreadUtils.awaitResult
   // will catch this exception and re-throw the wrapped fatal throwable.
   case oe: OutOfMemoryError =>
-throw new SparkFatalException(
+val sizeMessage = if (dataSize != -1) {
+  s"${SparkLauncher.DRIVER_MEMORY} by at least the estimated size of the " +
--- End diff --

@hvanhovell That's what was being obscured :).

In testing this, I've seen the OOM come from various places. Here are the three
cases I have seen first hand.

1st case:

java.lang.OutOfMemoryError: Not enough memory to build and broadcast the
table to all worker nodes. As a workaround, you can either disable broadcast by
setting spark.sql.autoBroadcastJoinThreshold to -1 or increase the spark driver
memory by setting spark.driver.memory to a higher value.
  at org.apache.spark.sql.execution.joins.LongToUnsafeRowMap.grow(HashedRelation.scala:628)
  at org.apache.spark.sql.execution.joins.LongToUnsafeRowMap.append(HashedRelation.scala:570)
  at org.apache.spark.sql.execution.joins.LongHashedRelation$.apply(HashedRelation.scala:865)

The top frame is an allocation:

val newPage = new Array[Long](newNumWords.toInt)

2nd case:

java.lang.OutOfMemoryError: Not enough memory to build and broadcast the
table to all worker nodes. As a workaround, you can either disable broadcast by
setting spark.sql.autoBroadcastJoinThreshold to -1 or increase
spark.driver.memory by at least the estimated size of the relation (96468992
bytes).
  at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
  at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
  at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$3.apply(TorrentBroadcast.scala:286)
  at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$3.apply(TorrentBroadcast.scala:286)

3rd case:

java.lang.OutOfMemoryError: Not enough memory to build and broadcast the
table to all worker nodes. As a workaround, you can either disable broadcast by
setting spark.sql.autoBroadcastJoinThreshold to -1 or increase the spark driver
memory by setting spark.driver.memory to a higher value.
  at org.apache.spark.unsafe.memory.MemoryBlock.allocateFromObject(MemoryBlock.java:118)
  at org.apache.spark.sql.catalyst.expressions.UnsafeRow.getUTF8String(UnsafeRow.java:420)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
  at org.apache.spark.sql.execution.joins.UnsafeHashedRelation$.apply(HashedRelation.scala:311)

The top frame is also an allocation:

mb = new ByteArrayMemoryBlock(array, offset, length);
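
The traces above carry a friendlier message while still pointing at the
original allocation site. One way to get that effect is sketched below. This is
an assumption about the general technique and not the code in this pull
request: `OomContext`, `withAdvice` and `estimatedBytes` are hypothetical
names, and -1 is taken to mean that no size estimate is available.

```scala
object OomContext {
  // Builds a new OutOfMemoryError with a more helpful message and copies the
  // stack trace of the original error, so the failing allocation stays visible.
  def withAdvice(oe: OutOfMemoryError, estimatedBytes: Long): OutOfMemoryError = {
    val advice =
      if (estimatedBytes != -1L) {
        s"increase driver memory by at least $estimatedBytes bytes"
      } else {
        "increase driver memory"
      }
    val wrapped = new OutOfMemoryError(s"Not enough memory to build the table; $advice.")
    // Without this call the trace would point at withAdvice, not at the allocation.
    wrapped.setStackTrace(oe.getStackTrace)
    wrapped
  }
}
```

A caller would catch the original `OutOfMemoryError`, pass it through
`withAdvice`, and rethrow the result.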





---




[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-08-17 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21909#discussion_r211045699
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonDataSource.scala ---
@@ -223,7 +224,8 @@ object MultiLineJsonDataSource extends JsonDataSource {
   input => parser.parse[InputStream](input, streamParser, partitionedFileString),
   parser.options.parseMode,
   schema,
-  parser.options.columnNameOfCorruptRecord)
+  parser.options.columnNameOfCorruptRecord,
+  optimizeEmptySchema = false)
--- End diff --

Could we rename `optimizeEmptySchema` to `isMultiLine`?


---




[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-08-17 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21909#discussion_r211045061
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1492,6 +1492,15 @@ object SQLConf {
 "This usually speeds up commands that need to list many directories.")
   .booleanConf
   .createWithDefault(true)
+
+  val BYPASS_PARSER_FOR_EMPTY_SCHEMA =
+buildConf("spark.sql.legacy.bypassParserForEmptySchema")
--- End diff --

If there is no behavior change, do we still need this conf?


---




[GitHub] spark pull request #21899: [SPARK-24912][SQL] Don't obscure source of OOM du...

2018-08-17 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/21899#discussion_r211044133
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala ---
@@ -118,12 +119,20 @@ case class BroadcastExchangeExec(
   // SparkFatalException, which is a subclass of Exception. ThreadUtils.awaitResult
   // will catch this exception and re-throw the wrapped fatal throwable.
   case oe: OutOfMemoryError =>
-throw new SparkFatalException(
+val sizeMessage = if (dataSize != -1) {
+  s"${SparkLauncher.DRIVER_MEMORY} by at least the estimated size of the " +
--- End diff --

Forgive me for asking a dumb question, but where will this exception come 
from? The block manager?


---




[GitHub] spark issue #22085: [WIP][SPARK-25095][PySpark] Python support for BarrierTa...

2018-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22085
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94901/
Test FAILed.


---




[GitHub] spark issue #22085: [WIP][SPARK-25095][PySpark] Python support for BarrierTa...

2018-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22085
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22085: [WIP][SPARK-25095][PySpark] Python support for BarrierTa...

2018-08-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22085
  
**[Test build #94901 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94901/testReport)**
 for PR 22085 at commit 
[`e234a0a`](https://github.com/apache/spark/commit/e234a0a3d4e740d757fe086b0971a10f621d518b).
 * This patch **fails from timeout after a configured wait of `340m`**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21899: [SPARK-24912][SQL] Don't obscure source of OOM during br...

2018-08-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21899
  
**[Test build #94910 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94910/testReport)**
 for PR 21899 at commit 
[`829a333`](https://github.com/apache/spark/commit/829a333ad3dc152b90e5257cf67e2134c31e839e).


---




[GitHub] spark issue #21899: [SPARK-24912][SQL] Don't obscure source of OOM during br...

2018-08-17 Thread bersprockets
Github user bersprockets commented on the issue:

https://github.com/apache/spark/pull/21899
  
retest this please


---




[GitHub] spark issue #21899: [SPARK-24912][SQL] Don't obscure source of OOM during br...

2018-08-17 Thread bersprockets
Github user bersprockets commented on the issue:

https://github.com/apache/spark/pull/21899
  
@MaxGekk In the updated message, I left out "hash" from the term "hash
relation" only because it seems the relation could also be an Array.


---




[GitHub] spark issue #21909: [SPARK-24959][SQL] Speed up count() for JSON and CSV

2018-08-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21909
  
**[Test build #94909 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94909/testReport)**
 for PR 21909 at commit 
[`96a94cc`](https://github.com/apache/spark/commit/96a94ccaed1f68fa7eaf3fc286540e531d9a9506).


---




[GitHub] spark issue #21909: [SPARK-24959][SQL] Speed up count() for JSON and CSV

2018-08-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21909
  
**[Test build #94908 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94908/testReport)**
 for PR 21909 at commit 
[`2d8e754`](https://github.com/apache/spark/commit/2d8e754e699076c8a5915e7faf971e4bd2a5c1fd).


---




[GitHub] spark issue #21899: [SPARK-24912][SQL] Don't obscure source of OOM during br...

2018-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21899
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21899: [SPARK-24912][SQL] Don't obscure source of OOM during br...

2018-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21899
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94907/
Test FAILed.


---




[GitHub] spark issue #21899: [SPARK-24912][SQL] Don't obscure source of OOM during br...

2018-08-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21899
  
**[Test build #94907 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94907/testReport)**
 for PR 21899 at commit 
[`829a333`](https://github.com/apache/spark/commit/829a333ad3dc152b90e5257cf67e2134c31e839e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---



