[GitHub] spark pull request #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCode...

2018-09-19 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22358#discussion_r218916987 --- Diff: docs/sql-programming-guide.md --- @@ -965,6 +965,8 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession

[GitHub] spark pull request #13206: [SPARK-15420] [SQL] Add repartition and sort to p...

2018-09-19 Thread rdblue
Github user rdblue closed the pull request at: https://github.com/apache/spark/pull/13206 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22413: [SPARK-25425][SQL] Extra options should override session...

2018-09-19 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/22413 +1, thanks for fixing this, @dongjoon-hyun!

[GitHub] spark issue #22388: Revert [SPARK-24882][SQL] improve data source v2 API fro...

2018-09-17 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/22388 Thanks for doing this, @cloud-fan! Sorry I'm late to reply, I was at Strata all last week.

[GitHub] spark issue #21308: [SPARK-24253][SQL] Add DeleteSupport mix-in for DataSour...

2018-09-10 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21308 @tigerquoll, I'm talking about the DataSourceV2 API in general. I'm not sure if I think there is value in exposing partitions, but I'd be happy to hear why you think they are valuable and think

[GitHub] spark pull request #21308: [SPARK-24253][SQL] Add DeleteSupport mix-in for D...

2018-09-10 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21308#discussion_r216382704 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/DeleteSupport.java --- @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #21308: [SPARK-24253][SQL] Add DeleteSupport mix-in for DataSour...

2018-09-07 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21308 @tigerquoll, I'm not debating whether we should or shouldn't expose partitions here. In general, I'm undecided. I don't think that the API proposed here needs to support a first-class partition

[GitHub] spark issue #21308: [SPARK-24253][SQL] Add DeleteSupport mix-in for DataSour...

2018-09-06 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21308 @tigerquoll, there is currently no support to expose partitions through the v2 API. That would be a different operation. If you wanted to implement partition operations through this API, then you

[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-09-06 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21306 @tigerquoll, the proposal isn't to make partitions part of table configuration. It is to make the partitioning scheme part of the table configuration. How sources choose to handle individual

[GitHub] spark issue #21308: [SPARK-24253][SQL] Add DeleteSupport mix-in for DataSour...

2018-09-04 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21308 @tigerquoll, what we come up with needs to work across a variety of data sources, including those like JDBC that can delete at a lower granularity than partition. For Hive tables

[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-09-04 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21306 > Can we support column range partition predicates please? This has an "apply" transform for passing other functions directly through, so that may help if you have addition

[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

2018-09-04 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r214978286 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/TableChange.java --- @@ -0,0 +1,182 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

2018-09-04 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r214977998 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/TableChange.java --- @@ -0,0 +1,182 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #22255: [SPARK-25102][Spark Core] Write Spark version informatio...

2018-09-03 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/22255 @npoberezkin, Parquet already supports custom key-value metadata in the file footer. The Spark version would go

[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...

2018-08-31 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/22298 Looks fine to me, but I'm not familiar enough with the K8S code to have much of an opinion.

[GitHub] spark issue #22281: [SPARK-25280][SQL] Add support for USING syntax for Data...

2018-08-30 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/22281 Thanks for working on this, @HyukjinKwon. I think it's great that this is getting the conversation started. I agree with @cloud-fan that we should think through how we want v2 to work

[GitHub] spark issue #22255: [SPARK-25102][Spark Core] Write Spark version informatio...

2018-08-30 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/22255 I don't think this fits the intent of the model name. The model name is intended to encode what the data model was that was written to Parquet. I can write Avro records to a Parquet file

[GitHub] spark issue #21977: [SPARK-25004][CORE] Add spark.executor.pyspark.memory li...

2018-08-29 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21977 @vanzin, thanks for merging! And thanks to everyone for the reviews!

[GitHub] spark pull request #21977: [SPARK-25004][CORE] Add spark.executor.pyspark.me...

2018-08-29 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r213777162 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -62,14 +63,20 @@ private[spark] object PythonEvalType

[GitHub] spark issue #21977: [SPARK-25004][CORE] Add spark.executor.pyspark.memory li...

2018-08-28 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21977 The last couple of commits have failed a test case, but there have been no code changes since a passing test. I think master is just a bit flaky right now and that this PR is fine

[GitHub] spark pull request #21977: [SPARK-25004][CORE] Add spark.executor.pyspark.me...

2018-08-28 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r213407352 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -62,14 +63,20 @@ private[spark] object PythonEvalType

[GitHub] spark issue #22193: [SPARK-25186][SQL] Remove v2 save mode.

2018-08-27 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/22193 @HyukjinKwon, those changes probably don't need to be in this PR, but this is just a demonstration that we can remove `SaveMode` without changing test cases. The larger issue is that this doesn't

[GitHub] spark issue #22222: [SPARK-25083][SQL] Remove the type erasure hack in data ...

2018-08-27 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/22222 @xuanyuanking, while this does remove the hack, it doesn't address the underlying problem. The problem is that there is a single RDD, which may contain InternalRow or may contain ColumnarBatch

[GitHub] spark pull request #21977: [SPARK-25004][CORE] Add spark.executor.pyspark.me...

2018-08-27 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r213122284 --- Diff: docs/configuration.md --- @@ -179,6 +179,15 @@ of the most common options to set are: (e.g. 2g, 8g

[GitHub] spark pull request #21977: [SPARK-25004][CORE] Add spark.executor.pyspark.me...

2018-08-27 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r213121178 --- Diff: docs/configuration.md --- @@ -179,6 +179,15 @@ of the most common options to set are: (e.g. 2g, 8g

[GitHub] spark pull request #22206: [SPARK-25213][PYTHON] Add project to v2 scans bef...

2018-08-27 Thread rdblue
Github user rdblue closed the pull request at: https://github.com/apache/spark/pull/22206

[GitHub] spark issue #22206: [SPARK-25213][PYTHON] Add project to v2 scans before pyt...

2018-08-27 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/22206 @HyukjinKwon and @viirya, thank you for looking at this commit, but I like @cloud-fan's approach to fixing this in #22244 better than this work-around. I'm going to close this in favor

[GitHub] spark issue #22244: [WIP][SPARK-24721][SPARK-25213][SQL] extract python UDF ...

2018-08-27 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/22244 @cloud-fan, I like this solution better than adding a special case in the v2 conversion to physical plan. This explains why the Python exec nodes weren't already in the tree! I'd much rather commit

[GitHub] spark pull request #22206: [SPARK-25213][PYTHON] Add project to v2 scans bef...

2018-08-27 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22206#discussion_r213098202 --- Diff: python/pyspark/sql/tests.py --- @@ -6394,6 +6394,17 @@ def test_invalid_args(self): df.withColumn('mean_v', mean_udf(df['v

[GitHub] spark pull request #21977: [SPARK-25004][CORE] Add spark.executor.pyspark.me...

2018-08-27 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r213035238 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -91,6 +91,13 @@ private[spark] class Client

[GitHub] spark pull request #21977: [SPARK-25004][CORE] Add spark.executor.pyspark.me...

2018-08-24 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r212782657 --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/BaseYarnClusterSuite.scala --- @@ -161,6 +162,11 @@ abstract class

[GitHub] spark pull request #21977: [SPARK-25004][CORE] Add spark.executor.pyspark.me...

2018-08-24 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r212714476 --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala --- @@ -114,6 +114,10 @@ package object config { .checkValue(_ >

[GitHub] spark pull request #22206: SPARK-25213: Add project to v2 scans before pytho...

2018-08-24 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22206#discussion_r212676265 --- Diff: python/pyspark/sql/tests.py --- @@ -6394,6 +6394,17 @@ def test_invalid_args(self): df.withColumn('mean_v', mean_udf(df['v

[GitHub] spark pull request #22206: SPARK-25213: Add project to v2 scans before pytho...

2018-08-23 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22206#discussion_r212489210 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala --- @@ -130,10 +133,22 @@ object

[GitHub] spark issue #22190: [SPARK-25188][SQL] Add WriteConfig to v2 write API.

2018-08-23 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/22190 Retest this please

[GitHub] spark pull request #22206: SPARK-25213: Add project to v2 scans before pytho...

2018-08-23 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22206#discussion_r212488266 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala --- @@ -130,10 +133,22 @@ object

[GitHub] spark issue #21977: SPARK-25004: Add spark.executor.pyspark.memory limit.

2018-08-23 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21977 Looks like tests are passing now.

[GitHub] spark issue #22190: [SPARK-25188][SQL] Add WriteConfig to v2 write API.

2018-08-23 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/22190 @rxin, @cloud-fan, @jose-torres: this is the update to add `WriteConfig`. There's one failed test that I think is unrelated, so this is ready for you to have a look. This will probably need

[GitHub] spark pull request #22206: SPARK-25213: Add project to v2 scans before pytho...

2018-08-23 Thread rdblue
GitHub user rdblue opened a pull request: https://github.com/apache/spark/pull/22206 SPARK-25213: Add project to v2 scans before python filters. ## What changes were proposed in this pull request? The v2 API always adds a projection when converting to physical plan

[GitHub] spark issue #22190: [SPARK-25188][SQL] Add WriteConfig to v2 write API.

2018-08-23 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/22190 Retest this please.

[GitHub] spark issue #21977: SPARK-25004: Add spark.executor.pyspark.memory limit.

2018-08-23 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21977 This is close. The Java and Scala tests were passing and I think I fixed the remaining issue for the Python tests. Unfortunately, Scala tests are failing again and I was trying to run tests a couple

[GitHub] spark pull request #22193: [SPARK-25186][SQL] Remove v2 save mode.

2018-08-22 Thread rdblue
GitHub user rdblue opened a pull request: https://github.com/apache/spark/pull/22193 [SPARK-25186][SQL] Remove v2 save mode. ## What changes were proposed in this pull request? This removes `SaveMode` from the v2 write API. Overwrite is temporarily implemented by deleting

[GitHub] spark issue #22190: [SPARK-25188][SQL] Add WriteConfig to v2 write API.

2018-08-22 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/22190 This is related to #21308, which adds `DeleteSupport`. Both `BatchOverwriteSupport` and `DeleteSupport` use the same input to remove data (`Filter[]`) and can reject deletes that don't align

[GitHub] spark pull request #22190: [SPARK-25188][SQL] Add WriteConfig to v2 write AP...

2018-08-22 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22190#discussion_r212121224 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/MicroBatchWriteSupport.scala --- @@ -18,27 +18,38 @@ package

[GitHub] spark pull request #22190: [SPARK-25188][SQL] Add WriteConfig to v2 write AP...

2018-08-22 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22190#discussion_r212120878 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/BatchPartitionOverwriteSupport.java --- @@ -0,0 +1,44 @@ +/* + * Licensed

[GitHub] spark pull request #22190: [SPARK-25188][SQL] Add WriteConfig to v2 write AP...

2018-08-22 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22190#discussion_r212120411 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/BatchPartitionOverwriteSupport.java --- @@ -0,0 +1,44 @@ +/* + * Licensed

[GitHub] spark pull request #22190: [SPARK-25188][SQL] Add WriteConfig to v2 write AP...

2018-08-22 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22190#discussion_r212119716 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/BatchOverwriteSupport.java --- @@ -0,0 +1,61 @@ +/* + * Licensed

[GitHub] spark pull request #22190: [SPARK-25188][SQL] Add WriteConfig to v2 write AP...

2018-08-22 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22190#discussion_r212118021 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala --- @@ -279,10 +277,7 @@ private[kafka010] class

[GitHub] spark pull request #22190: SPARK-25188: Add WriteConfig to v2 write API.

2018-08-22 Thread rdblue
GitHub user rdblue opened a pull request: https://github.com/apache/spark/pull/22190 SPARK-25188: Add WriteConfig to v2 write API. ## What changes were proposed in this pull request? This updates the v2 write path to a similar structure as the v2 read path. Individual

[GitHub] spark issue #22185: [SPARK-25127] DataSourceV2: Remove SupportsPushDownCatal...

2018-08-22 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/22185 +1 when tests pass. Convenient that there weren't any internal implementations using this.

[GitHub] spark issue #21977: SPARK-25004: Add spark.executor.pyspark.memory limit.

2018-08-22 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21977 Retest this please.

[GitHub] spark issue #21977: SPARK-25004: Add spark.executor.pyspark.memory limit.

2018-08-21 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21977 Retest this please.

[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-21 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/22009 @cloud-fan, I think that the scan config builder needs to accept the options and that SaveMode needs to be removed before we should merge this PR. I'm fine with following up with the WriteConfig

[GitHub] spark pull request #21977: SPARK-25004: Add spark.executor.pyspark.memory li...

2018-08-21 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r211763717 --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/BaseYarnClusterSuite.scala --- @@ -161,6 +162,11 @@ abstract class

[GitHub] spark pull request #21977: SPARK-25004: Add spark.executor.pyspark.memory li...

2018-08-21 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r211763465 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -60,14 +61,20 @@ private[spark] object PythonEvalType

[GitHub] spark issue #21977: SPARK-25004: Add spark.executor.pyspark.memory limit.

2018-08-21 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21977 @BryanCutler, thanks for taking a look at this. Despite the problems you hit when the limit was set too low, I think we do want to use that limit. It was the most reliable one from our

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-21 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r211692277 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/BatchReadSupportProvider.java --- @@ -18,48 +18,44 @@ package

[GitHub] spark issue #21977: SPARK-25004: Add spark.executor.pyspark.memory limit.

2018-08-21 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21977 Looks like the test problems were caused by accessing the SparkConf through either SparkContext or SparkSession on the executor side. The Scala tests are passing and I've fixed a couple more

[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add catalog registration and t...

2018-08-17 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r211057651 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalog/v2/V1MetadataTable.scala --- @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-08-17 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21306 Retest this please.

[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-08-15 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21306 @rxin, I've updated this to use a new interface, `PartitionTransform`, instead of `Expression`. This is used to pass well-known transformations when creating tables, like `Filter` is used to pass

[GitHub] spark issue #21308: SPARK-24253: Add DeleteSupport mix-in for DataSourceV2.

2018-08-15 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21308 @rxin, I've updated this API to use `Filter` instead of `Expression`. I'd ideally like to get it in soon if you guys have a chance to review it. It's pretty small. cc @cloud-fan

[GitHub] spark pull request #21308: SPARK-24253: Add DeleteSupport mix-in for DataSou...

2018-08-15 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21308#discussion_r210382412 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/DeleteSupport.java --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #21123: [SPARK-24045][SQL]Create base class for file data source...

2018-08-15 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21123 > We should do things incrementally and always prepare for the worst case. This is why I'm pushing for Append support and adding interfaces to finish the logical plans. Releasing logi

[GitHub] spark issue #21123: [SPARK-24045][SQL]Create base class for file data source...

2018-08-15 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21123 > this mapping is not mentioned in the logical plan standardization design doc and I doubt if it's doable I agree! This is why I propose we add an entirely new API for v2 with cl

[GitHub] spark pull request #21977: SPARK-25004: Add spark.executor.pyspark.memory li...

2018-08-15 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r210317086 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/BatchEvalPythonExec.scala --- @@ -69,7 +67,7 @@ case class BatchEvalPythonExec(udfs

[GitHub] spark issue #21977: SPARK-25004: Add spark.executor.pyspark.memory limit.

2018-08-15 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21977 @holdenk, what could be the cause?

[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-08-14 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21306 @cloud-fan, on the dev thread about 2.4 you talked about getting this PR in. What do we need to do next? I can call a vote on the SPIP if you think that's ready. I just bumped the thread

[GitHub] spark issue #21123: [SPARK-24045][SQL]Create base class for file data source...

2018-08-14 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21123 @gatorsmile, I agree. The new logical plans should clean up cases where behavior is ambiguous, examples of which are pointed out in the background of that SPIP. The problem I'm referring

[GitHub] spark issue #21977: SPARK-25004: Add spark.executor.pyspark.memory limit.

2018-08-14 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21977 You're right on the line number. Maybe it was that I hadn't done a full rebuild in this branch locally before running the test. I'll look into the other error if that's consistent in the Jenkins

[GitHub] spark issue #21977: SPARK-25004: Add spark.executor.pyspark.memory limit.

2018-08-14 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21977 @squito, the last Jenkins test had a different error message and two of the tests didn't show the stdout/stderr output. I updated the other tests to show that output, and hopefully we get a run

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-14 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r210024440 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/StreamingReadSupport.java --- @@ -0,0 +1,49 @@ +/* + * Licensed

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-14 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r210021990 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/StreamingReadSupport.java --- @@ -0,0 +1,49 @@ +/* + * Licensed

[GitHub] spark issue #21123: [SPARK-24045][SQL]Create base class for file data source...

2018-08-14 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21123 @HyukjinKwon, I don't think there has been a discussion about how v1 and v2 compatibility will work. Discussion on #22009 brought up one aspect of it: whether v2 sources should be passed a `SaveMode

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-14 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209998505 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/RateControlMicroBatchReadSupport.scala --- @@ -0,0 +1,31

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-14 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209997335 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/ReadSupport.java --- @@ -45,9 +45,6 @@ * Note that, this may not be a full

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-14 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209996219 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala --- @@ -76,41 +76,43 @@ object

[GitHub] spark pull request #21977: SPARK-25004: Add spark.executor.pyspark.memory li...

2018-08-14 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r209992878 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -60,14 +61,20 @@ private[spark] object PythonEvalType

[GitHub] spark pull request #21977: SPARK-25004: Add spark.executor.pyspark.memory li...

2018-08-13 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r209771361 --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/BaseYarnClusterSuite.scala --- @@ -179,17 +185,23 @@ abstract class

[GitHub] spark issue #21977: SPARK-25004: Add spark.executor.pyspark.memory limit.

2018-08-13 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21977 @squito, I updated `YarnClusterSuite` in d58ad7a to capture the output of the child processes to find out what is causing the test failures. I think it is related to the commit you reviewed

[GitHub] spark issue #21978: SPARK-25006: Add CatalogTableIdentifier.

2018-08-13 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21978 @cloud-fan, on the dev thread about 2.4 you talked about getting this PR in. What do we need to do next? I can call a vote on the SPIP if you think that's ready. I just bumped the thread

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-13 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209726516 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala --- @@ -76,41 +76,43 @@ object

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-13 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209712363 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/BatchWriteSupportProvider.java --- @@ -21,33 +21,39 @@ import

[GitHub] spark pull request #21977: SPARK-25004: Add spark.executor.pyspark.memory li...

2018-08-13 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r209707560 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/AggregateInPandasExec.scala --- @@ -137,13 +135,12 @@ case class

[GitHub] spark pull request #21977: SPARK-25004: Add spark.executor.pyspark.memory li...

2018-08-13 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r209707290 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -60,14 +61,20 @@ private[spark] object PythonEvalType

[GitHub] spark pull request #21977: SPARK-25004: Add spark.executor.pyspark.memory li...

2018-08-13 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r209703555 --- Diff: python/pyspark/worker.py --- @@ -259,6 +260,26 @@ def main(infile, outfile): "PYSPARK_DRIVER_PYTHON are corr

[GitHub] spark pull request #21977: SPARK-25004: Add spark.executor.pyspark.memory li...

2018-08-13 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r209703452 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -91,6 +91,13 @@ private[spark] class Client

[GitHub] spark issue #21977: SPARK-25004: Add spark.executor.pyspark.memory limit.

2018-08-13 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21977 @gatorsmile, we tried both RLIMIT_HEAP and RLIMIT_RSS but those limits didn't consistently work.
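
For readers unfamiliar with per-process resource limits, here is a minimal sketch of how a Python worker could cap its own memory with the standard `resource` module. It is an illustration only: the comment above does not say which limit the PR ended up using, so the choice of `RLIMIT_AS` (total address space) is an assumption, and the helper names `plan_limit` and `apply_address_space_limit` are hypothetical.

```python
import resource


def plan_limit(requested_bytes, soft, hard):
    """Pick the soft limit to request; None means leave the limits alone.

    Hypothetical helper, kept pure so it can be checked without
    touching the limits of the running process.
    """
    inf = resource.RLIM_INFINITY
    # An unprivileged process cannot raise its soft limit above the hard limit.
    if hard != inf and requested_bytes > hard:
        requested_bytes = hard
    # Only tighten: skip if the existing soft limit is already at or below the request.
    if soft != inf and soft <= requested_bytes:
        return None
    return requested_bytes


def apply_address_space_limit(requested_bytes):
    """Apply the planned limit to this process (assumes RLIMIT_AS is the limit wanted)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    new_soft = plan_limit(requested_bytes, soft, hard)
    if new_soft is not None:
        resource.setrlimit(resource.RLIMIT_AS, (new_soft, hard))
    return new_soft
```

Once such a limit is in place, an allocation that would exceed it typically surfaces in Python as a `MemoryError`, though how reliably any given limit is enforced depends on the platform.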

[GitHub] spark pull request #21977: SPARK-25004: Add spark.executor.pyspark.memory li...

2018-08-13 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r209702097 --- Diff: python/pyspark/worker.py --- @@ -259,6 +260,26 @@ def main(infile, outfile): "PYSPARK_DRIVER_PYTHON are corr

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-13 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209699841 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SessionConfigSupport.java --- @@ -27,10 +27,10 @@ @InterfaceStability.Evolving

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-13 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209698964 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/BatchWriteSupportProvider.java --- @@ -21,33 +21,39 @@ import

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-13 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209697665 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/StreamingReadSupport.java --- @@ -0,0 +1,49 @@ +/* + * Licensed

[GitHub] spark issue #22043: [SPARK-24251][SQL] Add analysis tests for AppendData.

2018-08-13 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/22043 Thanks for reviewing, @cloud-fan!

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209094259 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/StreamingReadSupport.java --- @@ -0,0 +1,49 @@ +/* + * Licensed

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209093885 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/MicroBatchReadSupport.java --- @@ -0,0 +1,49 @@ +/* + * Licensed

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209044995 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/ContinuousPartitionReaderFactory.java --- @@ -0,0 +1,71

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209042787 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExec.scala --- @@ -39,52 +36,43 @@ case class

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209042604 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceRDD.scala --- @@ -51,18 +58,19 @@ class DataSourceRDD[T: ClassTag

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209042348 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceRDD.scala --- @@ -51,18 +58,19 @@ class DataSourceRDD[T: ClassTag

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209042148 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/StreamingReadSupport.java --- @@ -0,0 +1,49 @@ +/* + * Licensed
