spark git commit: [SPARK-25591][PYSPARK][SQL] Avoid overwriting deserialized accumulator

2018-10-08 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 219922422 -> cb90617f8 [SPARK-25591][PYSPARK][SQL] Avoid overwriting deserialized accumulator ## What changes were proposed in this pull request? If we use accumulators in more than one UDF, it is possible to overwrite deserialized

spark git commit: [SPARK-25673][BUILD] Remove Travis CI which enables Java lint check

2018-10-08 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.4 c8b94099a -> 4214ddd34 [SPARK-25673][BUILD] Remove Travis CI which enables Java lint check ## What changes were proposed in this pull request? https://github.com/apache/spark/pull/12980 added Travis CI file mainly for linter because

spark git commit: [SPARK-25673][BUILD] Remove Travis CI which enables Java lint check

2018-10-08 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master ebd899b8a -> 219922422 [SPARK-25673][BUILD] Remove Travis CI which enables Java lint check ## What changes were proposed in this pull request? https://github.com/apache/spark/pull/12980 added Travis CI file mainly for linter because we

spark git commit: [SPARK-25591][PYSPARK][SQL] Avoid overwriting deserialized accumulator

2018-10-08 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.4 4214ddd34 -> 692ddb3f9 [SPARK-25591][PYSPARK][SQL] Avoid overwriting deserialized accumulator ## What changes were proposed in this pull request? If we use accumulators in more than one UDF, it is possible to overwrite deserialized

spark git commit: [SPARK-25677][DOC] spark.io.compression.codec = org.apache.spark.io.ZstdCompressionCodec throwing IllegalArgumentException Exception

2018-10-08 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.4 692ddb3f9 -> 193ce77fc [SPARK-25677][DOC] spark.io.compression.codec = org.apache.spark.io.ZstdCompressionCodec throwing IllegalArgumentException Exception ## What changes were proposed in this pull request? Documentation is updated

spark git commit: [SPARK-25677][DOC] spark.io.compression.codec = org.apache.spark.io.ZstdCompressionCodec throwing IllegalArgumentException Exception

2018-10-08 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master cb90617f8 -> 1a6815cd9 [SPARK-25677][DOC] spark.io.compression.codec = org.apache.spark.io.ZstdCompressionCodec throwing IllegalArgumentException Exception ## What changes were proposed in this pull request? Documentation is updated with

spark git commit: [SPARK-25666][PYTHON] Internally document type conversion between Python data and SQL types in normal UDFs

2018-10-08 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 1a6815cd9 -> a853a8020 [SPARK-25666][PYTHON] Internally document type conversion between Python data and SQL types in normal UDFs ### What changes were proposed in this pull request? We are facing some problems about type conversions

spark git commit: [SPARK-25684][SQL] Organize header related codes in CSV datasource

2018-10-11 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master a00181418 -> 39872af88 [SPARK-25684][SQL] Organize header related codes in CSV datasource ## What changes were proposed in this pull request? 1. Move `CSVDataSource.makeSafeHeader` to `CSVUtils.makeSafeHeader` (as is). - Historically

spark git commit: [SPARK-25372][YARN][K8S][FOLLOW-UP] Deprecate and generalize keytab / principal config

2018-10-14 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 6c3f2c6a6 -> 9426fd0c2 [SPARK-25372][YARN][K8S][FOLLOW-UP] Deprecate and generalize keytab / principal config ## What changes were proposed in this pull request? Update the next version of Spark from 2.5 to 3.0 ## How was this patch

spark git commit: [SPARK-25629][TEST] Reduce ParquetFilterSuite: filter pushdown test time costs in Jenkins

2018-10-15 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master fdaa99897 -> 5c7f6b663 [SPARK-25629][TEST] Reduce ParquetFilterSuite: filter pushdown test time costs in Jenkins ## What changes were proposed in this pull request? Testing only these 4 cases is enough:

spark git commit: [SPARK-25736][SQL][TEST] add tests to verify the behavior of multi-column count

2018-10-16 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 5c7f6b663 -> e028fd3ae [SPARK-25736][SQL][TEST] add tests to verify the behavior of multi-column count ## What changes were proposed in this pull request? AFAIK multi-column count is not widely supported by the mainstream

spark git commit: [SPARK-25736][SQL][TEST] add tests to verify the behavior of multi-column count

2018-10-16 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.4 8bc7ab03d -> 77156f8c8 [SPARK-25736][SQL][TEST] add tests to verify the behavior of multi-column count ## What changes were proposed in this pull request? AFAIK multi-column count is not widely supported by the mainstream

spark git commit: [SQL][CATALYST][MINOR] update some error comments

2018-10-16 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master a9f685bb7 -> e9332f600 [SQL][CATALYST][MINOR] update some error comments ## What changes were proposed in this pull request? this PR corrects some comment errors: 1. change from "as low a possible" to "as low as possible" in

[2/2] spark git commit: [SPARK-25393][SQL] Adding new function from_csv()

2018-10-16 Thread gurwls223
[SPARK-25393][SQL] Adding new function from_csv() ## What changes were proposed in this pull request? The PR adds new function `from_csv()` similar to `from_json()` to parse columns with CSV strings. I added the following methods: ```Scala def from_csv(e: Column, schema: StructType, options:

[1/2] spark git commit: [SPARK-25393][SQL] Adding new function from_csv()

2018-10-16 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 9d4dd7992 -> e9af9460b http://git-wip-us.apache.org/repos/asf/spark/blob/e9af9460/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala

spark git commit: [SQL][CATALYST][MINOR] update some error comments

2018-10-16 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.4 144cb949d -> 3591bd229 [SQL][CATALYST][MINOR] update some error comments ## What changes were proposed in this pull request? this PR corrects some comment errors: 1. change from "as low a possible" to "as low as possible" in

spark git commit: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV datasource multiline mode

2018-10-18 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master d0ecff285 -> 1e6c1d8bf [SPARK-25493][SQL] Use auto-detection for CRLF in CSV datasource multiline mode ## What changes were proposed in this pull request? CSVs with Windows-style CRLF ('\r\n') line endings don't work in multiline mode. They work fine

spark git commit: [MINOR][DOC] Spacing items in migration guide for readability and consistency

2018-10-18 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.4 36307b1e4 -> 9ed2e4204 [MINOR][DOC] Spacing items in migration guide for readability and consistency ## What changes were proposed in this pull request? Currently, the migration guide has no space between items, which looks too

spark git commit: [MINOR][DOC] Spacing items in migration guide for readability and consistency

2018-10-18 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 1e6c1d8bf -> c8f7691c6 [MINOR][DOC] Spacing items in migration guide for readability and consistency ## What changes were proposed in this pull request? Currently, the migration guide has no space between items, which looks too compact

spark git commit: [SPARK-25040][SQL] Empty string for non string types should be disallowed

2018-10-22 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master c391dc65e -> 03e82e368 [SPARK-25040][SQL] Empty string for non string types should be disallowed ## What changes were proposed in this pull request? This takes over original PR at #22019. The original proposal is to have null for float

spark git commit: [SPARK-25785][SQL] Add prettyNames for from_json, to_json, from_csv, and schema_of_json

2018-10-19 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 4acbda4a9 -> 3370865b0 [SPARK-25785][SQL] Add prettyNames for from_json, to_json, from_csv, and schema_of_json ## What changes were proposed in this pull request? This PR adds `prettyNames` for `from_json`, `to_json`, `from_csv`, and

spark git commit: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

2018-10-17 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 7d425b190 -> c3eaee776 [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark Master ## What changes were proposed in this pull request? Previously Pyspark used the private constructor for SparkSession when building that object. This

spark git commit: [SPARK-25579][SQL] Use quoted attribute names if needed in pushed ORC predicates

2018-10-16 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master e028fd3ae -> 2c664edc0 [SPARK-25579][SQL] Use quoted attribute names if needed in pushed ORC predicates ## What changes were proposed in this pull request? This PR aims to fix an ORC performance regression at Spark 2.4.0 RCs from Spark

spark git commit: [SPARK-25579][SQL] Use quoted attribute names if needed in pushed ORC predicates

2018-10-16 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.4 77156f8c8 -> 144cb949d [SPARK-25579][SQL] Use quoted attribute names if needed in pushed ORC predicates ## What changes were proposed in this pull request? This PR aims to fix an ORC performance regression at Spark 2.4.0 RCs from

spark git commit: [MINOR][SQL] Avoid hardcoded configuration keys in SQLConf's `doc`

2018-10-29 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 5e5d886a2 -> 5bd5e1b9c [MINOR][SQL] Avoid hardcoded configuration keys in SQLConf's `doc` ## What changes were proposed in this pull request? This PR proposes to avoid hardcoded configuration keys in SQLConf's `doc`. ## How was this

spark git commit: [SPARK-25672][SQL] schema_of_csv() - schema inference from an example

2018-10-31 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master c5ef477d2 -> c9667aff4 [SPARK-25672][SQL] schema_of_csv() - schema inference from an example ## What changes were proposed in this pull request? In the PR, I propose to add a new function - *schema_of_csv()* - which infers the schema of CSV

spark git commit: [SPARK-25886][SQL][MINOR] Improve error message of `FailureSafeParser` and `from_avro` in FAILFAST mode

2018-10-31 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 3c0e9ce94 -> 57eddc718 [SPARK-25886][SQL][MINOR] Improve error message of `FailureSafeParser` and `from_avro` in FAILFAST mode ## What changes were proposed in this pull request? Currently in `FailureSafeParser` and `from_avro`, the

spark git commit: [SPARKR] found some extra whitespace in the R tests

2018-10-30 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master f6ff6329e -> 243ce319a [SPARKR] found some extra whitespace in the R tests ## What changes were proposed in this pull request? During my ubuntu-port testing, I found some extra whitespace that for some reason wasn't getting caught on the

spark git commit: [SPARK-25847][SQL][TEST] Refactor JSONBenchmarks to use main method

2018-10-30 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 891032da6 -> f6ff6329e [SPARK-25847][SQL][TEST] Refactor JSONBenchmarks to use main method ## What changes were proposed in this pull request? Refactor JSONBenchmark to use the main method; run with spark-submit: `bin/spark-submit --class

spark git commit: [SPARK-24709][SQL][2.4] use str instead of basestring in isinstance

2018-10-27 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.4 f575616db -> 0f74bac64 [SPARK-24709][SQL][2.4] use str instead of basestring in isinstance ## What changes were proposed in this pull request? after backport https://github.com/apache/spark/pull/22775 to 2.4, the 2.4 sbt Jenkins QA

spark git commit: [SPARK-25638][SQL] Adding new function - to_csv()

2018-11-04 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 1a7abf3f4 -> 39399f40b [SPARK-25638][SQL] Adding new function - to_csv() ## What changes were proposed in this pull request? The new function takes a struct and converts it to a CSV string using the passed CSV options. It accepts the same CSV

spark git commit: [INFRA] Close stale PRs

2018-11-04 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 39399f40b -> 463a67668 [INFRA] Close stale PRs Closes https://github.com/apache/spark/pull/22859 Closes https://github.com/apache/spark/pull/22849 Closes https://github.com/apache/spark/pull/22591 Closes

spark git commit: [SPARK-25819][SQL] Support parse mode option for the function `from_avro`

2018-10-25 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 79f3babcc -> 24e8c27df [SPARK-25819][SQL] Support parse mode option for the function `from_avro` ## What changes were proposed in this pull request? Currently the function `from_avro` throws an exception on reading corrupt records. In

spark git commit: [SPARK-25763][SQL][PYSPARK][TEST] Use more `@contextmanager` to ensure clean-up each test.

2018-10-18 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 1117fc35f -> e80f18dbd [SPARK-25763][SQL][PYSPARK][TEST] Use more `@contextmanager` to ensure clean-up each test. ## What changes were proposed in this pull request? Currently each test in `SQLTest` in PySpark is not cleaned properly. We

spark git commit: [HOTFIX] Fix PySpark pip packaging tests by non-ascii compatible character

2018-10-20 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 3b4f35f56 -> 5330c192b [HOTFIX] Fix PySpark pip packaging tests by non-ascii compatible character ## What changes were proposed in this pull request? PIP installation requires packaging bin scripts together.

spark git commit: [SPARK-25950][SQL] from_csv should respect to spark.sql.columnNameOfCorruptRecord

2018-11-06 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 63ca4bbe7 -> 76813cfa1 [SPARK-25950][SQL] from_csv should respect to spark.sql.columnNameOfCorruptRecord ## What changes were proposed in this pull request? Fix for `CsvToStructs` to take into account SQL config

spark git commit: [SPARK-25962][BUILD][PYTHON] Specify minimum versions for both pydocstyle and flake8 in 'lint-python' script

2018-11-07 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master e4561e1c5 -> a8e1c9815 [SPARK-25962][BUILD][PYTHON] Specify minimum versions for both pydocstyle and flake8 in 'lint-python' script ## What changes were proposed in this pull request? This PR explicitly specifies `flake8` and

spark git commit: [SPARK-25955][TEST] Porting JSON tests for CSV functions

2018-11-07 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 17449a2e6 -> ee03f760b [SPARK-25955][TEST] Porting JSON tests for CSV functions ## What changes were proposed in this pull request? In the PR, I propose to port existing JSON tests from `JsonFunctionsSuite` that are applicable for CSV,

spark git commit: [SPARK-25952][SQL] Passing actual schema to JacksonParser

2018-11-07 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master d68f3a726 -> 17449a2e6 [SPARK-25952][SQL] Passing actual schema to JacksonParser ## What changes were proposed in this pull request? The PR fixes an issue when the corrupt record column specified via `spark.sql.columnNameOfCorruptRecord`

spark git commit: Revert "[SPARK-23831][SQL] Add org.apache.derby to IsolatedClientLoader"

2018-11-08 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master ee03f760b -> 0a2e45fdb Revert "[SPARK-23831][SQL] Add org.apache.derby to IsolatedClientLoader" This reverts commit a75571b46f813005a6d4b076ec39081ffab11844. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: Revert "[SPARK-23831][SQL] Add org.apache.derby to IsolatedClientLoader"

2018-11-08 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.4 4c91b224a -> 947462f5a Revert "[SPARK-23831][SQL] Add org.apache.derby to IsolatedClientLoader" This reverts commit a75571b46f813005a6d4b076ec39081ffab11844. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: [SPARK-25510][SQL][TEST][FOLLOW-UP] Remove BenchmarkWithCodegen

2018-11-08 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 79551f558 -> 0558d021c [SPARK-25510][SQL][TEST][FOLLOW-UP] Remove BenchmarkWithCodegen ## What changes were proposed in this pull request? Remove `BenchmarkWithCodegen` as we don't use it anymore. More details:

spark git commit: [SPARK-25945][SQL] Support locale while parsing date/timestamp from CSV/JSON

2018-11-08 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 973f7c01d -> 79551f558 [SPARK-25945][SQL] Support locale while parsing date/timestamp from CSV/JSON ## What changes were proposed in this pull request? In the PR, I propose to add a new option `locale` to CSVOptions/JSONOptions to make

spark git commit: [INFRA] Close stale PRs

2018-11-10 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 6cd23482d -> a3ba3a899 [INFRA] Close stale PRs Closes https://github.com/apache/spark/pull/21766 Closes https://github.com/apache/spark/pull/21679 Closes https://github.com/apache/spark/pull/21161 Closes

spark git commit: [SPARK-25972][PYTHON] Missed JSON options in streaming.py

2018-11-11 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master a3ba3a899 -> aec0af4a9 [SPARK-25972][PYTHON] Missed JSON options in streaming.py ## What changes were proposed in this pull request? Added JSON options for `json()` in streaming.py that are presented in the similar method in

spark git commit: [SPARK-26007][SQL] DataFrameReader.csv() respects to spark.sql.columnNameOfCorruptRecord

2018-11-12 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 88c826272 -> c49193437 [SPARK-26007][SQL] DataFrameReader.csv() respects to spark.sql.columnNameOfCorruptRecord ## What changes were proposed in this pull request? Passing current value of SQL config `spark.sql.columnNameOfCorruptRecord`

[2/2] spark git commit: [SPARK-26035][PYTHON] Break large streaming/tests.py files into smaller files

2018-11-15 Thread gurwls223
[SPARK-26035][PYTHON] Break large streaming/tests.py files into smaller files ## What changes were proposed in this pull request? This PR continues to break down a big large file into smaller files. See https://github.com/apache/spark/pull/23021. It targets to follow

[1/2] spark git commit: [SPARK-26035][PYTHON] Break large streaming/tests.py files into smaller files

2018-11-15 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 9a5fda60e -> 3649fe599 http://git-wip-us.apache.org/repos/asf/spark/blob/3649fe59/python/pyspark/streaming/tests/test_listener.py

spark git commit: [SPARK-25883][BACKPORT][SQL][MINOR] Override method `prettyName` in `from_avro`/`to_avro`

2018-11-15 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.4 96834fb77 -> 6148a77a5 [SPARK-25883][BACKPORT][SQL][MINOR] Override method `prettyName` in `from_avro`/`to_avro` Back port https://github.com/apache/spark/pull/22890 to branch-2.4. It is a bug fix for this issue:

spark git commit: [SPARK-25906][SHELL] Documents '-I' option (from Scala REPL) in spark-shell

2018-11-05 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 78fa1be29 -> cc38abc27 [SPARK-25906][SHELL] Documents '-I' option (from Scala REPL) in spark-shell ## What changes were proposed in this pull request? This PR aims to document the `-I` option from Spark 2.4.x (previously the `-i` option until

spark git commit: [SPARK-25906][SHELL] Documents '-I' option (from Scala REPL) in spark-shell

2018-11-05 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.4 8526f2ee5 -> f98c0ad02 [SPARK-25906][SHELL] Documents '-I' option (from Scala REPL) in spark-shell ## What changes were proposed in this pull request? This PR aims to document the `-I` option from Spark 2.4.x (previously the `-i` option

[5/7] spark git commit: [SPARK-26032][PYTHON] Break large sql/tests.py files into smaller files

2018-11-13 Thread gurwls223
http://git-wip-us.apache.org/repos/asf/spark/blob/a7a331df/python/pyspark/sql/tests/__init__.py

[1/7] spark git commit: [SPARK-26032][PYTHON] Break large sql/tests.py files into smaller files

2018-11-13 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master f26cd1881 -> a7a331df6 http://git-wip-us.apache.org/repos/asf/spark/blob/a7a331df/python/pyspark/sql/tests/test_udf.py

[6/7] spark git commit: [SPARK-26032][PYTHON] Break large sql/tests.py files into smaller files

2018-11-13 Thread gurwls223
http://git-wip-us.apache.org/repos/asf/spark/blob/a7a331df/python/pyspark/sql/tests.py

[3/7] spark git commit: [SPARK-26032][PYTHON] Break large sql/tests.py files into smaller files

2018-11-13 Thread gurwls223
http://git-wip-us.apache.org/repos/asf/spark/blob/a7a331df/python/pyspark/sql/tests/test_pandas_udf_grouped_map.py

[7/7] spark git commit: [SPARK-26032][PYTHON] Break large sql/tests.py files into smaller files

2018-11-13 Thread gurwls223
[SPARK-26032][PYTHON] Break large sql/tests.py files into smaller files ## What changes were proposed in this pull request? This is the first official attempt to break the huge single `tests.py` file - I did it locally a few times before and gave up for some reasons. Now it really makes

[4/7] spark git commit: [SPARK-26032][PYTHON] Break large sql/tests.py files into smaller files

2018-11-13 Thread gurwls223
http://git-wip-us.apache.org/repos/asf/spark/blob/a7a331df/python/pyspark/sql/tests/test_dataframe.py

[2/7] spark git commit: [SPARK-26032][PYTHON] Break large sql/tests.py files into smaller files

2018-11-13 Thread gurwls223
http://git-wip-us.apache.org/repos/asf/spark/blob/a7a331df/python/pyspark/sql/tests/test_session.py

spark git commit: [MINOR][SQL] Add disable bucketedRead workaround when throw RuntimeException

2018-11-14 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master ad853c567 -> f6255d7b7 [MINOR][SQL] Add disable bucketedRead workaround when throw RuntimeException ## What changes were proposed in this pull request? It will throw a `RuntimeException` when reading from a bucketed table (about 1.7G per bucket

[2/4] spark git commit: [SPARK-26036][PYTHON] Break large tests.py files into smaller files

2018-11-14 Thread gurwls223
http://git-wip-us.apache.org/repos/asf/spark/blob/03306a6d/python/pyspark/tests/__init__.py

[3/4] spark git commit: [SPARK-26036][PYTHON] Break large tests.py files into smaller files

2018-11-14 Thread gurwls223
http://git-wip-us.apache.org/repos/asf/spark/blob/03306a6d/python/pyspark/tests.py

[1/4] spark git commit: [SPARK-26036][PYTHON] Break large tests.py files into smaller files

2018-11-14 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master f6255d7b7 -> 03306a6df http://git-wip-us.apache.org/repos/asf/spark/blob/03306a6d/python/pyspark/tests/test_readwrite.py

spark git commit: [SPARK-26014][R] Deprecate R prior to version 3.4 in SparkR

2018-11-15 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 03306a6df -> d4130ec1f [SPARK-26014][R] Deprecate R prior to version 3.4 in SparkR ## What changes were proposed in this pull request? This PR proposes to bump up the minimum version of R from 3.1 to 3.4. R version 3.1.x is too old.

spark git commit: [SPARK-26013][R][BUILD] Upgrade R tools version from 3.4.0 to 3.5.1 in AppVeyor build

2018-11-12 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 0ba9715c7 -> f9ff75653 [SPARK-26013][R][BUILD] Upgrade R tools version from 3.4.0 to 3.5.1 in AppVeyor build ## What changes were proposed in this pull request? R tools 3.5.1 was released a few months ago. Spark currently uses 3.4.0. We

spark git commit: [SPARK-24601] Update Jackson to 2.9.6

2018-10-05 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 459700727 -> ab1650d29 [SPARK-24601] Update Jackson to 2.9.6 Hi all, Jackson is incompatible with upstream versions, therefore bump the Jackson version to a more recent one. I bumped into some issues with Azure CosmosDB that is using a

spark git commit: [SPARK-25659][PYTHON][TEST] Test type inference specification for createDataFrame in PySpark

2018-10-08 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master f9935a3f8 -> f3fed2823 [SPARK-25659][PYTHON][TEST] Test type inference specification for createDataFrame in PySpark ## What changes were proposed in this pull request? This PR proposes to specify type inference and simple e2e tests.

spark git commit: [SPARK-25669][SQL] Check CSV header only when it exists

2018-10-09 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.4 4baa4d42a -> 404c84039 [SPARK-25669][SQL] Check CSV header only when it exists ## What changes were proposed in this pull request? Currently the first row of a dataset of CSV strings is compared to the field names of the user-specified or

spark git commit: [SPARK-25669][SQL] Check CSV header only when it exists

2018-10-09 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master a4b14a9cf -> 46fe40838 [SPARK-25669][SQL] Check CSV header only when it exists ## What changes were proposed in this pull request? Currently the first row of a dataset of CSV strings is compared to the field names of the user-specified or inferred

spark git commit: [SPARK-23401][PYTHON][TESTS] Add more data types for PandasUDFTests

2018-10-01 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.4 82990e5ef -> 426c2bd35 [SPARK-23401][PYTHON][TESTS] Add more data types for PandasUDFTests ## What changes were proposed in this pull request? Add more data types for Pandas UDF Tests for PySpark SQL ## How was this patch tested?

spark git commit: [SPARK-23401][PYTHON][TESTS] Add more data types for PandasUDFTests

2018-10-01 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 21f0b73db -> 30f5d0f2d [SPARK-23401][PYTHON][TESTS] Add more data types for PandasUDFTests ## What changes were proposed in this pull request? Add more data types for Pandas UDF Tests for PySpark SQL ## How was this patch tested? manual

spark git commit: [SPARK-25048][SQL] Pivoting by multiple columns in Scala/Java

2018-09-29 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master dcb9a97f3 -> 623c2ec4e [SPARK-25048][SQL] Pivoting by multiple columns in Scala/Java ## What changes were proposed in this pull request? In the PR, I propose to extend the implementation of the existing method: ``` def pivot(pivotColumn: Column,

spark git commit: [SPARK-25262][DOC][FOLLOWUP] Fix link tags in html table

2018-09-29 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.4 ec2c17abf -> a14306b1d [SPARK-25262][DOC][FOLLOWUP] Fix link tags in html table ## What changes were proposed in this pull request? Markdown links are not working inside html table. We should use html link tag. ## How was this patch

spark git commit: [SPARK-25447][SQL] Support JSON options by schema_of_json()

2018-09-29 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 1e437835e -> 1007cae20 [SPARK-25447][SQL] Support JSON options by schema_of_json() ## What changes were proposed in this pull request? In the PR, I propose to extend the `schema_of_json()` function to accept JSON options, since they

spark git commit: [SPARK-25262][DOC][FOLLOWUP] Fix link tags in html table

2018-09-29 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 1007cae20 -> dcb9a97f3 [SPARK-25262][DOC][FOLLOWUP] Fix link tags in html table ## What changes were proposed in this pull request? Markdown links are not working inside html table. We should use html link tag. ## How was this patch

spark git commit: [SPARK-25565][BUILD] Add scalastyle rule to check add Locale.ROOT to .toLowerCase and .toUpperCase for internal calls

2018-09-30 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master b6b8a6632 -> a2f502cf5 [SPARK-25565][BUILD] Add scalastyle rule to check add Locale.ROOT to .toLowerCase and .toUpperCase for internal calls ## What changes were proposed in this pull request? This PR adds a rule to force

spark git commit: [SPARK-25601][PYTHON] Register Grouped aggregate UDF Vectorized UDFs for SQL Statement

2018-10-03 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 79dd4c964 -> 927e52793 [SPARK-25601][PYTHON] Register Grouped aggregate UDF Vectorized UDFs for SQL Statement ## What changes were proposed in this pull request? This PR proposes to register Grouped aggregate UDF Vectorized UDFs for SQL

spark git commit: [SPARK-25601][PYTHON] Register Grouped aggregate UDF Vectorized UDFs for SQL Statement

2018-10-03 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 075dd620e -> 79dd4c964 [SPARK-25601][PYTHON] Register Grouped aggregate UDF Vectorized UDFs for SQL Statement ## What changes were proposed in this pull request? This PR proposes to register Grouped aggregate UDF Vectorized UDFs for SQL

spark git commit: [SPARK-25601][PYTHON] Register Grouped aggregate UDF Vectorized UDFs for SQL Statement

2018-10-03 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.4 443d12dbb -> 0763b758d [SPARK-25601][PYTHON] Register Grouped aggregate UDF Vectorized UDFs for SQL Statement ## What changes were proposed in this pull request? This PR proposes to register Grouped aggregate UDF Vectorized UDFs for

spark git commit: [SPARK-25595] Ignore corrupt Avro files if flag IGNORE_CORRUPT_FILES enabled

2018-10-03 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master d6be46eb9 -> 928d0739c [SPARK-25595] Ignore corrupt Avro files if flag IGNORE_CORRUPT_FILES enabled ## What changes were proposed in this pull request? With flag `IGNORE_CORRUPT_FILES` enabled, schema inference should ignore corrupt Avro

spark git commit: [SPARK-25655][BUILD] Add -Pspark-ganglia-lgpl to the scala style check.

2018-10-06 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 58287a398 -> 44cf800c8 [SPARK-25655][BUILD] Add -Pspark-ganglia-lgpl to the scala style check. ## What changes were proposed in this pull request? Our lint failed due to the following errors: ``` [INFO] ---

spark git commit: [SPARK-25621][SPARK-25622][TEST] Reduce test time of BucketedReadWithHiveSupportSuite

2018-10-06 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master f2f4e7afe -> 1ee472eec [SPARK-25621][SPARK-25622][TEST] Reduce test time of BucketedReadWithHiveSupportSuite ## What changes were proposed in this pull request? By replacing loops with random possible value. - `read partitioning bucketed

spark git commit: [SPARK-25202][SQL] Implements split with limit sql function

2018-10-06 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 44cf800c8 -> 17781d753 [SPARK-25202][SQL] Implements split with limit sql function ## What changes were proposed in this pull request? Adds support for setting the limit in the SQL `split` function ## How was this patch tested? 1. Updated
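A hedged sketch of the three-argument form this change adds; the literal below is illustrative, and the semantics stated in the comments follow Java's `String.split`, which the PR models:

```sql
-- With a positive limit, the result array has at most `limit` elements and
-- the last element keeps the remainder of the string; limit <= 0 means no
-- limit, matching the existing two-argument behavior.
SELECT split('one,two,three', ',', 2);
-- per Java split semantics: ["one", "two,three"]
```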

spark git commit: [SPARK-25600][SQL][MINOR] Make use of TypeCoercion.findTightestCommonType while inferring CSV schema.

2018-10-06 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 17781d753 -> f2f4e7afe [SPARK-25600][SQL][MINOR] Make use of TypeCoercion.findTightestCommonType while inferring CSV schema. ## What changes were proposed in this pull request? Currently, the CSV schema inference code inlines

spark git commit: [SPARK-25461][PYSPARK][SQL] Add document for mismatch between return type of Pandas.Series and return type of pandas udf

2018-10-07 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master fba722e31 -> 3eb842969 [SPARK-25461][PYSPARK][SQL] Add document for mismatch between return type of Pandas.Series and return type of pandas udf ## What changes were proposed in this pull request? For Pandas UDFs, we get arrow type from

spark git commit: [SPARK-25262][DOC][FOLLOWUP] Fix missing markup tag

2018-09-28 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 5d726b865 -> e99ba8d7c [SPARK-25262][DOC][FOLLOWUP] Fix missing markup tag ## What changes were proposed in this pull request? This adds a missing end markup tag. This should go `master` branch only. ## How was this patch tested? This

spark git commit: [SPARK-25570][SQL][TEST] Replace 2.3.1 with 2.3.2 in HiveExternalCatalogVersionsSuite

2018-09-28 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master e99ba8d7c -> 1e437835e [SPARK-25570][SQL][TEST] Replace 2.3.1 with 2.3.2 in HiveExternalCatalogVersionsSuite ## What changes were proposed in this pull request? This PR aims to prevent test slowdowns at `HiveExternalCatalogVersionsSuite`

spark git commit: [SPARK-25570][SQL][TEST] Replace 2.3.1 with 2.3.2 in HiveExternalCatalogVersionsSuite

2018-09-28 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.4 7614313c9 -> ec2c17abf [SPARK-25570][SQL][TEST] Replace 2.3.1 with 2.3.2 in HiveExternalCatalogVersionsSuite ## What changes were proposed in this pull request? This PR aims to prevent test slowdowns at

spark git commit: [SPARK-25570][SQL][TEST] Replace 2.3.1 with 2.3.2 in HiveExternalCatalogVersionsSuite

2018-09-28 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.3 f13565b6e -> eb78380c0 [SPARK-25570][SQL][TEST] Replace 2.3.1 with 2.3.2 in HiveExternalCatalogVersionsSuite ## What changes were proposed in this pull request? This PR aims to prevent test slowdowns at

spark git commit: [SPARK-25273][DOC] How to install testthat 1.0.2

2018-08-30 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master e9fce2a4c -> 3c67cb0b5 [SPARK-25273][DOC] How to install testthat 1.0.2 ## What changes were proposed in this pull request? R tests require `testthat` v1.0.2. In the PR, I described how to install the version in the section

spark git commit: [SPARK-25273][DOC] How to install testthat 1.0.2

2018-08-30 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.3 306e881b6 -> b072717b3 [SPARK-25273][DOC] How to install testthat 1.0.2 ## What changes were proposed in this pull request? R tests require `testthat` v1.0.2. In the PR, I described how to install the version in the section

spark git commit: [SPARK-25471][PYTHON][TEST] Fix pyspark-sql test error when using Python 3.6 and Pandas 0.23

2018-09-19 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.4 a9a8d3a4b -> 99ae693b3 [SPARK-25471][PYTHON][TEST] Fix pyspark-sql test error when using Python 3.6 and Pandas 0.23 ## What changes were proposed in this pull request? Fix test that constructs a Pandas DataFrame by specifying the

spark git commit: [SPARK-25471][PYTHON][TEST] Fix pyspark-sql test error when using Python 3.6 and Pandas 0.23

2018-09-19 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 6f681d429 -> 90e3955f3 [SPARK-25471][PYTHON][TEST] Fix pyspark-sql test error when using Python 3.6 and Pandas 0.23 ## What changes were proposed in this pull request? Fix test that constructs a Pandas DataFrame by specifying the column

spark git commit: [SPARK-25471][PYTHON][TEST] Fix pyspark-sql test error when using Python 3.6 and Pandas 0.23

2018-09-19 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.3 7b5da37c0 -> e319a624e [SPARK-25471][PYTHON][TEST] Fix pyspark-sql test error when using Python 3.6 and Pandas 0.23 ## What changes were proposed in this pull request? Fix test that constructs a Pandas DataFrame by specifying the

spark git commit: [MINOR][PYTHON][TEST] Use collect() instead of show() to make the output silent

2018-09-20 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 0e31a6f25 -> 7ff5386ed [MINOR][PYTHON][TEST] Use collect() instead of show() to make the output silent ## What changes were proposed in this pull request? This PR replaces an effective `show()` with `collect()` to make the output silent.

spark git commit: [MINOR][PYTHON][TEST] Use collect() instead of show() to make the output silent

2018-09-20 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.4 dfcff3839 -> e07042a35 [MINOR][PYTHON][TEST] Use collect() instead of show() to make the output silent ## What changes were proposed in this pull request? This PR replaces an effective `show()` with `collect()` to make the output silent.

spark git commit: [SPARKR] Match pyspark features in SparkR communication protocol

2018-09-24 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.4 c64e7506d -> 36e7c8fcc [SPARKR] Match pyspark features in SparkR communication protocol Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/36e7c8fc Tree:

spark git commit: [SPARKR] Match pyspark features in SparkR communication protocol

2018-09-24 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master c79072aaf -> c3b4a94a9 [SPARKR] Match pyspark features in SparkR communication protocol Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c3b4a94a Tree:

spark git commit: [MINOR][PYSPARK] Always Close the tempFile in _serialize_to_jvm

2018-09-22 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.4 1303eb5c8 -> c64e7506d [MINOR][PYSPARK] Always Close the tempFile in _serialize_to_jvm ## What changes were proposed in this pull request? Always close the tempFile after `serializer.dump_stream(data, tempFile)` in _serialize_to_jvm
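The fix pattern described here, expressed as a standalone sketch. The helper name and the serializer signature are illustrative, not PySpark's actual internals; `dump_stream` stands in for `serializer.dump_stream`.

```python
# Sketch of the try/finally pattern the fix applies in _serialize_to_jvm:
# the temp file is closed even if serialization raises midway, so the file
# descriptor never leaks. Names here are illustrative.
import os
import tempfile

def serialize_to_file(dump_stream, data):
    fd, path = tempfile.mkstemp()
    temp_file = os.fdopen(fd, "wb")
    try:
        dump_stream(data, temp_file)
    finally:
        # Runs on success and on error alike.
        temp_file.close()
    return path
```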

spark git commit: [SPARK-25473][PYTHON][SS][TEST] ForeachWriter tests failed on Python 3.6 and macOS High Sierra

2018-09-22 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 0fbba76fa -> a72d118cd [SPARK-25473][PYTHON][SS][TEST] ForeachWriter tests failed on Python 3.6 and macOS High Sierra ## What changes were proposed in this pull request? This PR does not fix the problem itself but just target to add few

spark git commit: [MINOR][PYSPARK] Always Close the tempFile in _serialize_to_jvm

2018-09-22 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 6ca87eb2e -> 0fbba76fa [MINOR][PYSPARK] Always Close the tempFile in _serialize_to_jvm ## What changes were proposed in this pull request? Always close the tempFile after `serializer.dump_stream(data, tempFile)` in _serialize_to_jvm ##
