[GitHub] spark issue #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json's inpu...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22775 retest this please
[GitHub] spark issue #22773: [SPARK-25785][SQL] Add prettyNames for from_json, to_jso...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22773 My impression so far was that we note things in the migration notes when they are improvements (not bugs), non-trivial, and related to backward compatibility. Shall we clarify what to document in the migration guide? Otherwise we would have to document everything related to, let's say, external changes, deprecation removals, and many other changes related to column names.
[GitHub] spark issue #22773: [SPARK-25785][SQL] Add prettyNames for from_json, to_jso...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22773 BTW, it's closer to a bug rather than an improvement, though. `from_json` should have the default name `from_json` rather than `jsontostructs` - end users would have no idea why it's called `jsontostructs`.
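A minimal PySpark sketch of the naming issue above, assuming an active Spark session; before a `prettyName` override, the auto-generated column name leaks the internal expression name:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([('{"a": 1}',)], ["value"])

# Without prettyName the default column name is "jsontostructs(value)";
# with the override it becomes "from_json(value)".
print(df.select(from_json("value", "a INT")).columns)
```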
[GitHub] spark issue #22773: [SPARK-25785][SQL] Add prettyNames for from_json, to_jso...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22773 That's the exact issue I raised before, and we ended up not keeping the compatibility in column names. @cloud-fan and @hvanhovell.
[GitHub] spark issue #22795: [SPARK-25798][PYTHON] Internally document type conversio...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22795 Thanks @viirya!!
[GitHub] spark pull request #22816: [SPARK-25822][PySpark]Fix a race condition when r...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22816#discussion_r228012237

--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -114,7 +114,7 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
     context.addTaskCompletionListener[Unit] { _ =>
       writerThread.shutdownOnTaskCompletion()
-      if (!reuseWorker || !released.get) {
+      if (!reuseWorker || released.compareAndSet(false, true)) {
--- End diff --

I think the `released` variable here indicates whether `releasePythonWorker` has actually been called before. Shall we rename it and update the comments at `val released`?
[GitHub] spark pull request #22816: [SPARK-25822][PySpark]Fix a race condition when r...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22816#discussion_r228012035

--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -114,7 +114,7 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
     context.addTaskCompletionListener[Unit] { _ =>
       writerThread.shutdownOnTaskCompletion()
-      if (!reuseWorker || !released.get) {
+      if (!reuseWorker || released.compareAndSet(false, true)) {
--- End diff --

I think the `released` variable here indicates whether `releasePythonWorker` has actually been called. Do you mind if I ask you to elaborate why this `released` is set to `true` here as well?
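A minimal sketch, in Python for illustration, of the race the patch closes; `Once` and `release_worker` are hypothetical stand-ins for `AtomicBoolean` and `releasePythonWorker`. With a plain read-then-act, two threads can both observe `released == False` and release the same worker twice; an atomic compare-and-set lets exactly one caller win:

```python
import threading

class Once:
    """Emulates AtomicBoolean.compareAndSet(expect, update)."""
    def __init__(self):
        self._flag = False
        self._lock = threading.Lock()

    def compare_and_set(self, expect, update):
        with self._lock:
            if self._flag == expect:
                self._flag = update
                return True
            return False

released = Once()

def release_worker(who):
    # Only the first caller wins the CAS; the double-release race is gone.
    if released.compare_and_set(False, True):
        print("%s released the worker" % who)

t1 = threading.Thread(target=release_worker, args=("listener",))
t2 = threading.Thread(target=release_worker, args=("monitor",))
t1.start(); t2.start(); t1.join(); t2.join()
```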
[GitHub] spark issue #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_json
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22237 Thanks all!!
[GitHub] spark issue #22795: [SPARK-25798][PYTHON] Internally document type conversio...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22795 Thanks, @BryanCutler.
[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22144 Adding it as a known issue sounds reasonable to me as well.
[GitHub] spark pull request #22807: [WIP][SPARK-25811][PySpark] Raise a proper error ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22807#discussion_r227784077

--- Diff: python/pyspark/sql/tests.py ---
@@ -4961,6 +4961,31 @@ def foofoo(x, y):
             ).collect
         )

+    def test_pandas_udf_detect_unsafe_type_conversion(self):
+        from distutils.version import LooseVersion
+        from pyspark.sql.functions import pandas_udf
+        import pandas as pd
+        import numpy as np
+        import pyarrow as pa
+
+        values = [1.0] * 3
+        pdf = pd.DataFrame({'A': values})
+        df = self.spark.createDataFrame(pdf).repartition(1)
+
+        @pandas_udf(returnType="int")
+        def udf(column):
+            return pd.Series(np.linspace(0, 1, 3))
+
+        udf_boolean = df.select(['A']).withColumn('udf', udf('A'))
+
+        # Since 0.11.0, PyArrow supports the feature to raise an error for unsafe cast.
+        if LooseVersion(pa.__version__) >= LooseVersion("0.11.0"):
--- End diff --

BTW, let's bump up the minimal required PyArrow and Pandas versions if possible in 3.0 :-)
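A minimal sketch of the version-gating pattern above, assuming PyArrow is installed; only PyArrow >= 0.11.0 can raise on unsafe casts, so older versions take a fallback branch:

```python
from distutils.version import LooseVersion
import pyarrow as pa

if LooseVersion(pa.__version__) >= LooseVersion("0.11.0"):
    try:
        # safe=True makes Arrow raise instead of silently truncating 0.5 -> 0.
        pa.array([0.5, 1.5], type=pa.int32(), safe=True)
    except pa.ArrowInvalid as e:
        print("unsafe cast detected:", e)
else:
    print("PyArrow %s cannot detect unsafe casts" % pa.__version__)
```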
[GitHub] spark issue #22730: [SPARK-16775][CORE] Remove deprecated accumulator v1 API...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22730 adding @cloud-fan since accumulator version 2 was added by you.
[GitHub] spark pull request #22795: [SPARK-25798][PYTHON] Internally document type co...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22795#discussion_r227634476

--- Diff: python/pyspark/sql/functions.py ---
@@ -3023,6 +3023,42 @@ def pandas_udf(f=None, returnType=None, functionType=None):
     conversion on returned data. The conversion is not guaranteed to be correct and
     results should be checked for accuracy by users.
     """
+
+# The following table shows most of Pandas data and SQL type conversions in Pandas UDFs that
+# are not yet visible to the user. Some of behaviors are buggy and might be changed in the near
+# future. The table might have to be eventually documented externally.
+# Please see SPARK-25798's PR to see the codes in order to generate the table below.
+#
+# [conversion matrix truncated in this archive: rows are SQL types (boolean, tinyint, smallint,
+# int, bigint, float, ...), columns are Pandas values/dtypes (None, bool, int8-uint64,
+# float16-float64, datetime64[ns], tz-aware datetime64, string, Decimal, array, float128,
+# complex, category, timedelta64[ns])]
[GitHub] spark issue #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_json
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22237 https://github.com/apache/spark/pull/22237/files#r223707899 makes sense to me. Addressed. LGTM from my side as well
[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22144 > According to the policy, we don't have to block the current release because of i @cloud-fan, BTW, would you mind sharing what you read? I want to be aware of the policy as well. Maybe do you mean the correctness and data loss issues mentioned at https://spark.apache.org/contributing.html? They can be considered as blockers? Blocker means: pointless to release without this change as the release would be unusable to a large minority of users. Correctness and data loss issues should be considered Blockers. It's difficult to say, but it at least looks like it affects more than a few users.
[GitHub] spark issue #22514: [SPARK-25271][SQL] Hive ctas commands should use data so...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22514 @cloud-fan, is this a performance regression that affects users that use Hive serde tables as well?
[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22144 Wait wait .. do we only care about regressions as blockers for the last release (2.3)? I'm asking this because I really don't know. If so, I'm okay.
[GitHub] spark issue #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_json
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22237 Ah, yea, I have direct access to this branch. Let me just rebase/address the comment tomorrow.
[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22144 For instance, https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/sketches-user/GmH4-OlHP9g/MW-J7Hg4BwAJ this discussion thread was started almost one year ago.
[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22144 It does try to fix a backward compatibility issue. It was found late, but it is still true that we found a breaking issue to fix. Broken backward compatibility that potentially affects a set of users (it at least looks to be blocking an external Hive UDF library repo) sounds like a blocker to me. Some Hive compatibility issues are reported late because IMHO they are a bit complicated to follow (it requires looking at both Hive and Spark, at least).
[GitHub] spark issue #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_json
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22237 Oh wait, you left a sign-off. Let me rebase it by tomorrow - it wouldn't be a big job.
[GitHub] spark issue #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_json
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22237 If @gengliangwang finds some time to work on this, yea, please go ahead.
[GitHub] spark pull request #22047: [SPARK-19851] Add support for EVERY and ANY (SOME...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22047#discussion_r227367500

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/AnyAgg.scala ---
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.util.TypeUtils
+import org.apache.spark.sql.types._
+
+@ExpressionDescription(
+  usage = "_FUNC_(expr) - Returns true if at least one value of `expr` is true.")
--- End diff --

BTW, don't forget to add `since`.
[GitHub] spark issue #22803: [SPARK-25808][BUILD] Upgrade jsr305 version from 1.3.9 t...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22803 Yea, similar opinion. If it does not fix an actual problem, I wouldn't encourage fixing it either ..
[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22144 If we were going for 3.0, then I would definitely leave +1, and I agree that we should rather focus on Spark itself as a higher priority - we should do that when we go to 3.0 and rather drop such cases, IMHO. However, it looks like we are going for 2.4 for now. It looks like it was indeed a mistake, and this brings back a feature that we had in previous releases, and it sounds like the problem is not isolated (let's say, it's not specific to a few particular UDAFs only), which, IMHO, looks like a proper blocker. How about we bring this back if the changes proposed here are all we need? If the current change is not enough, we could at least mention in the migration guide that this feature was dropped intentionally. About testing, this PR described "How was this patch tested?" although it has no test. @pgandhi999, are you able to write a test?
[GitHub] spark pull request #22782: [HOTFIX] Fix PySpark pip packaging tests by non-a...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22782#discussion_r227336895

--- Diff: bin/docker-image-tool.sh ---
@@ -79,7 +79,7 @@ function build {
   fi

   # Verify that Spark has actually been built/is a runnable distribution
-  #Â i.e. the Spark JARs that the Docker files will place into the image are present
--- End diff --

Therefore, using non-breakable spaces in code is obviously not a good practice. Please avoid it next time.
[GitHub] spark pull request #22782: [HOTFIX] Fix PySpark pip packaging tests by non-a...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22782#discussion_r227336318

--- Diff: bin/docker-image-tool.sh ---
@@ -79,7 +79,7 @@ function build {
   fi

   # Verify that Spark has actually been built/is a runnable distribution
-  #Â i.e. the Spark JARs that the Docker files will place into the image are present
--- End diff --

For the issue itself, it's related to a historical reason in Python. Python 2 supported the `str` type as a bytes-like string. That looked like a mistake that confuses users about the distinction between bytes and strings, and then Python 3 introduced `str` as a unicode string concept like other programming languages. `open(...).read()` returns `str` (which is bytes) in Python 2, but it's read as a unicode string in Python 3 - where we need an implicit conversion between bytes and strings. It looks like this had to be done to minimise the breaking changes in user code. So, a bytes-to-string conversion happened here, and unfortunately our Jenkins's system default encoding is set to ascii (even though arguably UTF-8 is common). For non-ascii itself, please see the justification at http://www.scalastyle.org/rules-dev.html in ScalaStyle.
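A minimal sketch of the Python 2 vs 3 difference described above, writing a hypothetical file to /tmp; under Python 2 `read()` returns bytes-like `str`, while under Python 3 text must be decoded, which fails when the effective codec is ascii:

```python
with open("/tmp/nbsp_example.txt", "wb") as f:
    f.write(b"caf\xc3\xa9\n")  # valid UTF-8, not valid ASCII

data = open("/tmp/nbsp_example.txt", "rb").read()  # bytes on both 2 and 3
print(data.decode("utf-8"))  # explicit decoding always works
try:
    data.decode("ascii")     # mirrors Jenkins' ascii default encoding
except UnicodeDecodeError as e:
    print("ascii decode failed:", e)
```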
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22666 retest this please
[GitHub] spark issue #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json's inpu...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22775 retest this please
[GitHub] spark pull request #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22775#discussion_r227251515

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -770,8 +776,17 @@ case class SchemaOfJson(
     factory
   }

-  override def convert(v: UTF8String): UTF8String = {
-    val dt = Utils.tryWithResource(CreateJacksonParser.utf8String(jsonFactory, v)) { parser =>
+  @transient
+  private lazy val json = child.eval().asInstanceOf[UTF8String]
--- End diff --

Yea, that was my thought, and yea, I don't strongly feel we should enforce it in the future either, but I was thinking we'd better enforce it for 2.4.x (if the RC fails). I was thinking about other possibilities to use this expression with `from_json` (for instance, let's say .. as an aggregation expression or .. automatically somehow collect a few examples from a column and return a schema). Actually Max proposed an aggregation expression first and that got rejected by Reynold, and this way was suggested by him. > It's also weird that users are willing to write verbose json literal in from_json(..., schema = schema_of_json(...)) instead of DDL string. Yea, it is, but that's what the current code does (https://github.com/apache/spark/blob/dbf9cabfb80c157cf0e52014b07c4abeb1aa2ff6/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala#L803). Nothing serious, but I was just thinking about allowing only what we need for now, given there has been some discussion about it before.
[GitHub] spark pull request #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22775#discussion_r227239707

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -770,8 +776,17 @@ case class SchemaOfJson(
     factory
   }

-  override def convert(v: UTF8String): UTF8String = {
-    val dt = Utils.tryWithResource(CreateJacksonParser.utf8String(jsonFactory, v)) { parser =>
+  @transient
+  private lazy val json = child.eval().asInstanceOf[UTF8String]
--- End diff --

Also, basically we expect a structured data format in the SQL path, and different DDL type strings for each record wouldn't be quite useful anyway.
[GitHub] spark pull request #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22775#discussion_r227239024

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
@@ -770,8 +776,17 @@ case class SchemaOfJson(
     factory
   }

-  override def convert(v: UTF8String): UTF8String = {
-    val dt = Utils.tryWithResource(CreateJacksonParser.utf8String(jsonFactory, v)) { parser =>
+  @transient
+  private lazy val json = child.eval().asInstanceOf[UTF8String]
--- End diff --

It is possible, and that's one case I am trying to disallow here. `from_json` only accepts `schema_of_json` when the input is a literal. I can't imagine a proper use case for DDL-formatted strings derived from a JSON column. The main purpose of this expression was `from_json(..., schema = schema_of_json(...))`. I was trying to allow only what I need.
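A minimal sketch of the intended usage, assuming the Spark 2.4+ PySpark API: the JSON sample fed to `schema_of_json` is a literal, and the result feeds `from_json`'s schema argument:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, schema_of_json, lit

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([('{"a": 1, "b": [1, 2]}',)], ["json"])

schema = schema_of_json(lit('{"a": 1, "b": [1, 2]}'))  # literal sample
df.select(from_json("json", schema).alias("parsed")).show(truncate=False)

# schema_of_json(df["json"]) -- a non-literal column -- is what this
# follow-up disallows.
```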
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22666 retest this please
[GitHub] spark issue #22730: [SPARK-16775][CORE] Remove deprecated accumulator v1 API...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22730 retest this please
[GitHub] spark issue #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json's inpu...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22775 @cloud-fan, mind taking a look please?
[GitHub] spark issue #22787: [SPARK-25040][SQL] Empty string for non string types sho...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22787 Merged to master.
[GitHub] spark pull request #22795: [SPARK-25798][PYTHON] Internally document type co...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22795#discussion_r227199025

--- Diff: python/pyspark/sql/functions.py ---
@@ -3023,6 +3023,42 @@ def pandas_udf(f=None, returnType=None, functionType=None):
     conversion on returned data. The conversion is not guaranteed to be correct and
     results should be checked for accuracy by users.
     """
+
+# The following table shows most of Pandas data and SQL type conversions in Pandas UDFs that
+# are not yet visible to the user. Some of behaviors are buggy and might be changed in the near
+# future. The table might have to be eventually documented externally.
+# Please see SPARK-25798's PR to see the codes in order to generate the table below.
+#
+# [conversion matrix truncated in this archive: columns are Pandas values/dtypes (None, bool,
+# int8-uint64, float16-float64, datetime64[ns], tz-aware datetime64, string, Decimal, array,
+# float128, complex, category, timedelta64[ns])]
[GitHub] spark pull request #22795: [SPARK-25798][PYTHON] Internally document type co...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22795#discussion_r227198050

--- Diff: python/pyspark/sql/functions.py ---
@@ -3023,6 +3023,42 @@ def pandas_udf(f=None, returnType=None, functionType=None):
     conversion on returned data. The conversion is not guaranteed to be correct and
     results should be checked for accuracy by users.
     """
+
+# The following table shows most of Pandas data and SQL type conversions in Pandas UDFs that
+# are not yet visible to the user. Some of behaviors are buggy and might be changed in the near
+# future. The table might have to be eventually documented externally.
+# Please see SPARK-25798's PR to see the codes in order to generate the table below.
+#
+# [table row truncated in this archive: the quoted `boolean` row maps None -> None, bool and
+# integer values -> True, floats -> False, and unsupported types (string, Decimal, array,
+# category) -> X]
--- End diff --

I think it's fine .. many conversions here look buggy, for instance, `A(category)` with `tinyint` becomes `0`, or the string conversions .. Let's just fix them, upgrade Arrow, and then update this chart later..
[GitHub] spark pull request #22782: [HOTFIX] Fix PySpark pip packaging tests by non-a...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22782#discussion_r227173103

--- Diff: bin/docker-image-tool.sh ---
@@ -79,7 +79,7 @@ function build {
   fi

   # Verify that Spark has actually been built/is a runnable distribution
-  #Â i.e. the Spark JARs that the Docker files will place into the image are present
--- End diff --

It was more because a non-breakable space was used instead of a regular space. It was UTF-8, but not ASCII-compatible.
[GitHub] spark pull request #22795: [SPARK-25798][PYTHON] Internally document type co...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22795#discussion_r227026198

--- Diff: python/pyspark/sql/functions.py ---
@@ -3023,6 +3023,42 @@ def pandas_udf(f=None, returnType=None, functionType=None):
     conversion on returned data. The conversion is not guaranteed to be correct and
     results should be checked for accuracy by users.
     """
+
+# The following table shows most of Pandas data and SQL type conversions in Pandas UDFs that
+# are not yet visible to the user. Some of behaviors are buggy and might be changed in the near
+# future. The table might have to be eventually documented externally.
+# Please see SPARK-25798's PR to see the codes in order to generate the table below.
+#
--- End diff --

It's not supposed to be exposed and visible to users yet as I said in the comments. Let's keep it internal for now.
[GitHub] spark issue #22787: [SPARK-25040][SQL] Empty string for non string types sho...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22787 It's chaotic ...
[GitHub] spark issue #22655: [SPARK-25666][PYTHON] Internally document type conversio...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22655 Hey @viirya, I happened to find some time to work on it - I submitted a PR https://github.com/apache/spark/pull/22795.
[GitHub] spark pull request #22795: [SPARK-25798][PYTHON] Internally document type co...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/22795 [SPARK-25798][PYTHON] Internally document type conversion between Pandas data and SQL types in Pandas UDFs

## What changes were proposed in this pull request?

We are facing some problems about type conversions between Pandas data and SQL types in Pandas UDFs. It's even difficult to identify the problems (see #20163 and #22610). This PR targets to internally document the type conversion table. Some of them look buggy and we should fix them. The table can be generated via the code below:

```python
from pyspark.sql.types import *
from pyspark.sql.functions import pandas_udf

columns = [
    ('none', 'object(NoneType)'),
    ('bool', 'bool'),
    ('int8', 'int8'),
    ('int16', 'int16'),
    ('int32', 'int32'),
    ('int64', 'int64'),
    ('uint8', 'uint8'),
    ('uint16', 'uint16'),
    ('uint32', 'uint32'),
    ('uint64', 'uint64'),
    ('float64', 'float16'),
    ('float64', 'float32'),
    ('float64', 'float64'),
    ('date', 'datetime64[ns]'),
    ('tz_aware_dates', 'datetime64[ns, US/Eastern]'),
    ('string', 'object(string)'),
    ('decimal', 'object(Decimal)'),
    ('array', 'object(array[int32])'),
    ('float128', 'float128'),
    ('complex64', 'complex64'),
    ('complex128', 'complex128'),
    ('category', 'category'),
    ('tdeltas', 'timedelta64[ns]'),
]

def create_dataframe():
    import pandas as pd
    import numpy as np
    import decimal
    pdf = pd.DataFrame({
        'none': [None, None],
        'bool': [True, False],
        'int8': np.arange(1, 3).astype('int8'),
        'int16': np.arange(1, 3).astype('int16'),
        'int32': np.arange(1, 3).astype('int32'),
        'int64': np.arange(1, 3).astype('int64'),
        'uint8': np.arange(1, 3).astype('uint8'),
        'uint16': np.arange(1, 3).astype('uint16'),
        'uint32': np.arange(1, 3).astype('uint32'),
        'uint64': np.arange(1, 3).astype('uint64'),
        'float16': np.arange(1, 3).astype('float16'),
        'float32': np.arange(1, 3).astype('float32'),
        'float64': np.arange(1, 3).astype('float64'),
        'float128': np.arange(1, 3).astype('float128'),
        'complex64': np.arange(1, 3).astype('complex64'),
        'complex128': np.arange(1, 3).astype('complex128'),
        'string': list('ab'),
        'array': pd.Series([np.array([1, 2, 3], dtype=np.int32),
                            np.array([1, 2, 3], dtype=np.int32)]),
        'decimal': pd.Series([decimal.Decimal('1'), decimal.Decimal('2')]),
        'date': pd.date_range('19700101', periods=2).values,
        'category': pd.Series(list("AB")).astype('category')})
    pdf['tdeltas'] = [pdf.date.diff()[1], pdf.date.diff()[0]]
    pdf['tz_aware_dates'] = pd.date_range('19700101', periods=2, tz='US/Eastern')
    return pdf

types = [
    BooleanType(), ByteType(), ShortType(), IntegerType(), LongType(),
    FloatType(), DoubleType(), DateType(), TimestampType(), StringType(),
    DecimalType(10, 0), ArrayType(IntegerType()),
    MapType(StringType(), IntegerType()),
    StructType([StructField("_1", IntegerType())]), BinaryType(),
]

df = spark.range(2).repartition(1)
results = []
count = 0
total = len(types) * len(columns)
values = []
spark.sparkContext.setLogLevel("FATAL")
for t in types:
    result = []
    for column, pandas_t in columns:
        v = create_dataframe()[column][0]
        values.append(v)
        try:
            row = df.select(pandas_udf(lambda _: create_dataframe()[column], t)(df.id)).first()
            ret_str = repr(row[0])
        except Exception:
            ret_str = "X"
```
[GitHub] spark issue #22795: [SPARK-25798][PYTHON] Internally document type conversio...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22795 cc @viirya, @BryanCutler and @cloud-fan
[GitHub] spark issue #22787: [SPARK-25040][SQL] Empty string for non string types sho...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22787 GitHub looks buggy for now. Let me clean up my comments if they got messed up.
[GitHub] spark issue #22787: [SPARK-25040][SQL] Empty string for non string types sho...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22787 Yea, looks like we'd better fix the comments. LGTM otherwise.
[GitHub] spark issue #22787: [SPARK-25040][SQL] Empty string for non string types sho...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22787 Yea, looks good as we discussed. Should we also update the migration guide while we are here?
[GitHub] spark pull request #22783: [WIP][BUILD] Fix errors of log4j when pip sanity ...
Github user HyukjinKwon closed the pull request at: https://github.com/apache/spark/pull/22783
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22666 retest this please
[GitHub] spark pull request #22783: [WIP][BUILD] Fix errors of log4j when pip sanity ...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/22783 [WIP][BUILD] Fix errors of log4j when pip sanity checking

## What changes were proposed in this pull request?

Pip sanity checking produces some errors about log4j. I have some suspicion about it - will file a JIRA and fix this PR description when my doubt is clear.

```
Run basic sanity check on pip installed version with spark-submit
log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: target/unit-tests.log (No such file or directory)
	at java.io.FileOutputStream.open0(Native Method)
	at java.io.FileOutputStream.open(FileOutputStream.java:270)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:133)
	at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
	at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
	at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
	at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
	at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
	at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:842)
	at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768)
	at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:648)
	at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:514)
	at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580)
	at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
	at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)
	at org.apache.spark.internal.Logging$class.initializeLogging(Logging.scala:120)
	at org.apache.spark.internal.Logging$class.initializeLogIfNecessary(Logging.scala:108)
	at org.apache.spark.deploy.SparkSubmit.initializeLogIfNecessary(SparkSubmit.scala:71)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:79)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:927)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:936)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Successfully ran pip sanity check

Run basic sanity check with import based
log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: target/unit-tests.log (No such file or directory)
	at java.io.FileOutputStream.open0(Native Method)
	at java.io.FileOutputStream.open(FileOutputStream.java:270)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:133)
	at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
	at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
	at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
	at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
	at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
	at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:842)
	at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768)
	at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:648)
	at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:514)
	at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580)
	at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
	at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)
	at org.apache.spark.internal.Logging$class.initializeLogging(Logging.scala:120)
	at org.apache.spark.internal.Logging$class.initializeLogIfNecessary(Logging.scala:108)
	at org.apache.spark.deploy.SparkSubmit.initializeLogIfNecessary(SparkSubmit.scala:71)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:79)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:927)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:936)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
[Stage 0:> (0 + 10) / 10]
Successfully ran pip sanity check
Run
```
[GitHub] spark issue #22662: [SPARK-25627][TEST] Reduce test time for ContinuousStres...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22662 retest this please
[GitHub] spark issue #22776: [SPARK-25779][SQL][TESTS] Remove SQL query tests for fun...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22776 retest this please
[GitHub] spark issue #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json's inpu...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22775 retest this please
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22666 retest this please
[GitHub] spark issue #22782: [HOTFIX] Fix PySpark pip packaging tests by non-ascii co...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22782 Merged to master.
[GitHub] spark issue #22782: [HOTFIX] Fix PySpark pip packaging tests by non-ascii co...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22782 The pip packaging tests passed. Let me merge this one since it blocks almost every PR.
[GitHub] spark pull request #22782: [HOTFIX] Fix PySpark pip packaging tests by non-a...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22782#discussion_r226833951

--- Diff: bin/docker-image-tool.sh ---
@@ -79,7 +79,7 @@ function build {
   fi

   # Verify that Spark has actually been built/is a runnable distribution
-  #Â i.e. the Spark JARs that the Docker files will place into the image are present
--- End diff --

Yea, some text editors insert non-breakable spaces, and that's probably why. I dealt with similar problems in md files before. I think we have a ScalaStyle rule for non-ASCII, but apparently not for scripts ..
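A minimal sketch of such a check for scripts, analogous to ScalaStyle's non-ASCII rule; the script paths passed on the command line are illustrative:

```python
import sys

def find_non_ascii(path):
    # Scan raw bytes so the check works regardless of the locale codec.
    with open(path, "rb") as f:
        for lineno, line in enumerate(f, 1):
            for col, byte in enumerate(line):
                b = byte if isinstance(byte, int) else ord(byte)  # py3 / py2
                if b > 127:
                    print("%s:%d:%d non-ASCII byte 0x%02x"
                          % (path, lineno, col + 1, b))

for script in sys.argv[1:]:
    find_non_ascii(script)
```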
[GitHub] spark pull request #22782: [HOTFIX] Fix PySpark pip packaging tests by non-a...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22782#discussion_r226833113

--- Diff: python/pyspark/__init__.py ---
@@ -16,7 +16,7 @@
 #
 """
-PySpark is the Python API for Spark.
+PySpark is the Python API for Spark
--- End diff --

yup. I will revert this one too. I just intended to run the Python tests only, since the Scala tests take long.
[GitHub] spark pull request #22782: [HOTFIX] Fix PySpark pip packaging tests by non-a...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22782#discussion_r226833051

--- Diff: dev/run-tests.py ---
@@ -551,7 +551,8 @@ def main():
     if not changed_files or any(f.endswith(".scala")
                                 or f.endswith("scalastyle-config.xml")
                                 for f in changed_files):
-        run_scala_style_checks()
+        # run_scala_style_checks()
--- End diff --

A tentative workaround. yup yup. I will revert it all once the tests pass.
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22501 Yup, I made a fix https://github.com/apache/spark/pull/22782
[GitHub] spark issue #22748: [SPARK-25745][K8S] Improve docker-image-tool.sh script
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22748 @vanzin, the test failure was related. Don't merge if the tests fail.
[GitHub] spark pull request #22782: [WIP][HOTFIX] PIP failure fix
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/22782 [WIP][HOTFIX] PIP failure fix

## What changes were proposed in this pull request?

## How was this patch tested?

Jenkins

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark pip-failure-fix

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22782.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22782

commit 25435dc588f081c83248a3d6c8a69b568c9cb59e
Author: hyukjinkwon
Date: 2018-10-20T16:32:56Z

    Fix

commit cd5cab6cb981f88456488fde4e177c80ada0a8a8
Author: hyukjinkwon
Date: 2018-10-20T16:33:25Z

    python test
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22501 Thanks. It might be more related to external factors.
[GitHub] spark issue #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark to use ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22501 I guess it's related to pip packaging, though.

```
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/setup.py", line 224, in <module>
    'Programming Language :: Python :: Implementation :: PyPy']
  File "/tmp/tmp.R2Y98bevgD/3.5/lib/python3.5/site-packages/setuptools/__init__.py", line 140, in setup
    return distutils.core.setup(**attrs)
  File "/tmp/tmp.R2Y98bevgD/3.5/lib/python3.5/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/tmp/tmp.R2Y98bevgD/3.5/lib/python3.5/distutils/dist.py", line 955, in run_commands
    self.run_command(cmd)
  File "/tmp/tmp.R2Y98bevgD/3.5/lib/python3.5/distutils/dist.py", line 974, in run_command
    cmd_obj.run()
  File "/tmp/tmp.R2Y98bevgD/3.5/lib/python3.5/site-packages/setuptools/command/develop.py", line 38, in run
    self.install_for_development()
  File "/tmp/tmp.R2Y98bevgD/3.5/lib/python3.5/site-packages/setuptools/command/develop.py", line 154, in install_for_development
    self.process_distribution(None, self.dist, not self.no_deps)
  File "/tmp/tmp.R2Y98bevgD/3.5/lib/python3.5/site-packages/setuptools/command/easy_install.py", line 729, in process_distribution
    self.install_egg_scripts(dist)
  File "/tmp/tmp.R2Y98bevgD/3.5/lib/python3.5/site-packages/setuptools/command/develop.py", line 189, in install_egg_scripts
    script_text = strm.read()
  File "/tmp/tmp.R2Y98bevgD/3.5/lib/python3.5/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 2719: ordinal not in range(128)
```

It's from setup.py
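A minimal sketch showing where the ascii codec comes from; setuptools read the script in text mode, so the locale's preferred encoding (ascii under e.g. LANG=C) decides whether non-ASCII bytes decode. The file path is illustrative:

```python
import locale
import sys

print(sys.getdefaultencoding())       # 'ascii' on Python 2, 'utf-8' on Python 3
print(locale.getpreferredencoding())  # what text-mode open() uses on Python 3

with open("/tmp/nbsp.sh", "wb") as f:
    f.write(b"#\xc2\xa0a non-breakable space follows the hash\n")
try:
    print(open("/tmp/nbsp.sh").read())  # fails only under an ascii locale
except UnicodeDecodeError as e:
    print("reproduced:", e)
```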
[GitHub] spark issue #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json's inpu...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22775 retest this please
[GitHub] spark issue #22776: [SPARK-25779][SQL][TESTS] Remove SQL query tests for fun...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22776 retest this please
[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21157 The workaround is to use CloudPickler, btw. Technically, we have many cases that the normal pickler does not support. This one specific case (namedtuple) was allowed by this weird hack.
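A minimal sketch of the point above, assuming the `cloudpickle` package is installed; plain pickle serializes a namedtuple class by reference (it must be importable on the receiving side), while cloudpickle serializes classes defined in `__main__` by value:

```python
import collections
import pickle
import cloudpickle

Point = collections.namedtuple("Point", ["x", "y"])
p = Point(1, 2)

# Plain pickle stores only a reference to __main__.Point; unpickling in a
# worker process that lacks that definition fails, which is what the
# removed namedtuple hack papered over.
ref_blob = pickle.dumps(p)

# cloudpickle ships the class definition itself, so workers need nothing.
val_blob = cloudpickle.dumps(p)
print(cloudpickle.loads(val_blob))
```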
[GitHub] zeppelin issue #3206: [WIP][ZEPPELIN-3810] Support Spark 2.4
Github user HyukjinKwon commented on the issue: https://github.com/apache/zeppelin/pull/3206 Nope, it will work for both 2.11.8 and 2.11.12. I manually checked. This change only uses the methods existing in both Scala 2.11.8 and 2.11.12.
[GitHub] spark issue #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json's inpu...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22775 retest this please
[GitHub] spark issue #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json's inpu...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22775 Ah.. let me rebase and sync the tests
[GitHub] spark pull request #22781: [MINOR][DOC] Fix the building document to describ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22781#discussion_r226816661

--- Diff: docs/building-spark.md ---
@@ -12,7 +12,7 @@ redirect_from: "building-with-maven.html"
 ## Apache Maven

 The Maven-based build is the build of reference for Apache Spark.
-Building Spark using Maven requires Maven 3.3.9 or newer and Java 8+.
+Building Spark using Maven requires Maven 3.3.9 or newer and Java 8.
--- End diff --

+1
[GitHub] spark pull request #22666: [SPARK-25672][SQL] schema_of_csv() - schema infer...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22666#discussion_r226814727

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -3886,6 +3886,31 @@ object functions {
     withExpr(new CsvToStructs(e.expr, schema.expr, options.asScala.toMap))
   }

+  /**
+   * Parses a column containing a CSV string and infers its schema.
+   *
+   * @param e a string column containing CSV data.
+   *
+   * @group collection_funcs
+   * @since 3.0.0
+   */
+  def schema_of_csv(e: Column): Column = withExpr(new SchemaOfCsv(e.expr))
+
+  /**
+   * Parses a column containing a CSV string and infers its schema using options.
+   *
+   * @param e a string column containing CSV data.
+   * @param options options to control how the CSV is parsed. accepts the same options and the
+   *                json data source. See [[DataFrameReader#csv]].
+   * @return a column with string literal containing schema in DDL format.
+   *
+   * @group collection_funcs
+   * @since 3.0.0
+   */
+  def schema_of_csv(e: Column, options: java.util.Map[String, String]): Column = {
--- End diff --

`schema_of_json` also has only the Java-specific variant (I actually suggested minimising the exposed functions), since the Java-specific one can be used from Scala, but a Scala-specific one can't be used from Java.
[GitHub] spark issue #22773: [SPARK-25785][SQL] Add prettyNames for from_json, to_jso...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22773 Thank you @viirya and @dongjoon-hyun.
[GitHub] spark issue #22773: [SPARK-25785][SQL] Add prettyNames for from_json, to_jso...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22773 Merged to master.
[GitHub] spark issue #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json's inpu...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22775 Yup, will fix.
[GitHub] spark issue #22773: [MINOR][SQL] Add prettyNames for from_json, to_json, fro...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22773 Other JIRAs have different fix versions. Let me create a new JIRA then.
[GitHub] spark issue #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json's inpu...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22775 retest this please
[GitHub] spark pull request #22776: [SPARK-25779][SQL][TESTS] Remove SQL query tests ...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/22776 [SPARK-25779][SQL][TESTS] Remove SQL query tests for function documentation by DESCRIBE FUNCTION at SQLQueryTestSuite

Currently, there are some tests testing function descriptions:

```bash
$ grep -ir "describe function" sql/core/src/test/resources/sql-tests/inputs
sql/core/src/test/resources/sql-tests/inputs/json-functions.sql:describe function to_json;
sql/core/src/test/resources/sql-tests/inputs/json-functions.sql:describe function extended to_json;
sql/core/src/test/resources/sql-tests/inputs/json-functions.sql:describe function from_json;
sql/core/src/test/resources/sql-tests/inputs/json-functions.sql:describe function extended from_json;
```

There is not much point in testing them, since we're not going to test the documentation itself. The `DESCRIBE FUNCTION` functionality itself is already being tested here and there. See the test failures in https://github.com/apache/spark/pull/18749 (where I added examples to function descriptions). We'd better remove those tests so that people don't add such tests to the SQL tests.

## How was this patch tested?

Manual.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-25779

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22776.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22776

commit 7f4cd6d00b3bb27f123a5be4a7f5d4979d513c35
Author: hyukjinkwon
Date: 2018-10-19T09:28:27Z

    Remove SQL query tests for function documentation by DESCRIBE FUNCTION at SQLQueryTestSuite
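A minimal sketch of the functionality that remains covered elsewhere, assuming an active Spark session; `DESCRIBE FUNCTION` is still easy to exercise directly:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.sql("DESCRIBE FUNCTION to_json").show(truncate=False)
spark.sql("DESCRIBE FUNCTION EXTENDED to_json").show(truncate=False)
```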
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22666 Should be ready for a look now. Would you mind taking a look please @cloud-fan and @gatorsmile? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json's inpu...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22775 This should be targeted at 2.4; otherwise we should describe the behaviour change in the migration notes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22775: [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/22775 [SPARK-24709][SQL][FOLLOW-UP] Make schema_of_json's input json as literal only ## What changes were proposed in this pull request? The main purpose of `schema_of_json` is to be used in combination with `from_json` (to make up for the lack of schema inference there), which takes its schema only as a literal; however, `schema_of_json` currently allows JSON input as non-literal expressions (e.g., a column). This was mistakenly allowed - we don't have to take usages other than the main purpose into account for now. This PR is a follow-up to only allow literals for `schema_of_json`'s JSON input. We can allow non-literal expressions later when it's needed or there is a use case for it. ## How was this patch tested? Unit tests were added. You can merge this pull request into a Git repository by running: $ git pull https://github.com/HyukjinKwon/spark SPARK-25447-followup Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22775.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22775 commit 2705aa8739ecfc1d2ea8aea40082734084f36514 Author: hyukjinkwon Date: 2018-10-19T06:40:58Z Make schema_of_json's input json as literal only --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
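To illustrate the intended combination, here is a minimal sketch assuming a Spark 2.4-style API where `from_json` accepts a schema `Column`, with a `SparkSession` named `spark` in scope (the session name is an assumption):
```scala
import org.apache.spark.sql.functions.{from_json, lit, schema_of_json}
import spark.implicits._ // assumes a SparkSession named `spark`

val df = Seq("""{"a": 1, "b": 0.8}""").toDF("json")

// The sample passed to schema_of_json is a literal, so the schema can be
// resolved at analysis time and fed straight into from_json.
val parsed = df.select(
  from_json($"json", schema_of_json(lit("""{"a": 1, "b": 0.8}"""))).as("parsed"))
parsed.printSchema()
```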
[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22429 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22773: [MINOR][SQL] Add prettyNames for from_json, to_json, fro...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22773 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22761: [MINOR][DOC] Spacing items in migration guide for readab...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22761 Merged to master and branch-2.4. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22773: [MINOR][SQL] Add prettyNames for from_json, to_js...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/22773 [MINOR][SQL] Add prettyNames for from_json, to_json, from_csv, and schema_of_json ## What changes were proposed in this pull request? This PR adds `prettyNames` for `from_json`, `to_json`, `from_csv`, and `schema_of_json` so that appropriate names are used. ## How was this patch tested? Unit tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/HyukjinKwon/spark minor-prettyNames Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22773.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22773 commit 3447e73989e39d0c052cf69e5a3e80d1ebb221dc Author: hyukjinkwon Date: 2018-10-19T05:28:55Z Add prettyNames for from_json, to_json, from_csv, and schema_of_json --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
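As a rough illustration of what a `prettyName` override buys (a stand-alone sketch, not the actual Catalyst code; the trait below only mimics the default behaviour of `Expression#prettyName`):
```scala
import java.util.Locale

// Sketch only: mimics how Catalyst derives a default user-facing name.
trait HasPrettyName {
  // The default mirrors the lower-cased class name, which is what produces
  // opaque names like "jsontostructs" for JsonToStructs.
  def prettyName: String = getClass.getSimpleName.toLowerCase(Locale.ROOT)
}

case class JsonToStructs() extends HasPrettyName {
  override def prettyName: String = "from_json" // the name users now see
}

object PrettyNameDemo extends App {
  println(JsonToStructs().prettyName) // prints "from_json"
}
```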
[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22466#discussion_r226536773 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -840,12 +840,19 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { } test("create table in default db") { -val catalog = spark.sessionState.catalog -val tableIdent1 = TableIdentifier("tab1", None) -createTable(catalog, tableIdent1) -val expectedTableIdent = tableIdent1.copy(database = Some("default")) -val expectedTable = generateTable(catalog, expectedTableIdent) -checkCatalogTables(expectedTable, catalog.getTableMetadata(tableIdent1)) +var tablePath: Option[URI] = None --- End diff -- `var tablePath: URI = null` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22466#discussion_r226536735 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -207,6 +207,14 @@ class SessionCatalog( "you cannot create a database with this name.") } validateName(dbName) +// SPARK-25464 fail if DB location exists and is not empty +val dbPath = new Path(dbDefinition.locationUri) +val fs = dbPath.getFileSystem(hadoopConf) +if (!externalCatalog.databaseExists(dbName) && fs.exists(dbPath) + && fs.listStatus(dbPath).nonEmpty) { --- End diff -- Do we really need to list all the files here? It's potentially expensive; a cheaper check is sketched below. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
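A sketch of the cheaper emptiness check hinted at above, assuming a Hadoop version that provides `FileSystem#listStatusIterator` (the lazy iterator can stop after the first entry instead of materializing the whole listing):
```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// Sketch only: answers "is this directory non-empty?" as soon as a single
// child entry turns up, instead of building the full FileStatus array that
// listStatus(...).nonEmpty would require.
def isNonEmptyDir(fs: FileSystem, path: Path): Boolean =
  fs.exists(path) && fs.listStatusIterator(path).hasNext
```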
[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22466#discussion_r226536585 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala --- @@ -2370,4 +2370,17 @@ class HiveDDLSuite )) } } + + test("SPARK-25464 create Database with non empty location") { +val dbName = "dbwithcustomlocation" +withTempDir { tmpDir => + val parentDir = tmpDir.getParent + val expectedMsg = s"Cannot create database at location $parentDir because the path is not " +s"empty." --- End diff -- The leading `s` on the second literal can be removed, since it has nothing to interpolate (see the sketch below). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
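To make the nit concrete: only the first literal interpolates anything, so only it needs the `s` prefix (using a hypothetical `parentDir` value purely for illustration):
```scala
val parentDir = "/tmp/db-location" // hypothetical value, for illustration only

// Before: the trailing literal carries a pointless `s` interpolator.
val before = s"Cannot create database at location $parentDir because the path is not " +
  s"empty."

// After: a plain literal yields the exact same string.
val after = s"Cannot create database at location $parentDir because the path is not " +
  "empty."

assert(before == after)
```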
[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22466#discussion_r226536555 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala --- @@ -2370,4 +2370,17 @@ class HiveDDLSuite )) } } + + test("SPARK-25464 create Database with non empty location") { --- End diff -- `create a database with a non-empty location` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22466#discussion_r226536442 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -840,12 +840,19 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { } test("create table in default db") { -val catalog = spark.sessionState.catalog -val tableIdent1 = TableIdentifier("tab1", None) -createTable(catalog, tableIdent1) -val expectedTableIdent = tableIdent1.copy(database = Some("default")) -val expectedTable = generateTable(catalog, expectedTableIdent) -checkCatalogTables(expectedTable, catalog.getTableMetadata(tableIdent1)) +var tablePath: Option[URI] = None +try { + val catalog = spark.sessionState.catalog + val tableIdent1 = TableIdentifier("tab1", None) + createTable(catalog, tableIdent1) + val expectedTableIdent = tableIdent1.copy(database = Some("default")) + val expectedTable = generateTable(catalog, expectedTableIdent) + tablePath = Some(expectedTable.location) + checkCatalogTables(expectedTable, catalog.getTableMetadata(tableIdent1)) +} finally { + // This is external table,so it is required to deleted the path --- End diff -- `this is an external table` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22466#discussion_r226536456 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -840,12 +840,19 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { } test("create table in default db") { -val catalog = spark.sessionState.catalog -val tableIdent1 = TableIdentifier("tab1", None) -createTable(catalog, tableIdent1) -val expectedTableIdent = tableIdent1.copy(database = Some("default")) -val expectedTable = generateTable(catalog, expectedTableIdent) -checkCatalogTables(expectedTable, catalog.getTableMetadata(tableIdent1)) +var tablePath: Option[URI] = None +try { + val catalog = spark.sessionState.catalog + val tableIdent1 = TableIdentifier("tab1", None) + createTable(catalog, tableIdent1) + val expectedTableIdent = tableIdent1.copy(database = Some("default")) + val expectedTable = generateTable(catalog, expectedTableIdent) + tablePath = Some(expectedTable.location) + checkCatalogTables(expectedTable, catalog.getTableMetadata(tableIdent1)) +} finally { + // This is external table,so it is required to deleted the path --- End diff -- `it is required to delete` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22466: [SPARK-25464][SQL] Create Database to the locatio...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22466#discussion_r226536304 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -840,12 +840,19 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { } test("create table in default db") { -val catalog = spark.sessionState.catalog -val tableIdent1 = TableIdentifier("tab1", None) -createTable(catalog, tableIdent1) -val expectedTableIdent = tableIdent1.copy(database = Some("default")) -val expectedTable = generateTable(catalog, expectedTableIdent) -checkCatalogTables(expectedTable, catalog.getTableMetadata(tableIdent1)) +var tablePath: Option[URI] = None +try { + val catalog = spark.sessionState.catalog + val tableIdent1 = TableIdentifier("tab1", None) + createTable(catalog, tableIdent1) + val expectedTableIdent = tableIdent1.copy(database = Some("default")) + val expectedTable = generateTable(catalog, expectedTableIdent) + tablePath = Some(expectedTable.location) + checkCatalogTables(expectedTable, catalog.getTableMetadata(tableIdent1)) +} finally { + // This is external table,so it is required to deleted the path --- End diff -- tiny nit: `e,` -> `e, ` (add a space after the comma). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22466: [SPARK-25464][SQL] Create Database to the location,only ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22466 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22772: [SPARK-24499][SQL][DOC][Followup] Fix some broken links
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22772 It's okay. The doc fix was huge, so there are likely to be some mistakes. I will read it closely this weekend too. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22666 This is a WIP. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22429 I can address his comments while he is on vacation. Please keep reviewing this. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22429 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in CSV da...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22503 Merged to master. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22503: [SPARK-25493][SQL] Use auto-detection for CRLF in...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22503#discussion_r226524439 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala --- @@ -220,6 +221,17 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils with Te } } + test("crlf line separators in multiline mode") { --- End diff -- nit: -> `SPARK-25493: crlf line separators in multiline mode`. When a PR fixes a specific problem, let's add the JIRA prefix to the test name next time (a sketch follows below). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
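A small sketch of the suggested naming convention, assuming a ScalaTest `FunSuite`-style suite like `CSVSuite` (the test body is elided and hypothetical):
```scala
// Sketch only: prefixing the test name with the JIRA id that motivated it
// makes a regression test easy to trace back to its issue.
test("SPARK-25493: crlf line separators in multiline mode") {
  // ... exercise the CSV reader with \r\n separators and multiLine enabled ...
}
```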
[GitHub] spark issue #22576: [SPARK-25560][SQL] Allow FunctionInjection in SparkExten...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22576 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org