[GitHub] spark issue #21005: [SPARK-23898][SQL] Simplify add & subtract code generati...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21005 Yeah, I think so; I just suggested we'd better file a new JIRA for that. Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Supp...
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/20937#discussion_r180016246
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
    @@ -361,6 +361,15 @@ class JacksonParser(
             // For such records, all fields other than the field configured by
             // `columnNameOfCorruptRecord` are set to `null`.
             throw BadRecordException(() => recordLiteral(record), () => None, e)
    +      case e: CharConversionException if options.encoding.isEmpty =>
    +        val msg =
    +          """Failed to parse a character. Encoding was detected automatically.
--- End diff --
> I think `Encoding was detected automatically` is not quite correct.

It is absolutely correct. If `encoding` is not set, it is detected automatically by Jackson. Look at the condition `if options.encoding.isEmpty =>`.

> It might not help the user solve the issue but it gives less correct information.

It gives absolutely correct information.

> They could think it detects encoding correctly regardless of multiline option.

The message DOESN'T say that `encoding` was detected correctly.

> Think about this scenario: users somehow get this exception and read `Failed to parse a character. Encoding was detected automatically.`. What would they think?

They will look at the proposed solution `You might want to set it explicitly via the encoding option like` and will set `encoding`.

> I would think the file somehow failed to be read

That could be true even if `encoding` is set correctly.

> but that the encoding in the file was detected correctly and automatically

I don't know why you decided that. I see nothing about `encoding` correctness in the message.

> It's annoying to debug encoding-related stuff in my experience. It would be nicer if we give the correct information as much as we can.

What is your suggestion for the error message?

> I am saying let's document the automatic encoding detection feature only for multiLine officially, which is true.

I agree, let's document that, though it is not related to this PR. This PR doesn't change the behavior of encoding auto-detection, and from my point of view it must not change it. If you want to restrict the encoding auto-detection mechanism somehow, please create a separate PR. We can discuss separately which customers' apps it would break.
[GitHub] spark pull request #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Supp...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20937#discussion_r180014636
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala ---
    @@ -86,14 +85,34 @@ private[sql] class JSONOptions(
       val multiLine = parameters.get("multiLine").map(_.toBoolean).getOrElse(false)

    -  val lineSeparator: Option[String] = parameters.get("lineSep").map { sep =>
    -    require(sep.nonEmpty, "'lineSep' cannot be an empty string.")
    -    sep
    +  /**
    +   * A string between two consecutive JSON records.
    +   */
    +  val lineSeparator: Option[String] = parameters.get("lineSep")
    +
    +  /**
    +   * Standard encoding (charset) name. For example UTF-8, UTF-16LE and UTF-32BE.
    +   * If the encoding is not specified (None), it will be detected automatically.
    +   */
    +  val encoding: Option[String] = parameters.get("encoding")
    +    .orElse(parameters.get("charset")).map { enc =>
    +      val blacklist = List("UTF16", "UTF32")
--- End diff --
Not important, but it's more usual, and I was thinking of doing it if there isn't a specific reason to make an exception from the norm.
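For readers following the `blacklist` line in the hunk above: the rest of the validation is cut off in this excerpt, so the following is only a minimal standalone sketch of how an `encoding`/`charset` option could be resolved and checked against such a list. The object name `EncodingOptionSketch`, the normalization step, and the error messages are assumptions made for illustration, not the code from the PR.

```scala
import java.nio.charset.Charset
import java.util.Locale

// Hypothetical stand-in for the option handling in JSONOptions; not Spark code.
object EncodingOptionSketch {
  // Names taken from the quoted diff line; how the PR uses them is not shown.
  private val blacklist = List("UTF16", "UTF32")

  // Look up "encoding", fall back to "charset", then validate the value.
  def resolve(parameters: Map[String, String]): Option[String] =
    parameters.get("encoding").orElse(parameters.get("charset")).map { enc =>
      // Assumption: compare case-insensitively and ignore dashes.
      val normalized = enc.toUpperCase(Locale.ROOT).replace("-", "")
      require(!blacklist.contains(normalized),
        s"The $enc encoding is not accepted by this sketch.")
      require(Charset.isSupported(enc),
        s"The $enc encoding is not supported by this JVM.")
      enc
    }
}
```

With this sketch, `resolve(Map("charset" -> "UTF-16LE"))` yields `Some("UTF-16LE")`, an empty map yields `None` (the "detect automatically" case), and `"UTF-16"` is rejected with an `IllegalArgumentException`.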
[GitHub] spark pull request #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Supp...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20937#discussion_r180014167
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -366,6 +366,9 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
       * `java.text.SimpleDateFormat`. This applies to timestamp type.
       * `multiLine` (default `false`): parse one record, which may span multiple lines,
       * per file
    +   * `encoding` (by default it is not set): allows to forcibly set one of standard basic
    +   * or extended charsets for input jsons. For example UTF-8, UTF-16BE, UTF-32. If the encoding
    +   * is not specified (by default), it will be detected automatically.
--- End diff --
> If encoding is not set, it will be detected by Jackson independently from multiline.

Jackson detects it, but Spark doesn't handle it correctly when `multiLine` is disabled, even with this PR, as we discussed. We found many holes. Why did you bring this up again?
[GitHub] spark pull request #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Supp...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20937#discussion_r180013348
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonDataSource.scala ---
    @@ -92,26 +93,30 @@ object TextInputJsonDataSource extends JsonDataSource {
           sparkSession: SparkSession,
           inputPaths: Seq[FileStatus],
           parsedOptions: JSONOptions): StructType = {
    -    val json: Dataset[String] = createBaseDataset(
    -      sparkSession, inputPaths, parsedOptions.lineSeparator)
    +    val json: Dataset[String] = createBaseDataset(sparkSession, inputPaths, parsedOptions)
    +
         inferFromDataset(json, parsedOptions)
       }

       def inferFromDataset(json: Dataset[String], parsedOptions: JSONOptions): StructType = {
         val sampled: Dataset[String] = JsonUtils.sample(json, parsedOptions)
    -    val rdd: RDD[UTF8String] = sampled.queryExecution.toRdd.map(_.getUTF8String(0))
    -    JsonInferSchema.infer(rdd, parsedOptions, CreateJacksonParser.utf8String)
    +    val rdd: RDD[InternalRow] = sampled.queryExecution.toRdd
    +    val rowParser = parsedOptions.encoding.map { enc =>
    +      CreateJacksonParser.internalRow(enc, _: JsonFactory, _: InternalRow, 0)
--- End diff --
Can we do something like

```scala
(factory: JsonFactory, row: InternalRow) =>
  // Wrap the row's binary column in a stream and reuse the stream-based parser factory.
  val bais = new ByteArrayInputStream(row.getBinary(0))
  CreateJacksonParser.inputStream(enc, factory, bais)
```

? It looks like `internalRow` doesn't actually deduplicate code.
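The idea behind the suggestion, wrapping the raw bytes in a `ByteArrayInputStream` and handing the stream plus an explicit encoding to a reader, can be tried outside Spark. The sketch below is an assumption-laden stand-in: it uses the JDK's `InputStreamReader` in place of `CreateJacksonParser.inputStream` (which is not reproduced here), and `DecodeSketch` and `decode` are hypothetical names.

```scala
import java.io.{ByteArrayInputStream, InputStreamReader}
import java.nio.charset.Charset

// Hypothetical helper; it only illustrates decoding bytes with an explicit charset.
object DecodeSketch {
  def decode(enc: String, bytes: Array[Byte]): String = {
    val bais = new ByteArrayInputStream(bytes)
    // The reader decodes with the charset the caller named, not a detected one.
    val reader = new InputStreamReader(bais, Charset.forName(enc))
    try {
      val out = new java.lang.StringBuilder
      val buf = new Array[Char](1024)
      var n = reader.read(buf)
      while (n != -1) {
        out.append(buf, 0, n)
        n = reader.read(buf)
      }
      out.toString
    } finally {
      reader.close()
    }
  }
}
```

For example, `DecodeSketch.decode("UTF-16LE", json.getBytes("UTF-16LE"))` returns the original `json` string, because the same charset is used on both sides.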
[GitHub] spark issue #21005: [SPARK-23898][SQL] Simplify add & subtract code generati...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/21005 @maropu it seems a bit of overkill to add a separate trait for this; it also kinda nullifies the effect of this PR. As for `CalendarInterval`'s support for `divide` and `multiply`: these operations have not been implemented yet and, correct me if I am wrong, involve a `CalendarInterval` on the left side and an `Integral` on the right side; this violates the contract of `BinaryArithmetic`. Anyway, I am not opposed to this, but I think we should do it as part of a separate JIRA/PR.
[GitHub] spark issue #21005: [SPARK-23898][SQL] Simplify add & subtract code generati...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21005 Merged build finished. Test PASSed.
[GitHub] spark pull request #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Supp...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20937#discussion_r180009422
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
    @@ -361,6 +361,15 @@ class JacksonParser(
             // For such records, all fields other than the field configured by
             // `columnNameOfCorruptRecord` are set to `null`.
             throw BadRecordException(() => recordLiteral(record), () => None, e)
    +      case e: CharConversionException if options.encoding.isEmpty =>
    +        val msg =
    +          """Failed to parse a character. Encoding was detected automatically.
--- End diff --
I am saying let's document the automatic encoding detection feature only for `multiLine` officially, which is true.
[GitHub] spark issue #21005: [SPARK-23898][SQL] Simplify add & subtract code generati...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21005 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2088/ Test PASSed.
[GitHub] spark pull request #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Supp...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20937#discussion_r180009312
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
    @@ -361,6 +361,15 @@ class JacksonParser(
             // For such records, all fields other than the field configured by
             // `columnNameOfCorruptRecord` are set to `null`.
             throw BadRecordException(() => recordLiteral(record), () => None, e)
    +      case e: CharConversionException if options.encoding.isEmpty =>
    +        val msg =
    +          """Failed to parse a character. Encoding was detected automatically.
--- End diff --
I think `Encoding was detected automatically` is not quite correct. It might not help the user solve the issue but it gives less correct information. They could think it detects encoding correctly regardless of the `multiline` option.

Think about this scenario: users somehow get this exception and read `Failed to parse a character. Encoding was detected automatically.`. What would they think? I would think the file somehow failed to be read, but that the encoding in the file was detected correctly and automatically, regardless of other options.

It's annoying to debug encoding-related stuff in my experience. It would be nicer if we give the correct information as much as we can.
[GitHub] spark pull request #20981: [SPARK-23873][SQL] Use accessors in interpreted L...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20981#discussion_r180008583
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala ---
    @@ -119,4 +119,25 @@ object InternalRow {
         case v: MapData => v.copy()
         case _ => value
       }
    +
    +  /**
    +   * Returns an accessor for an InternalRow with given data type and ordinal.
    +   */
    +  def getAccessor(dataType: DataType, ordinal: Int): (InternalRow) => Any = dataType match {
    +    case BooleanType => (input) => input.getBoolean(ordinal)
    +    case ByteType => (input) => input.getByte(ordinal)
    +    case ShortType => (input) => input.getShort(ordinal)
    +    case IntegerType | DateType => (input) => input.getInt(ordinal)
    +    case LongType | TimestampType => (input) => input.getLong(ordinal)
    +    case FloatType => (input) => input.getFloat(ordinal)
    +    case DoubleType => (input) => input.getDouble(ordinal)
    +    case StringType => (input) => input.getUTF8String(ordinal)
    +    case BinaryType => (input) => input.getBinary(ordinal)
    +    case CalendarIntervalType => (input) => input.getInterval(ordinal)
    +    case t: DecimalType => (input) => input.getDecimal(ordinal, t.precision, t.scale)
    +    case t: StructType => (input) => input.getStruct(ordinal, t.size)
    +    case _: ArrayType => (input) => input.getArray(ordinal)
    +    case _: MapType => (input) => input.getMap(ordinal)
    +    case _ => (input) => input.get(ordinal, dataType)
--- End diff --
Handle `UDT`?
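One way to read the `UDT` question is that a user-defined type could be unwrapped to its underlying SQL type before an accessor is chosen. The sketch below shows that idea on a toy model only; `DataType`, `Row`, `UserDefinedType`, and `AccessorSketch` here are simplified, hypothetical stand-ins rather than the Spark classes, and nothing in the thread confirms that the PR resolves the question this way.

```scala
// Toy type model; hypothetical stand-ins for Spark's DataType hierarchy.
sealed trait DataType
case object IntegerType extends DataType
case object StringType extends DataType
final case class UserDefinedType(sqlType: DataType) extends DataType

// Toy row with typed getters, loosely mirroring InternalRow's getters.
final class Row(values: IndexedSeq[Any]) {
  def getInt(i: Int): Int = values(i).asInstanceOf[Int]
  def getString(i: Int): String = values(i).asInstanceOf[String]
}

object AccessorSketch {
  // The type dispatch happens once, here; the returned closure does no matching.
  def getAccessor(dataType: DataType, ordinal: Int): Row => Any = dataType match {
    case IntegerType => row => row.getInt(ordinal)
    case StringType => row => row.getString(ordinal)
    // Assumption: a UDT is read through the accessor of its underlying type.
    case UserDefinedType(sqlType) => getAccessor(sqlType, ordinal)
  }
}
```

The design point is that the pattern match runs when the accessor is created, not once per row, so a per-row call is just a closure invocation.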
[GitHub] spark issue #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20981 Merged build finished. Test PASSed.
[GitHub] spark issue #21005: [SPARK-23898][SQL] Simplify add & subtract code generati...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21005 **[Test build #89047 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89047/testReport)** for PR 21005 at commit [`433`](https://github.com/apache/spark/commit/43314b1d443fac5ca27ecef80677dbe70ab7).
[GitHub] spark pull request #20981: [SPARK-23873][SQL] Use accessors in interpreted L...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20981#discussion_r180008527
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala ---
    @@ -33,28 +33,14 @@ case class BoundReference(ordinal: Int, dataType: DataType, nullable: Boolean)
       override def toString: String = s"input[$ordinal, ${dataType.simpleString}, $nullable]"

    +  private lazy val accessor: InternalRow => Any = InternalRow.getAccessor(dataType, ordinal)
--- End diff --
Do we need to be lazy?
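To make the question concrete: a `lazy val` defers building the accessor until first use and then caches it, whereas a plain `val` would build it when the object is constructed. The sketch below demonstrates only that language behavior; `LazySketch`, `Ref`, and the `built` counter are hypothetical names, and the thread does not say which option the PR settles on.

```scala
object LazySketch {
  // Counts how many times an accessor has been built (for demonstration only).
  var built = 0

  final class Ref(ordinal: Int) {
    // Built on first access, then reused; a plain `val` would build it in the constructor.
    lazy val accessor: Vector[Any] => Any = {
      built += 1
      (row: Vector[Any]) => row(ordinal)
    }
  }
}
```

Constructing a `Ref` leaves `built` at 0; the first call to `accessor` raises it to 1, and later calls reuse the cached function. The cost of laziness is an initialization check on each access, which is the trade-off the question is probing.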
[GitHub] spark issue #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20981 **[Test build #89048 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89048/testReport)** for PR 20981 at commit [`2eb2bf1`](https://github.com/apache/spark/commit/2eb2bf1853a0ba4de8f4a3adfe8407d04a075b22).
[GitHub] spark issue #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20981 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2087/ Test PASSed.
[GitHub] spark issue #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVa...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20981 retest this please.
[GitHub] spark issue #20944: [SPARK-23831][SQL] Add org.apache.derby to IsolatedClien...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20944 **[Test build #89046 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89046/testReport)** for PR 20944 at commit [`1c801f1`](https://github.com/apache/spark/commit/1c801f1e673b3d6f9e94eeade08d5b309a105061).
[GitHub] spark issue #20944: [SPARK-23831][SQL] Add org.apache.derby to IsolatedClien...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20944 Merged build finished. Test PASSed.
[GitHub] spark issue #20944: [SPARK-23831][SQL] Add org.apache.derby to IsolatedClien...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20944 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2086/ Test PASSed.
[GitHub] spark issue #20944: [SPARK-23831][SQL] Add org.apache.derby to IsolatedClien...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20944 retest this please.
[GitHub] spark issue #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20981 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89042/ Test FAILed.
[GitHub] spark issue #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20981 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89040/ Test FAILed.
[GitHub] spark issue #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20981 Merged build finished. Test FAILed.
[GitHub] spark issue #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff test Pyth...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20904 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89039/ Test FAILed.
[GitHub] spark issue #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20981 Merged build finished. Test FAILed.
[GitHub] spark issue #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff test Pyth...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20904 Build finished. Test FAILed.
[GitHub] spark issue #20944: [SPARK-23831][SQL] Add org.apache.derby to IsolatedClien...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20944 Merged build finished. Test FAILed.
[GitHub] spark issue #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20981 **[Test build #89040 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89040/testReport)** for PR 20981 at commit [`a8cdbe8`](https://github.com/apache/spark/commit/a8cdbe8baf2d508fb2583862042f1213cf0eae7b).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff test Pyth...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20904 **[Test build #89039 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89039/testReport)** for PR 20904 at commit [`49a7ddb`](https://github.com/apache/spark/commit/49a7ddb45cb9a0035e3faed5906ecd37890333e1).
 * This patch **fails due to an unknown error code, -9**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.
[GitHub] spark issue #21004: [SPARK-23896][SQL]Improve PartitioningAwareFileIndex
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21004 Merged build finished. Test FAILed.
[GitHub] spark issue #21004: [SPARK-23896][SQL]Improve PartitioningAwareFileIndex
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21004 **[Test build #89044 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89044/testReport)** for PR 21004 at commit [`10536a6`](https://github.com/apache/spark/commit/10536a6dbf2ab37d7066915223a64e914cf53b5f).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20937 Merged build finished. Test FAILed.
[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20937 **[Test build #89045 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89045/testReport)** for PR 20937 at commit [`b817184`](https://github.com/apache/spark/commit/b817184d35d0e2589682f1dcd88b9f29b2063f5b).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20937 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89045/ Test FAILed.
[GitHub] spark issue #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20981 **[Test build #89042 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89042/testReport)** for PR 20981 at commit [`2eb2bf1`](https://github.com/apache/spark/commit/2eb2bf1853a0ba4de8f4a3adfe8407d04a075b22).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21004: [SPARK-23896][SQL]Improve PartitioningAwareFileIndex
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21004 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89044/ Test FAILed.
[GitHub] spark issue #20944: [SPARK-23831][SQL] Add org.apache.derby to IsolatedClien...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20944 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89043/ Test FAILed.
[GitHub] spark issue #20944: [SPARK-23831][SQL] Add org.apache.derby to IsolatedClien...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20944 **[Test build #89043 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89043/testReport)** for PR 20944 at commit [`1c801f1`](https://github.com/apache/spark/commit/1c801f1e673b3d6f9e94eeade08d5b309a105061).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21004: [SPARK-23896][SQL]Improve PartitioningAwareFileIndex
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21004 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2085/ Test PASSed.
[GitHub] spark issue #21004: [SPARK-23896][SQL]Improve PartitioningAwareFileIndex
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21004 Merged build finished. Test PASSed.
[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20937 **[Test build #89045 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89045/testReport)** for PR 20937 at commit [`b817184`](https://github.com/apache/spark/commit/b817184d35d0e2589682f1dcd88b9f29b2063f5b).
[GitHub] spark issue #21004: [SPARK-23896][SQL]Improve PartitioningAwareFileIndex
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21004 **[Test build #89044 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89044/testReport)** for PR 21004 at commit [`10536a6`](https://github.com/apache/spark/commit/10536a6dbf2ab37d7066915223a64e914cf53b5f).
[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/20937 jenkins, retest this, please
[GitHub] spark pull request #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Supp...
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/20937#discussion_r18138
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
    @@ -361,6 +361,15 @@ class JacksonParser(
             // For such records, all fields other than the field configured by
             // `columnNameOfCorruptRecord` are set to `null`.
             throw BadRecordException(() => recordLiteral(record), () => None, e)
    +      case e: CharConversionException if options.encoding.isEmpty =>
    +        val msg =
    +          """Failed to parse a character. Encoding was detected automatically.
--- End diff --
OK, speaking about this concrete exception handling: the exception with this message is thrown ONLY when `options.encoding.isEmpty` is `true`. That means `encoding` is not set and the actual encoding of the file was auto-detected. The `msg` says exactly that: `Encoding was detected automatically`. Maybe the encoding was detected correctly but the file contains a wrong char. In that case, the first sentence covers it: `Failed to parse a character`. The same could happen if you set `encoding` explicitly, because you cannot guarantee that inputs are always correct.

> I think automatic detection is true only when multiline is enabled.

A wrong char can appear in a UTF-8 file read with `multiline = false` as well as in a UTF-16LE file read with `multiline = true`. My point is that mentioning the `multiline` option in the error message doesn't help the user solve the issue. A possible solution is to set `encoding` explicitly, which is what the message actually says.
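The guard being discussed, adding the hint only when no encoding was configured, can be sketched in isolation. This is a hypothetical illustration: `ErrorMessageSketch` and `describe` are made-up names, the message text is assembled only from the fragments quoted in this thread (the full message is truncated in the diff), and the real code throws an exception rather than returning a string.

```scala
import java.io.CharConversionException

// Hypothetical helper; it mirrors only the shape of the guarded `case` in the diff.
object ErrorMessageSketch {
  def describe(e: Throwable, encoding: Option[String]): String = e match {
    // The hint is produced only when the user did not set an encoding.
    case _: CharConversionException if encoding.isEmpty =>
      "Failed to parse a character. Encoding was detected automatically. " +
        "You might want to set it explicitly via the encoding option."
    // With an explicit encoding (or any other error), pass the original message through.
    case other =>
      String.valueOf(other.getMessage)
  }
}
```

With an explicit encoding such as `Some("UTF-8")`, the same `CharConversionException` falls through to the second case, so the hint about automatic detection is never shown to users who already chose an encoding.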
[GitHub] spark issue #20944: [SPARK-23831][SQL] Add org.apache.derby to IsolatedClien...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20944 Merged build finished. Test PASSed.
[GitHub] spark issue #20944: [SPARK-23831][SQL] Add org.apache.derby to IsolatedClien...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20944 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2084/ Test PASSed.
[GitHub] spark issue #20944: [SPARK-23831][SQL] Add org.apache.derby to IsolatedClien...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20944 **[Test build #89043 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89043/testReport)** for PR 20944 at commit [`1c801f1`](https://github.com/apache/spark/commit/1c801f1e673b3d6f9e94eeade08d5b309a105061).