[GitHub] spark issue #21005: [SPARK-23898][SQL] Simplify add & subtract code generati...

2018-04-09 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21005
  
Yeah, I think so; I just suggested that we'd better file a new JIRA for that. Thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Supp...

2018-04-09 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20937#discussion_r180016246
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
@@ -361,6 +361,15 @@ class JacksonParser(
         // For such records, all fields other than the field configured by
         // `columnNameOfCorruptRecord` are set to `null`.
         throw BadRecordException(() => recordLiteral(record), () => None, e)
+      case e: CharConversionException if options.encoding.isEmpty =>
+        val msg =
+          """Failed to parse a character. Encoding was detected automatically.
--- End diff --

> I don't think `Encoding was detected automatically` is quite correct.

It is absolutely correct. If `encoding` is not set, Jackson detects it automatically. Look at the condition `if options.encoding.isEmpty =>`.

> It might not help users solve the issue, but it gives less correct information.

It gives absolutely correct information.

> They could think it detects the encoding correctly regardless of the multiline option.

The message DOESN'T say that the `encoding` was detected correctly.

> Think about this scenario: users somehow get this exception and read `Failed to parse a character. Encoding was detected automatically.`. What would they think?

They will look at the proposed solution, `You might want to set it explicitly via the encoding option like`, and will set `encoding`.

> I would think the file somehow failed to be read

That could happen even if `encoding` is set correctly.

> but it looks like the encoding in the file was detected correctly and automatically

I don't know why you concluded that. I see nothing about the correctness of `encoding` in the message.

> It's annoying to debug encoding-related stuff in my experience. It would be nicer if we gave information that is as correct as we can.

What is your suggestion for the error message?

> I am saying let's document the automatic encoding detection feature only for multiLine officially, which is true.

I agree, let's document that, though it is not related to this PR. This PR doesn't change the behavior of encoding auto-detection, and from my point of view it must not. If you want to restrict the encoding auto-detection mechanism somehow, please create a separate PR. We will discuss separately which customers' apps it would break.
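For readers following the "detected automatically by Jackson" point, here is a rough illustration of how a charset can be guessed from raw JSON bytes. It is a minimal sketch of the null-byte heuristic described in RFC 4627, section 3, and is not Jackson's actual code. `EncodingGuess` and `guess` are hypothetical names, and BOM handling is left out.

```scala
import java.nio.charset.{Charset, StandardCharsets}

// Hypothetical sketch of the RFC 4627 (section 3) heuristic: the first two
// characters of a JSON text are ASCII, so the positions of zero bytes among the
// first four octets reveal the Unicode encoding. BOM handling is omitted, and
// this is not Jackson's actual implementation.
object EncodingGuess {
  def guess(bytes: Array[Byte]): Charset =
    if (bytes.length < 4) StandardCharsets.UTF_8 // too short to tell; assume UTF-8
    else bytes.take(4).map(_ == 0) match {
      case Array(true, true, true, false)  => Charset.forName("UTF-32BE") // 00 00 00 xx
      case Array(false, true, true, true)  => Charset.forName("UTF-32LE") // xx 00 00 00
      case Array(true, false, true, false) => StandardCharsets.UTF_16BE   // 00 xx 00 xx
      case Array(false, true, false, true) => StandardCharsets.UTF_16LE   // xx 00 xx 00
      case _                               => StandardCharsets.UTF_8      // xx xx xx xx
    }
}
```

A heuristic like this only looks at the first bytes of whatever buffer it receives. Whether that buffer is a whole file or a single line therefore affects what it can see.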





[GitHub] spark pull request #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Supp...

2018-04-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20937#discussion_r180014636
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala ---
@@ -86,14 +85,34 @@ private[sql] class JSONOptions(
 
   val multiLine = parameters.get("multiLine").map(_.toBoolean).getOrElse(false)
 
-  val lineSeparator: Option[String] = parameters.get("lineSep").map { sep =>
-    require(sep.nonEmpty, "'lineSep' cannot be an empty string.")
-    sep
+  /**
+   * A string between two consecutive JSON records.
+   */
+  val lineSeparator: Option[String] = parameters.get("lineSep")
+
+  /**
+   * Standard encoding (charset) name. For example UTF-8, UTF-16LE and UTF-32BE.
+   * If the encoding is not specified (None), it will be detected automatically.
+   */
+  val encoding: Option[String] = parameters.get("encoding")
+    .orElse(parameters.get("charset")).map { enc =>
+      val blacklist = List("UTF16", "UTF32")
--- End diff --

Not important, but it's the more usual approach, and I was thinking of doing it that way if there isn't a specific reason to deviate from the norm.
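The snippet under review keeps a `blacklist` of charset name strings. One possible way to write such a check is to compare resolved `java.nio.charset.Charset` instances rather than raw strings, so that spellings like `utf-16` and `UTF-16` are treated alike. This is a sketch, not Spark's code: `validateEncoding` is hypothetical. It assumes the restriction exists because BOM-dependent `UTF-16`/`UTF-32` are unsafe when `multiLine` is disabled.

```scala
import java.nio.charset.Charset

object EncodingOption {
  // Charsets whose byte order comes from a BOM at the start of the input.
  // Assumption: these are unsafe in per-line mode because only the first line
  // would carry the BOM.
  private val blacklist: Seq[Charset] =
    Seq(Charset.forName("UTF-16"), Charset.forName("UTF-32"))

  /** Hypothetical helper: resolves `enc` and rejects blacklisted charsets in per-line mode. */
  def validateEncoding(enc: String, multiLine: Boolean): Charset = {
    val charset = Charset.forName(enc) // throws for unknown charset names
    require(multiLine || !blacklist.contains(charset),
      s"The $enc encoding is not supported when multiLine is disabled")
    charset
  }
}
```

Comparing `Charset` objects relies on `Charset.equals`, which compares canonical names, so aliases resolve to the same entry.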





[GitHub] spark pull request #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Supp...

2018-04-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20937#discussion_r180014167
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -366,6 +366,9 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
    * `java.text.SimpleDateFormat`. This applies to timestamp type.
    * `multiLine` (default `false`): parse one record, which may span multiple lines,
    * per file
+   * `encoding` (by default it is not set): allows to forcibly set one of standard basic
+   * or extended charsets for input jsons. For example UTF-8, UTF-16BE, UTF-32. If the encoding
+   * is not specified (by default), it will be detected automatically.
--- End diff --

> If encoding is not set, it will be detected by Jackson independently from multiline.

Jackson detects it, but Spark doesn't handle it correctly when `multiLine` is disabled, even with this PR, as we discussed. We found many holes. Why did you bring this up again?
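To make the per-line problem concrete, here is a small illustration under an explicit assumption. The input is cut into records at every `0x0A` byte, as a line reader that knows nothing about the charset would do. In UTF-16LE the newline is the two bytes `0A 00`, so the second record starts with a stray `00` byte and no longer decodes to the original text. `splitOnNewlineByte` is a hypothetical helper.

```scala
import java.nio.charset.StandardCharsets

object LineSplitSketch {
  /** Hypothetical helper: splits raw bytes at every 0x0A byte, ignoring the charset. */
  def splitOnNewlineByte(bytes: Array[Byte]): List[Array[Byte]] = {
    val chunks = scala.collection.mutable.ListBuffer.empty[Array[Byte]]
    var start = 0
    for (i <- bytes.indices if bytes(i) == 0x0A) {
      chunks += bytes.slice(start, i) // bytes before the newline byte
      start = i + 1
    }
    chunks += bytes.slice(start, bytes.length) // trailing record
    chunks.toList
  }

  // Two records, "{}" and "{}", encoded as UTF-16LE: 7B 00 7D 00 0A 00 7B 00 7D 00
  val input: Array[Byte] = "{}\n{}".getBytes(StandardCharsets.UTF_16LE)
}
```

Here the first chunk decodes back to `{}`. The second chunk is 5 bytes long (`00 7B 00 7D 00`), so it is misaligned and decodes to unrelated characters.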





[GitHub] spark pull request #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Supp...

2018-04-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20937#discussion_r180013348
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonDataSource.scala ---
@@ -92,26 +93,30 @@ object TextInputJsonDataSource extends JsonDataSource {
       sparkSession: SparkSession,
       inputPaths: Seq[FileStatus],
       parsedOptions: JSONOptions): StructType = {
-    val json: Dataset[String] = createBaseDataset(
-      sparkSession, inputPaths, parsedOptions.lineSeparator)
+    val json: Dataset[String] = createBaseDataset(sparkSession, inputPaths, parsedOptions)
+
     inferFromDataset(json, parsedOptions)
   }
 
   def inferFromDataset(json: Dataset[String], parsedOptions: JSONOptions): StructType = {
     val sampled: Dataset[String] = JsonUtils.sample(json, parsedOptions)
-    val rdd: RDD[UTF8String] = sampled.queryExecution.toRdd.map(_.getUTF8String(0))
-    JsonInferSchema.infer(rdd, parsedOptions, CreateJacksonParser.utf8String)
+    val rdd: RDD[InternalRow] = sampled.queryExecution.toRdd
+    val rowParser = parsedOptions.encoding.map { enc =>
+      CreateJacksonParser.internalRow(enc, _: JsonFactory, _: InternalRow, 0)
--- End diff --

Can we do something like

```scala
{ (factory: JsonFactory, row: InternalRow) =>
  val bais = new ByteArrayInputStream(row.getBinary(0))
  CreateJacksonParser.inputStream(enc, factory, bais)
}
```
?

It looks like `internalRow` doesn't actually deduplicate code.
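The suggestion amounts to building the per-row parser as a closure that captures `enc`, instead of adding a dedicated helper. Below is a Spark-free sketch of the same shape. `Row` is a hypothetical stand-in for `InternalRow` with the raw bytes in column 0. `readerFor` returns a plain `java.io.Reader` where the real code would create a Jackson parser, and the factory argument is dropped for brevity.

```scala
import java.io.{ByteArrayInputStream, InputStreamReader, Reader}

object RowReaderSketch {
  // Hypothetical stand-in for InternalRow: column 0 holds the record's raw bytes.
  final case class Row(binary: Array[Byte]) {
    def getBinary(ordinal: Int): Array[Byte] = {
      require(ordinal == 0, "this stand-in has a single column")
      binary
    }
  }

  // Builds a Row => Reader function for a fixed charset name by closing over `enc`.
  def readerFor(enc: String): Row => Reader = { row =>
    val bais = new ByteArrayInputStream(row.getBinary(0))
    new InputStreamReader(bais, enc)
  }

  // Reads a Reader fully into a String (demonstration only).
  def readAll(reader: Reader): String = {
    val sb = new java.lang.StringBuilder
    val buf = new Array[Char](256)
    var n = reader.read(buf)
    while (n != -1) {
      sb.append(buf, 0, n)
      n = reader.read(buf)
    }
    reader.close()
    sb.toString
  }
}
```

For example, `readAll(readerFor("UTF-16LE")(Row(bytes)))` decodes a row whose bytes were written in UTF-16LE. No extra helper is needed beyond `InputStreamReader`.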





[GitHub] spark issue #21005: [SPARK-23898][SQL] Simplify add & subtract code generati...

2018-04-09 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/21005
  
@maropu it seems a bit of overkill to add a separate trait for this; it also kind of nullifies the effect of this PR.

As for `CalendarInterval` support for `divide` and `multiply`: these operations have not been implemented yet and (correct me if I am wrong) involve a `CalendarInterval` on the left side and an `Integral` on the right side, which violates the contract of `BinaryArithmetic`. Anyway, I am not opposed to this, but I think we should do it as part of a separate JIRA/PR.
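The contract referred to here is, roughly, that both operands of a binary arithmetic expression share a single input type. The toy model below uses hypothetical types, not Catalyst's classes. It shows why `interval * int` would fail such a check while `interval + interval` passes it.

```scala
object BinaryArithmeticSketch {
  sealed trait DataType
  case object IntegerType extends DataType
  case object CalendarIntervalType extends DataType

  // Toy expression node: only its data type matters for this check.
  final case class Expr(dataType: DataType)

  /** Returns the result type if both operands share one type, otherwise an error message. */
  def checkInputTypes(left: Expr, right: Expr): Either[String, DataType] =
    if (left.dataType == right.dataType) Right(left.dataType)
    else Left(s"differing types: ${left.dataType} and ${right.dataType}")
}
```

Mixed-type operations such as interval multiplication would need a different expression shape. That fits the suggestion to handle them in a separate JIRA/PR.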





[GitHub] spark issue #21005: [SPARK-23898][SQL] Simplify add & subtract code generati...

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21005
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Supp...

2018-04-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20937#discussion_r180009422
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
@@ -361,6 +361,15 @@ class JacksonParser(
         // For such records, all fields other than the field configured by
         // `columnNameOfCorruptRecord` are set to `null`.
         throw BadRecordException(() => recordLiteral(record), () => None, e)
+      case e: CharConversionException if options.encoding.isEmpty =>
+        val msg =
+          """Failed to parse a character. Encoding was detected automatically.
--- End diff --

I am saying let's document the automatic encoding detection feature only 
for `multiLine` officially, which is true.





[GitHub] spark issue #21005: [SPARK-23898][SQL] Simplify add & subtract code generati...

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21005
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2088/
Test PASSed.





[GitHub] spark pull request #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Supp...

2018-04-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20937#discussion_r180009312
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
@@ -361,6 +361,15 @@ class JacksonParser(
         // For such records, all fields other than the field configured by
         // `columnNameOfCorruptRecord` are set to `null`.
         throw BadRecordException(() => recordLiteral(record), () => None, e)
+      case e: CharConversionException if options.encoding.isEmpty =>
+        val msg =
+          """Failed to parse a character. Encoding was detected automatically.
--- End diff --

I don't think `Encoding was detected automatically` is quite correct. It might not help users solve the issue, but it gives less correct information. They could think it detects the encoding correctly regardless of the `multiline` option.

Think about this scenario: users somehow get this exception and read `Failed to parse a character. Encoding was detected automatically.`. What would they think? I would think the file somehow failed to be read, but that the encoding in the file was detected correctly and automatically, regardless of other options.

It's annoying to debug encoding-related stuff in my experience. It would be nicer if we gave information that is as correct as we can.






[GitHub] spark pull request #20981: [SPARK-23873][SQL] Use accessors in interpreted L...

2018-04-09 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/20981#discussion_r180008583
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala ---
@@ -119,4 +119,25 @@ object InternalRow {
     case v: MapData => v.copy()
     case _ => value
   }
+
+  /**
+   * Returns an accessor for an InternalRow with given data type and ordinal.
+   */
+  def getAccessor(dataType: DataType, ordinal: Int): (InternalRow) => Any = dataType match {
+    case BooleanType => (input) => input.getBoolean(ordinal)
+    case ByteType => (input) => input.getByte(ordinal)
+    case ShortType => (input) => input.getShort(ordinal)
+    case IntegerType | DateType => (input) => input.getInt(ordinal)
+    case LongType | TimestampType => (input) => input.getLong(ordinal)
+    case FloatType => (input) => input.getFloat(ordinal)
+    case DoubleType => (input) => input.getDouble(ordinal)
+    case StringType => (input) => input.getUTF8String(ordinal)
+    case BinaryType => (input) => input.getBinary(ordinal)
+    case CalendarIntervalType => (input) => input.getInterval(ordinal)
+    case t: DecimalType => (input) => input.getDecimal(ordinal, t.precision, t.scale)
+    case t: StructType => (input) => input.getStruct(ordinal, t.size)
+    case _: ArrayType => (input) => input.getArray(ordinal)
+    case _: MapType => (input) => input.getMap(ordinal)
+    case _ => (input) => input.get(ordinal, dataType)
--- End diff --

Handle `UDT`?
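One plausible way to handle user-defined types is to unwrap the UDT to its underlying storage type and reuse that type's accessor. A UDT value is stored in its SQL representation, which Catalyst's `UserDefinedType` exposes as `sqlType`. The sketch below models this with a hypothetical mini type system and an array-backed row; it is not Catalyst's API.

```scala
object AccessorSketch {
  sealed trait DataType
  case object IntegerType extends DataType
  case object StringType extends DataType
  // Toy UDT: values are stored using the representation of `sqlType`.
  final case class UserDefinedType(name: String, sqlType: DataType) extends DataType

  // Toy row with typed getters, standing in for InternalRow.
  final class Row(values: Array[Any]) {
    def getInt(i: Int): Int = values(i).asInstanceOf[Int]
    def getString(i: Int): String = values(i).asInstanceOf[String]
  }

  def getAccessor(dataType: DataType, ordinal: Int): Row => Any = dataType match {
    case IntegerType => row => row.getInt(ordinal)
    case StringType  => row => row.getString(ordinal)
    // Recurse on the storage type so a UDT reuses the specialized accessor.
    case udt: UserDefinedType => getAccessor(udt.sqlType, ordinal)
  }
}
```

Recursing on the storage type also covers nested UDTs, because the recursion stops only at a non-UDT type.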





[GitHub] spark issue #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVa...

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20981
  
Merged build finished. Test PASSed.





[GitHub] spark issue #21005: [SPARK-23898][SQL] Simplify add & subtract code generati...

2018-04-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21005
  
**[Test build #89047 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89047/testReport)**
 for PR 21005 at commit 
[`433`](https://github.com/apache/spark/commit/43314b1d443fac5ca27ecef80677dbe70ab7).





[GitHub] spark pull request #20981: [SPARK-23873][SQL] Use accessors in interpreted L...

2018-04-09 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/20981#discussion_r180008527
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala ---
@@ -33,28 +33,14 @@ case class BoundReference(ordinal: Int, dataType: DataType, nullable: Boolean)
 
   override def toString: String = s"input[$ordinal, ${dataType.simpleString}, $nullable]"
 
+  private lazy val accessor: InternalRow => Any = InternalRow.getAccessor(dataType, ordinal)
--- End diff --

Do we need to be lazy?
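Some context on the trade-off behind this question. A plain `val` in a class body is computed once, when the instance is constructed. A `lazy val` defers the computation to first access, with thread-safe initialization and a small check on each access. The toy classes below are hypothetical, not Spark's; they just count when the accessor gets built.

```scala
object LazinessSketch {
  var built = 0 // counts how many accessors have been constructed

  private def buildAccessor(ordinal: Int): Array[Any] => Any = {
    built += 1
    row => row(ordinal)
  }

  // Eager: the accessor is built when the instance is constructed.
  final class EagerRef(ordinal: Int) {
    private val accessor = buildAccessor(ordinal)
    def eval(row: Array[Any]): Any = accessor(row)
  }

  // Lazy: the accessor is built on the first call to eval, then reused.
  final class LazyRef(ordinal: Int) {
    private lazy val accessor = buildAccessor(ordinal)
    def eval(row: Array[Any]): Any = accessor(row)
  }
}
```

Deferral only pays off when many instances are created but never evaluated. Whether that holds for `BoundReference` is not settled in this thread.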


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVa...

2018-04-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20981
  
**[Test build #89048 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89048/testReport)**
 for PR 20981 at commit 
[`2eb2bf1`](https://github.com/apache/spark/commit/2eb2bf1853a0ba4de8f4a3adfe8407d04a075b22).





[GitHub] spark issue #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVa...

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20981
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2087/
Test PASSed.





[GitHub] spark issue #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVa...

2018-04-09 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20981
  
retest this please.





[GitHub] spark issue #20944: [SPARK-23831][SQL] Add org.apache.derby to IsolatedClien...

2018-04-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20944
  
**[Test build #89046 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89046/testReport)**
 for PR 20944 at commit 
[`1c801f1`](https://github.com/apache/spark/commit/1c801f1e673b3d6f9e94eeade08d5b309a105061).





[GitHub] spark issue #20944: [SPARK-23831][SQL] Add org.apache.derby to IsolatedClien...

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20944
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20944: [SPARK-23831][SQL] Add org.apache.derby to IsolatedClien...

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20944
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2086/
Test PASSed.





[GitHub] spark issue #20944: [SPARK-23831][SQL] Add org.apache.derby to IsolatedClien...

2018-04-09 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20944
  
retest this please.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVa...

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20981
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89042/
Test FAILed.





[GitHub] spark issue #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVa...

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20981
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89040/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVa...

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20981
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff test Pyth...

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20904
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89039/
Test FAILed.





[GitHub] spark issue #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVa...

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20981
  
Merged build finished. Test FAILed.





[GitHub] spark issue #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff test Pyth...

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20904
  
Build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20944: [SPARK-23831][SQL] Add org.apache.derby to IsolatedClien...

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20944
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVa...

2018-04-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20981
  
**[Test build #89040 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89040/testReport)**
 for PR 20981 at commit 
[`a8cdbe8`](https://github.com/apache/spark/commit/a8cdbe8baf2d508fb2583862042f1213cf0eae7b).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff test Pyth...

2018-04-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20904
  
**[Test build #89039 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89039/testReport)**
 for PR 20904 at commit 
[`49a7ddb`](https://github.com/apache/spark/commit/49a7ddb45cb9a0035e3faed5906ecd37890333e1).
 * This patch **fails due to an unknown error code, -9**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.





[GitHub] spark issue #21004: [SPARK-23896][SQL]Improve PartitioningAwareFileIndex

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21004
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21004: [SPARK-23896][SQL]Improve PartitioningAwareFileIndex

2018-04-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21004
  
**[Test build #89044 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89044/testReport)**
 for PR 21004 at commit 
[`10536a6`](https://github.com/apache/spark/commit/10536a6dbf2ab37d7066915223a64e914cf53b5f).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20937
  
Merged build finished. Test FAILed.





[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

2018-04-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20937
  
**[Test build #89045 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89045/testReport)**
 for PR 20937 at commit 
[`b817184`](https://github.com/apache/spark/commit/b817184d35d0e2589682f1dcd88b9f29b2063f5b).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20937
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89045/
Test FAILed.





[GitHub] spark issue #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVa...

2018-04-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20981
  
**[Test build #89042 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89042/testReport)**
 for PR 20981 at commit 
[`2eb2bf1`](https://github.com/apache/spark/commit/2eb2bf1853a0ba4de8f4a3adfe8407d04a075b22).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #21004: [SPARK-23896][SQL]Improve PartitioningAwareFileIndex

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21004
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89044/
Test FAILed.





[GitHub] spark issue #20944: [SPARK-23831][SQL] Add org.apache.derby to IsolatedClien...

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20944
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89043/
Test FAILed.





[GitHub] spark issue #20944: [SPARK-23831][SQL] Add org.apache.derby to IsolatedClien...

2018-04-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20944
  
**[Test build #89043 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89043/testReport)**
 for PR 20944 at commit 
[`1c801f1`](https://github.com/apache/spark/commit/1c801f1e673b3d6f9e94eeade08d5b309a105061).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #21004: [SPARK-23896][SQL]Improve PartitioningAwareFileIndex

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21004
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2085/
Test PASSed.





[GitHub] spark issue #21004: [SPARK-23896][SQL]Improve PartitioningAwareFileIndex

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21004
  
Merged build finished. Test PASSed.





[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

2018-04-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20937
  
**[Test build #89045 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89045/testReport)**
 for PR 20937 at commit 
[`b817184`](https://github.com/apache/spark/commit/b817184d35d0e2589682f1dcd88b9f29b2063f5b).





[GitHub] spark issue #21004: [SPARK-23896][SQL]Improve PartitioningAwareFileIndex

2018-04-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21004
  
**[Test build #89044 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89044/testReport)**
 for PR 21004 at commit 
[`10536a6`](https://github.com/apache/spark/commit/10536a6dbf2ab37d7066915223a64e914cf53b5f).





[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

2018-04-09 Thread MaxGekk
Github user MaxGekk commented on the issue:

https://github.com/apache/spark/pull/20937
  
jenkins, retest this, please





[GitHub] spark pull request #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Supp...

2018-04-09 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20937#discussion_r18138
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
@@ -361,6 +361,15 @@ class JacksonParser(
         // For such records, all fields other than the field configured by
         // `columnNameOfCorruptRecord` are set to `null`.
         throw BadRecordException(() => recordLiteral(record), () => None, e)
+      case e: CharConversionException if options.encoding.isEmpty =>
+        val msg =
+          """Failed to parse a character. Encoding was detected automatically.
--- End diff --

OK, speaking about this concrete exception handling: the exception with this message is thrown ONLY when `options.encoding.isEmpty` is `true`. That means `encoding` is not set and the actual encoding of the file was auto-detected. The `msg` says exactly that: `Encoding was detected automatically`.

Maybe `encoding` was detected correctly but the file contains a wrong char. In that case, the first sentence says so: `Failed to parse a character`. The same could happen if you set `encoding` explicitly, because you cannot guarantee that inputs are always correct.

> I think automatic detection is true only when multiLine is enabled.

A wrong char can appear in a UTF-8 file read with `multiline = false` as well as in a UTF-16LE file read with `multiline = true`.

My point is that mentioning the `multiline` option in the error message doesn't help the user solve the issue. A possible solution is to set `encoding` explicitly, which is what the message actually says.
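The behavior described here can be sketched without Spark or Jackson: add the hint only when `encoding` was not set by the user. `withEncodingHint` is hypothetical, and the message text is illustrative rather than the PR's exact wording. `java.io.CharConversionException` is the standard JDK exception named in the diff.

```scala
import java.io.CharConversionException

object EncodingHintSketch {
  /**
   * Runs `parse`. If it fails with CharConversionException while no encoding was
   * configured (so it was auto-detected), rethrows with a hint to set it explicitly.
   * With an explicit encoding, the original exception propagates unchanged.
   */
  def withEncodingHint[T](encoding: Option[String])(parse: => T): T =
    try parse
    catch {
      case e: CharConversionException if encoding.isEmpty =>
        val msg =
          """Failed to parse a character. Encoding was detected automatically.
            |You might want to set it explicitly via the encoding option.""".stripMargin
        val wrapped = new CharConversionException(msg)
        wrapped.initCause(e) // keep the original failure as the cause
        throw wrapped
    }
}
```

Keeping the original exception as the cause preserves the low-level detail for debugging, while the outer message carries the actionable hint.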





[GitHub] spark issue #20944: [SPARK-23831][SQL] Add org.apache.derby to IsolatedClien...

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20944
  
Merged build finished. Test PASSed.





[GitHub] spark issue #20944: [SPARK-23831][SQL] Add org.apache.derby to IsolatedClien...

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20944
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2084/
Test PASSed.





[GitHub] spark issue #20944: [SPARK-23831][SQL] Add org.apache.derby to IsolatedClien...

2018-04-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20944
  
**[Test build #89043 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89043/testReport)**
 for PR 20944 at commit 
[`1c801f1`](https://github.com/apache/spark/commit/1c801f1e673b3d6f9e94eeade08d5b309a105061).




