[GitHub] spark issue #20895: [SPARK-23787][tests] Fix file download test in SparkSubm...

2018-03-25 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/20895
  
Merging to master branch.


---




[GitHub] spark issue #20851: [SPARK-23727][SQL] Support for pushing down filters for ...

2018-03-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20851
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88575/
Test PASSed.


---




[GitHub] spark issue #20851: [SPARK-23727][SQL] Support for pushing down filters for ...

2018-03-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20851
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20851: [SPARK-23727][SQL] Support for pushing down filters for ...

2018-03-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20851
  
**[Test build #88575 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88575/testReport)** for PR 20851 at commit [`b70def0`](https://github.com/apache/spark/commit/b70def0811ed91a7d24cb7253e296b47beed9f44).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20887: [SPARK-23774][SQL] `Cast` to CHAR/VARCHAR should truncat...

2018-03-25 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/20887
  
Since this is a behaviour change, I think we need to update the migration 
guide?


---




[GitHub] spark pull request #20887: [SPARK-23774][SQL] `Cast` to CHAR/VARCHAR should ...

2018-03-25 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/20887#discussion_r176982595
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2792,4 +2793,31 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
   }
 }
   }
+
+  test("`Cast` to CHAR/VARCHAR should truncate the values") {
--- End diff --

Also, we need tests in `ExpressionParserSuite`?


---




[GitHub] spark issue #20858: [SPARK-23736][SQL] Implementation of the concat_arrays f...

2018-03-25 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/20858
  
Also, `postgresql` has the function `array_cat` for concatenating arrays, so it might be better to make the behaviour the same as the `postgresql` one:
https://www.postgresql.org/docs/10/static/functions-array.html


---




[GitHub] spark pull request #20858: [SPARK-23736][SQL] Implementation of the concat_a...

2018-03-25 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/20858#discussion_r176981046
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -287,3 +289,152 @@ case class ArrayContains(left: Expression, right: Expression)
 
   override def prettyName: String = "array_contains"
 }
+
+/**
+ * Concatenates multiple arrays into one.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(expr, ...) - Concatenates multiple arrays into one.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), array(4, 5), array(6));
+       [1,2,3,4,5,6]
+  """)
+case class ConcatArrays(children: Seq[Expression]) extends Expression with NullSafeEvaluation {
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    val arrayCheck = checkInputDataTypesAreArrays
+    if (arrayCheck.isFailure) arrayCheck
+    else TypeUtils.checkForSameTypeInputExpr(children.map(_.dataType), s"function $prettyName")
+  }
+
+  private def checkInputDataTypesAreArrays(): TypeCheckResult = {
+    val mismatches = children.zipWithIndex.collect {
+      case (child, idx) if !ArrayType.acceptsType(child.dataType) =>
+        s"argument ${idx + 1} has to be ${ArrayType.simpleString} type, " +
+          s"however, '${child.sql}' is of ${child.dataType.simpleString} type."
+    }
+
+    if (mismatches.isEmpty) {
+      TypeCheckResult.TypeCheckSuccess
+    } else {
+      TypeCheckResult.TypeCheckFailure(mismatches.mkString(" "))
+    }
+  }
+
+  override def dataType: ArrayType =
+    children
+      .headOption.map(_.dataType.asInstanceOf[ArrayType])
+      .getOrElse(ArrayType.defaultConcreteType.asInstanceOf[ArrayType])
+
+  override protected def nullSafeEval(inputs: Seq[Any]): Any = {
+    val elements = inputs.flatMap(_.asInstanceOf[ArrayData].toObjectArray(dataType.elementType))
+    new GenericArrayData(elements)
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    nullSafeCodeGen(ctx, ev, arrays => {
+      val elementType = dataType.elementType
+      if (CodeGenerator.isPrimitiveType(elementType)) {
+        genCodeForConcatOfPrimitiveElements(ctx, elementType, arrays, ev.value)
+      } else {
+        genCodeForConcatOfComplexElements(ctx, arrays, ev.value)
+      }
+    })
+  }
+
+  private def genCodeForNumberOfElements(
+      ctx: CodegenContext,
+      elements: Seq[String]): (String, String) = {
+    val variableName = ctx.freshName("numElements")
+    val code = elements
+      .map(el => s"$variableName += $el.numElements();")
+      .foldLeft(s"int $variableName = 0;")((acc, s) => acc + "\n" + s)
+    (code, variableName)
+  }
+
+  private def genCodeForConcatOfPrimitiveElements(
+      ctx: CodegenContext,
+      elementType: DataType,
+      elements: Seq[String],
+      arrayDataName: String): String = {
+    val arrayName = ctx.freshName("array")
+    val arraySizeName = ctx.freshName("size")
+    val counter = ctx.freshName("counter")
+    val tempArrayDataName = ctx.freshName("tempArrayData")
+
+    val (numElemCode, numElemName) = genCodeForNumberOfElements(ctx, elements)
+
+    val unsafeArraySizeInBytes = s"""
+      |int $arraySizeName = UnsafeArrayData.calculateHeaderPortionInBytes($numElemName) +
+      |${classOf[ByteArrayMethods].getName}.roundNumberOfBytesToNearestWord(
+      |${elementType.defaultSize} * $numElemName
+      |);
+      """.stripMargin
+    val baseOffset = Platform.BYTE_ARRAY_OFFSET
+
+    val primitiveValueTypeName = CodeGenerator.primitiveTypeName(elementType)
+    val assignments = elements.map { el =>
+      s"""
+        |for (int z = 0; z < $el.numElements(); z++) {
+        |  if ($el.isNullAt(z)) {
+        |    $tempArrayDataName.setNullAt($counter);
+        |  } else {
+        |    $tempArrayDataName.set$primitiveValueTypeName(
+        |      $counter,
+        |      $el.get$primitiveValueTypeName(z)
+        |    );
+        |  }
+        |  $counter++;
+        |}
+        """.stripMargin
+    }.mkString("\n")
+
+    s"""
+      |$numElemCode
+      |$unsafeArraySizeInBytes
+      |byte[] $arrayName = new byte[$arraySizeName];
+      |UnsafeArrayData $tempArrayDataName = new UnsafeArrayData();
+      |Platform.putLong($arrayName, $baseOffset, $numElemName);
+      |$tempArrayDataName.pointTo($arrayName, $baseOffset, $arraySizeName);
+      |int $counter = 0;
+      |$assignments
+      |$arrayDataName = $tempArrayDataName;
+    """.stripMargin
+

[GitHub] spark issue #20893: [SPARK-23785][LAUNCHER] LauncherBackend doesn't check st...

2018-03-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20893
  
**[Test build #88577 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88577/testReport)** for PR 20893 at commit [`4ca8a32`](https://github.com/apache/spark/commit/4ca8a32e2a518f3c7ccecd406a8b03eac06f860b).


---




[GitHub] spark issue #20893: [SPARK-23785][LAUNCHER] LauncherBackend doesn't check st...

2018-03-25 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/20893
  
Jenkins, ok to test


---




[GitHub] spark issue #20893: [SPARK-23785][LAUNCHER] LauncherBackend doesn't check st...

2018-03-25 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/20893
  
The change looks good, cc @cloud-fan 


---




[GitHub] spark pull request #20893: [SPARK-23785][LAUNCHER] LauncherBackend doesn't c...

2018-03-25 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20893#discussion_r176978788
  
--- Diff: core/src/main/scala/org/apache/spark/launcher/LauncherBackend.scala ---
@@ -114,10 +114,10 @@ private[spark] abstract class LauncherBackend {
 
   override def close(): Unit = {
     try {
+      _isConnected = false
       super.close()
     } finally {
      onDisconnected()
--- End diff --

I searched the code, and it seems this is a no-op?


---




[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...

2018-03-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20795
  
**[Test build #88576 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88576/testReport)** for PR 20795 at commit [`029ee6c`](https://github.com/apache/spark/commit/029ee6cf14e3cde33d577672f0ca46941991948c).


---




[GitHub] spark issue #20895: [SPARK-23787][tests] Fix file download test in SparkSubm...

2018-03-25 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/20895
  
LGTM


---




[GitHub] spark pull request #20900: [SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE `p...

2018-03-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20900


---




[GitHub] spark issue #20900: [SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE `pandas_u...

2018-03-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20900
  
Merged to master and branch-2.3 anyway.


---




[GitHub] spark pull request #20892: [SPARK-23700][PYTHON] Cleanup imports in pyspark....

2018-03-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20892


---




[GitHub] spark issue #20892: [SPARK-23700][PYTHON] Cleanup imports in pyspark.sql

2018-03-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20892
  
Merged to master.


---




[GitHub] spark issue #20900: [SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE `pandas_u...

2018-03-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20900
  
I think we should generally make everything work in both Python 2 and Python 3, but I want to know if there are special cases that I am missing, if there are any.


---




[GitHub] spark pull request #20756: [SPARK-23593][SQL] Add interpreted execution for ...

2018-03-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20756#discussion_r176973244
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -1261,8 +1261,42 @@ case class InitializeJavaBean(beanInstance: Expression, setters: Map[String, Exp
   override def children: Seq[Expression] = beanInstance +: setters.values.toSeq
   override def dataType: DataType = beanInstance.dataType
 
-  override def eval(input: InternalRow): Any =
-    throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
+  private lazy val resolvedSetters = {
+    assert(beanInstance.dataType.isInstanceOf[ObjectType])
+
+    val ObjectType(beanClass) = beanInstance.dataType
+    setters.map {
+      case (name, expr) =>
+        // Looking for known type mapping first, then using Class attached in `ObjectType`.
+        // Finally also looking for general `Object`-type parameter for generic methods.
+        val paramTypes = CallMethodViaReflection.typeMapping.getOrElse(expr.dataType,
+          Seq(expr.dataType.asInstanceOf[ObjectType].cls)) ++ Seq(classOf[Object])
+        val methods = paramTypes.flatMap { fieldClass =>
+          try {
+            Some(beanClass.getDeclaredMethod(name, fieldClass))
+          } catch {
+            case e: NoSuchMethodException => None
+          }
+        }
+        if (methods.isEmpty) {
+          throw new NoSuchMethodException(s"""A method named "$name" is not declared """ +
+            "in any enclosing class nor any supertype")
+        }
+        methods.head -> expr
+    }
+  }
+
+  override def eval(input: InternalRow): Any = {
+    val instance = beanInstance.eval(input)
+    if (instance != null) {
+      val bean = instance.asInstanceOf[Object]
+      resolvedSetters.foreach {
+        case (setter, expr) =>
+          setter.invoke(bean, expr.eval(input).asInstanceOf[AnyRef])
--- End diff --

Correct me if I'm wrong:
If the setter takes primitive types, like `setAge(int i)`, and we pass a null via reflection, the actual passed value would be 0. This is different from the codegen version, which seems like a potential bug.

IMO, I think the codegen version is wrong. In general we should not read 
the codegen value if it's marked as null.

This doesn't cause any problem, because we only use these object 
expressions to generate encoders, which means the parameter for a primitive 
setter won't be null. But if we treat these expressions as a general DSL, we 
should be careful about this.
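
For illustration, a minimal standalone Scala sketch of the unboxing subtlety involved (my own example, not Spark code): casting a null reference to a JVM primitive silently yields the type's zero value rather than failing.

```scala
// Unboxing null into a JVM primitive yields the zero value, not an NPE.
val boxed: Any = null
val unboxed: Int = boxed.asInstanceOf[Int]
assert(unboxed == 0)  // Scala defines asInstanceOf[Int] on null as 0
```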


---




[GitHub] spark pull request #20858: [SPARK-23736][SQL] Implementation of the concat_a...

2018-03-25 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/20858#discussion_r176971799
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -287,3 +289,152 @@ case class ArrayContains(left: Expression, right: Expression)
 
   override def prettyName: String = "array_contains"
 }
+
+/**
+ * Concatenates multiple arrays into one.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(expr, ...) - Concatenates multiple arrays into one.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), array(4, 5), array(6));
+       [1,2,3,4,5,6]
+  """)
+case class ConcatArrays(children: Seq[Expression]) extends Expression with NullSafeEvaluation {
--- End diff --

Can we add a common base class (e.g., `ConcatLike`) for handling nested `ConcatArrays` in the optimizer (`CombineConcat`)?


https://github.com/apache/spark/blob/e4bec7cb88b9ee63f8497e3f9e0ab0bfa5d5a77c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala#L649


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r176971232
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
@@ -129,6 +154,10 @@ private[parquet] object ParquetFilters {
     case BinaryType =>
       (n: String, v: Any) =>
         FilterApi.gt(binaryColumn(n), Binary.fromReusedByteArray(v.asInstanceOf[Array[Byte]]))
+    case DateType if SQLConf.get.parquetFilterPushDownDate =>
--- End diff --

We should refactor it so that adding a new data type doesn't require touching so many places. This can be done later.
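
A hedged sketch of one possible shape for that refactoring (illustrative only, not the actual code): centralize the per-type value conversion in one registry, so the eq/gt/lt builders share a single lookup instead of each repeating a `case DateType if ... =>` branch. `dateToDays` here is a simplified stand-in for the PR's helper.

```scala
import java.sql.Date
import java.util.concurrent.TimeUnit
import org.apache.spark.sql.types.{DataType, DateType}

object PushdownValues {
  // Simplified stand-in for the PR's dateToDays helper (no timezone handling).
  private def dateToDays(date: Date): Int =
    TimeUnit.MILLISECONDS.toDays(date.getTime).toInt

  // One shared Catalyst-value -> Parquet-filter-value conversion; adding a
  // new pushdown type would mean touching only this map.
  val converters: Map[DataType, Any => Any] = Map(
    DateType -> ((v: Any) => Int.box(dateToDays(v.asInstanceOf[Date])))
  )
}
```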


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r176971146
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
@@ -50,6 +59,10 @@ private[parquet] object ParquetFilters {
       (n: String, v: Any) => FilterApi.eq(
         binaryColumn(n),
         Option(v).map(b => Binary.fromReusedByteArray(v.asInstanceOf[Array[Byte]])).orNull)
+    case DateType if SQLConf.get.parquetFilterPushDownDate =>
+      (n: String, v: Any) => FilterApi.eq(
+        intColumn(n),
+        Option(v).map(date => dateToDays(date.asInstanceOf[Date]).asInstanceOf[Integer]).orNull)
--- End diff --

Sorry, I was wrong. I took a look at how these dates get created, in `DataSourceStrategy.translateFilter`. They are actually created via `DateTimeUtils.toJavaDate` without a timezone, which means we should not use a timezone here either.
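
To illustrate the point, a minimal sketch (hypothetical helpers, not Spark's actual `DateTimeUtils`) of a conversion pair that works purely on epoch days, with no timezone adjustment on either side, so the round trip stays consistent:

```scala
import java.sql.Date
import java.util.concurrent.TimeUnit

// Timezone-free pair mirroring toJavaDate/dateToDays (illustrative only).
def daysToDate(days: Int): Date = new Date(TimeUnit.DAYS.toMillis(days))
def dateToDays(date: Date): Int = TimeUnit.MILLISECONDS.toDays(date.getTime).toInt

assert(dateToDays(daysToDate(17000)) == 17000)  // consistent while both sides skip timezones
```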


---




[GitHub] spark issue #20858: [SPARK-23736][SQL] Implementation of the concat_arrays f...

2018-03-25 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/20858
  
Should we handle different (but compatible) typed arrays in this function?
```
scala> sql("select concat_arrays(array(1L, 2L), array(3, 4))").show
org.apache.spark.sql.AnalysisException: cannot resolve 'concat_arrays(array(1L, 2L), array(3, 4))' due to data type mismatch: input to function concat_arrays should all be the same type, but it's [array<bigint>, array<int>]; line 1 pos 7;
'Project [unresolvedalias(concat_arrays(array(1, 2), array(3, 4)), None)]
+- OneRowRelation
```
Also, could you add more tests for this case in `SQLQueryTestSuite`? Probably we can add a new test file like `concat_arrays.sql` in `typeCoercion.native`.
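
For illustration, a hedged sketch of the element-type widening this would involve (my own sketch; the real rule would go through `TypeCoercion`'s common-type logic over all children):

```scala
import org.apache.spark.sql.types._

// Hypothetical widening of two element types.
def widen(a: DataType, b: DataType): Option[DataType] = (a, b) match {
  case _ if a == b => Some(a)
  case (LongType, IntegerType) | (IntegerType, LongType) => Some(LongType)
  case _ => None
}

// array<bigint> and array<int> would widen to array<bigint>:
assert(widen(LongType, IntegerType).contains(LongType))
```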


---




[GitHub] spark issue #20858: [SPARK-23736][SQL] Implementation of the concat_arrays f...

2018-03-25 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/20858
  
The current code can't handle inner arrays;
```
scala> sql("select concat_arrays(array(1, 2, array(3, 4)), array(5, 6, 7, 8))").show
org.apache.spark.sql.AnalysisException: cannot resolve 'array(1, 2, array(3, 4))' due to data type mismatch: input to function array should all be the same type, but it's [int, int, array<int>]; line 1 pos 21;
'Project [unresolvedalias('concat_arrays(array(1, 2, array(3, 4)), array(5, 6, 7, 8)), None)]
+- OneRowRelation
```

IMHO, it's better to make this function's behaviour the same as postgresql's:
https://www.postgresql.org/docs/10/static/functions-array.html
Could you brush up the code to handle this?


---




[GitHub] spark issue #20842: [SPARK-23162][PySpark][ML] Add r2adj into Python API in ...

2018-03-25 Thread kevinyu98
Github user kevinyu98 commented on the issue:

https://github.com/apache/spark/pull/20842
  
@tengpeng Thanks, are you using `./dev/lint-python` to run the Python style tests locally?


---




[GitHub] spark pull request #20849: [SPARK-23723] New charset option for json datasou...

2018-03-25 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20849#discussion_r176966887
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -176,7 +176,7 @@ def json(self, path, schema=None, primitivesAsString=None, prefersDecimal=None,
              allowComments=None, allowUnquotedFieldNames=None, allowSingleQuotes=None,
              allowNumericLeadingZero=None, allowBackslashEscapingAnyCharacter=None,
              mode=None, columnNameOfCorruptRecord=None, dateFormat=None, timestampFormat=None,
-             multiLine=None, allowUnquotedControlChars=None):
+             multiLine=None, allowUnquotedControlChars=None, charset=None):
--- End diff --

Shall we use `encoding` to be consistent with CSV? `charset` had an alias `encoding` to look after Pandas and R.
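
A usage sketch of the proposed naming (hypothetical, assuming the JSON reader ends up accepting `encoding` the way the CSV reader already does; `spark` is the usual shell session):

```scala
val df = spark.read
  .option("encoding", "UTF-16LE")  // instead of option("charset", ...)
  .json("path/to/file.json")
```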


---




[GitHub] spark issue #20851: [SPARK-23727][SQL] Support for pushing down filters for ...

2018-03-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20851
  
**[Test build #88575 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88575/testReport)** for PR 20851 at commit [`b70def0`](https://github.com/apache/spark/commit/b70def0811ed91a7d24cb7253e296b47beed9f44).


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-25 Thread yucai
Github user yucai commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r176966288
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -353,6 +353,13 @@ object SQLConf {
     .booleanConf
     .createWithDefault(true)
 
+  val PARQUET_FILTER_PUSHDOWN_DATE_ENABLED = buildConf("spark.sql.parquet.filterPushdown.date")
+    .doc("If true, enables Parquet filter push-down optimization for Date. " +
+      "This configuration only has an effect when 'spark.sql.parquet.filterPushdown' is enabled.")
+    .internal()
+    .booleanConf
+    .createWithDefault(false)
--- End diff --

@dongjoon-hyun, we are investigating using this feature in some of eBay's queries; if its performance is good, it will benefit a lot. As per our discussion, I will turn it on by default. Thanks very much!


---




[GitHub] spark pull request #20858: [SPARK-23736][SQL] Implementation of the concat_a...

2018-03-25 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/20858#discussion_r176966017
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -3046,6 +3046,14 @@ object functions {
     ArrayContains(column.expr, Literal(value))
   }
 
+  /**
+   * Merges multiple arrays into one by putting elements from the specific array after elements
+   * from the previous array. If any of the arrays is null, null is returned.
+   * @group collection_funcs
+   * @since 2.4.0
+   */
+  def concat_arrays(columns: Column*): Column = withExpr { ConcatArrays(columns.map(_.expr)) }
--- End diff --

Do we need to add this function in `sql/functions` here? It seems we might recommend users to use these kinds of functions via `selectExpr`, so is it okay to add this only in `FunctionRegistry`? Thoughts? @viirya @gatorsmile
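
The two call styles being weighed, sketched for a hypothetical DataFrame `df` with array columns `a` and `b` (assuming the function is registered as `concat_arrays`):

```scala
import org.apache.spark.sql.functions._  // col, plus the proposed concat_arrays helper

df.selectExpr("concat_arrays(a, b)")           // needs only the FunctionRegistry entry
df.select(concat_arrays(col("a"), col("b")))   // needs the extra sql/functions helper
```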


---




[GitHub] spark issue #20877: [SPARK-23765][SQL] Supports custom line separator for js...

2018-03-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20877
  
If this one is merged, I believe it should be easier to review #20885 too.


---




[GitHub] spark issue #20849: [SPARK-23723] New charset option for json datasource

2018-03-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20849
  
I am against this, mainly because of https://github.com/MaxGekk/spark-1/pull/1#discussion_r175444502, unless there is a better way than rewriting it.
Also, I think we should support the `charset` option for the text datasource too, since the current option support is incomplete.


---




[GitHub] spark issue #20877: [SPARK-23765][SQL] Supports custom line separator for js...

2018-03-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20877
  
There was a discussion about the naming here: https://github.com/apache/spark/pull/20727#discussion_r172341859. I am against `recordDelimiter`.

The two PRs deal with different problems: this PR deals with the line separator only, and that PR deals with the line separator plus a flexible option.


---




[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-25 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/19222
  
ping @cloud-fan 


---




[GitHub] spark issue #20753: [SPARK-23582][SQL] StaticInvoke should support interpret...

2018-03-25 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20753
  
ping @hvanhovell 


---




[GitHub] spark issue #20797: [SPARK-23583][SQL] Invoke should support interpreted exe...

2018-03-25 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20797
  
ping @hvanhovell 


---




[GitHub] spark issue #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and BufferHolder...

2018-03-25 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20850
  
ping @hvanhovell 


---




[GitHub] spark issue #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferHolderSp...

2018-03-25 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20636
  
ping @gatorsmile and @hvanhovell 


---




[GitHub] spark pull request #20774: [SPARK-23549][SQL] Cast to timestamp when compari...

2018-03-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20774


---




[GitHub] spark issue #20774: [SPARK-23549][SQL] Cast to timestamp when comparing time...

2018-03-25 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20774
  
LGTM

Thanks! Merged to master


---




[GitHub] spark issue #20849: [SPARK-23723] New charset option for json datasource

2018-03-25 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20849
  
@MaxGekk @HyukjinKwon What is the status of this PR?


---




[GitHub] spark pull request #20885: [SPARK-23724][SPARK-23765][SQL] Line separator fo...

2018-03-25 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20885#discussion_r176958504
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala ---
@@ -85,6 +85,38 @@ private[sql] class JSONOptions(
 
   val multiLine = parameters.get("multiLine").map(_.toBoolean).getOrElse(false)
 
+  val charset: Option[String] = Some("UTF-8")
--- End diff --

It sounds like we need to review https://github.com/apache/spark/pull/20849 
first


---




[GitHub] spark pull request #20885: [SPARK-23724][SPARK-23765][SQL] Line separator fo...

2018-03-25 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20885#discussion_r176958338
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -176,7 +176,7 @@ def json(self, path, schema=None, primitivesAsString=None, prefersDecimal=None,
              allowComments=None, allowUnquotedFieldNames=None, allowSingleQuotes=None,
              allowNumericLeadingZero=None, allowBackslashEscapingAnyCharacter=None,
              mode=None, columnNameOfCorruptRecord=None, dateFormat=None, timestampFormat=None,
-             multiLine=None, allowUnquotedControlChars=None):
+             multiLine=None, allowUnquotedControlChars=None, lineSep=None):
--- End diff --

rename it to `recordDelimiter`


---




[GitHub] spark issue #20877: [SPARK-23765][SQL] Supports custom line separator for js...

2018-03-25 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20877
  
Since both PRs are ready for review, let us review both and see which one is better.


---




[GitHub] spark issue #20877: [SPARK-23765][SQL] Supports custom line separator for js...

2018-03-25 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20877
  
Yeah. `recordDelimiter` is better based on the semantics.


---




[GitHub] spark pull request #20756: [SPARK-23593][SQL] Add interpreted execution for ...

2018-03-25 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/20756#discussion_r176956857
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ObjectExpressionsSuite.scala ---
@@ -68,6 +68,23 @@ class ObjectExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
       mapEncoder.serializer.head, mapExpected, mapInputRow)
   }
 
+  test("SPARK-23593: InitializeJavaBean should support interpreted execution") {
+    val list = new java.util.LinkedList[Int]()
+    list.add(1)
+
+    val initializeBean = InitializeJavaBean(Literal.fromObject(new java.util.LinkedList[Int]),
+      Map("add" -> Literal(1)))
+    checkEvaluation(initializeBean, list, InternalRow.fromSeq(Seq()))
+
+    val initializeWithNonexistingMethod = InitializeJavaBean(
+      Literal.fromObject(new java.util.LinkedList[Int]),
--- End diff --

Can you also add a test for when the parameter types do not match up?


---




[GitHub] spark pull request #20756: [SPARK-23593][SQL] Add interpreted execution for ...

2018-03-25 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/20756#discussion_r176956802
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -1261,8 +1261,42 @@ case class InitializeJavaBean(beanInstance: Expression, setters: Map[String, Exp
   override def children: Seq[Expression] = beanInstance +: setters.values.toSeq
   override def dataType: DataType = beanInstance.dataType
 
-  override def eval(input: InternalRow): Any =
-    throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
+  private lazy val resolvedSetters = {
+    assert(beanInstance.dataType.isInstanceOf[ObjectType])
+
+    val ObjectType(beanClass) = beanInstance.dataType
+    setters.map {
+      case (name, expr) =>
+        // Looking for known type mapping first, then using Class attached in `ObjectType`.
+        // Finally also looking for general `Object`-type parameter for generic methods.
+        val paramTypes = CallMethodViaReflection.typeMapping.getOrElse(expr.dataType,
+          Seq(expr.dataType.asInstanceOf[ObjectType].cls)) ++ Seq(classOf[Object])
+        val methods = paramTypes.flatMap { fieldClass =>
+          try {
+            Some(beanClass.getDeclaredMethod(name, fieldClass))
+          } catch {
+            case e: NoSuchMethodException => None
+          }
+        }
+        if (methods.isEmpty) {
+          throw new NoSuchMethodException(s"""A method named "$name" is not declared """ +
+            "in any enclosing class nor any supertype")
+        }
+        methods.head -> expr
+    }
+  }
+
+  override def eval(input: InternalRow): Any = {
+    val instance = beanInstance.eval(input)
+    if (instance != null) {
+      val bean = instance.asInstanceOf[Object]
+      resolvedSetters.foreach {
+        case (setter, expr) =>
+          setter.invoke(bean, expr.eval(input).asInstanceOf[AnyRef])
--- End diff --

There is a subtle difference between code generation and interpreted mode here. A null value for an expression that maps to a Java primitive will be some default value (e.g. -1) for code generation and `null` for interpreted mode; this can lead to different results.

I am not sure we should address this, because I am not 100% sure this can ever happen. @cloud-fan could you shed some light on this?


---




[GitHub] spark pull request #20756: [SPARK-23593][SQL] Add interpreted execution for ...

2018-03-25 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/20756#discussion_r176956651
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -1261,8 +1261,39 @@ case class InitializeJavaBean(beanInstance: Expression, setters: Map[String, Exp
   override def children: Seq[Expression] = beanInstance +: setters.values.toSeq
   override def dataType: DataType = beanInstance.dataType
 
-  override def eval(input: InternalRow): Any =
-    throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
+  private lazy val resolvedSetters = {
+    val ObjectType(beanClass) = beanInstance.dataType
+    setters.map {
+      case (name, expr) =>
+        // Looking for known type mapping first, then using Class attached in `ObjectType`.
+        // Finally also looking for general `Object`-type parameter for generic methods.
+        val paramTypes = CallMethodViaReflection.typeMapping.getOrElse(expr.dataType,
--- End diff --

Sorry for not coming back to this sooner. AFAIK the `CallMethodViaReflection` expression was only designed to work with a couple of primitives. I think we are looking for something a little bit more complete here, i.e. support for all types in Spark SQL's type system. I also don't think we should put the mappings in `CallMethodViaReflection`, because the mapping is now used in more expressions; `ScalaReflection` is IMO a better place for this logic.

And finally, which PR will implement this? cc @maropu for visibility.
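
A hedged sketch of what a more complete mapping might look like if it lived in `ScalaReflection` (illustrative only; a real table would cover Spark SQL's whole type system):

```scala
import org.apache.spark.sql.types._

// Hypothetical DataType -> JVM parameter class table for setter lookup.
val typeJavaMapping: Map[DataType, Class[_]] = Map(
  BooleanType -> java.lang.Boolean.TYPE,
  ByteType    -> java.lang.Byte.TYPE,
  IntegerType -> java.lang.Integer.TYPE,
  LongType    -> java.lang.Long.TYPE,
  FloatType   -> java.lang.Float.TYPE,
  DoubleType  -> java.lang.Double.TYPE
)
```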


---




[GitHub] spark issue #20835: [HOT-FIX] Fix SparkOutOfMemoryError: Unable to acquire 2...

2018-03-25 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/20835
  
I have cherry-picked this into branch-2.3


---




[GitHub] spark issue #20774: [SPARK-23549][SQL] Cast to timestamp when comparing time...

2018-03-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20774
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88574/
Test PASSed.


---




[GitHub] spark issue #20774: [SPARK-23549][SQL] Cast to timestamp when comparing time...

2018-03-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20774
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20774: [SPARK-23549][SQL] Cast to timestamp when comparing time...

2018-03-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20774
  
**[Test build #88574 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88574/testReport)** for PR 20774 at commit [`5ca2341`](https://github.com/apache/spark/commit/5ca2341c39fb00a23dd67154e409c4a1408b7cd3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  case class PromoteStrings(conf: SQLConf) extends TypeCoercionRule `
  * `  case class InConversion(conf: SQLConf) extends TypeCoercionRule `


---




[GitHub] spark issue #20861: [SPARK-23599][SQL] Use RandomUUIDGenerator in Uuid expre...

2018-03-25 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/20861
  
@viirya I have backported #20817 to 2.3


---




[GitHub] spark issue #20877: [SPARK-23765][SQL] Supports custom line separator for js...

2018-03-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20877
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20877: [SPARK-23765][SQL] Supports custom line separator for js...

2018-03-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20877
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88572/
Test PASSed.


---




[GitHub] spark issue #20877: [SPARK-23765][SQL] Supports custom line separator for js...

2018-03-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20877
  
**[Test build #88572 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88572/testReport)** for PR 20877 at commit [`f5e7d34`](https://github.com/apache/spark/commit/f5e7d34d0e422789fdd979a6a17ee7f48b77d0be).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20811: [SPARK-23668][K8S] Add config option for passing through...

2018-03-25 Thread liyinan926
Github user liyinan926 commented on the issue:

https://github.com/apache/spark/pull/20811
  
@foxish @mccheah can you help merge this?


---




[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers

2018-03-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20894
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88571/
Test PASSed.


---




[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers

2018-03-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20894
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers

2018-03-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20894
  
**[Test build #88571 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88571/testReport)** for PR 20894 at commit [`d6d370d`](https://github.com/apache/spark/commit/d6d370d98f3f12c9c53fd784c0b6e8f9d3d28f54).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20900: [SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE `pandas_u...

2018-03-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20900
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20900: [SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE `pandas_u...

2018-03-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20900
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88573/
Test PASSed.


---




[GitHub] spark issue #20900: [SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE `pandas_u...

2018-03-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20900
  
**[Test build #88573 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88573/testReport)** for PR 20900 at commit [`a3da39c`](https://github.com/apache/spark/commit/a3da39ca62f69fd4e3a4c417ed28613edd15924f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20900: [SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE `pandas_u...

2018-03-25 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/20900
  
> One general question: how do we tend to think about the py2/3 split for api quirks/features? Must everything that is added for py3 also be functional in py2?

Ideally, yes. Is there something you have in mind that would not work in py2?



---




[GitHub] spark pull request #20894: [SPARK-23786][SQL] Checking column names of csv h...

2018-03-25 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20894#discussion_r176949956
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala ---
@@ -150,6 +150,12 @@ class CSVOptions(
 
   val isCommentSet = this.comment != '\u0000'
 
+  /**
+   * The option enables checks of headers in csv files. In particular, column names
+   * are matched to field names of provided schema.
+   */
+  val checkHeader = getBool("checkHeader", true)
--- End diff --

Yes, it will break apps that explicitly declare schemas with field names different from the header's column names, as in the test of this PR that I modified. I will add info about the option to the exception message and update the migration guide.
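
A hedged sketch of the compatibility impact (option names as in this PR; `data.csv` is a placeholder): an explicit schema whose field names differ from the header would now fail unless the check is disabled.

```scala
import org.apache.spark.sql.types._

val schema = new StructType().add("id", IntegerType)  // header says e.g. "ID"
spark.read
  .schema(schema)
  .option("header", "true")
  .option("checkHeader", "false")  // opt out to keep the pre-PR behaviour
  .csv("data.csv")
```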


---




[GitHub] spark issue #20900: [SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE `pandas_u...

2018-03-25 Thread mstewart141
Github user mstewart141 commented on the issue:

https://github.com/apache/spark/pull/20900
  
Many (though not all, I don't think `callable`s are impacted) of the 
limitations of pandas_udf relative to UDF in this domain are due to the fact 
that `pandas_udf` doesn't allow for keyword arguments at the call site. This 
obviously impacts plain old function-based `pandas_udf`s but also partial fns, 
where one would typically need to specify the argument (that one was partially 
applying) by name.

In the incremental commits of this PR as at:

https://github.com/apache/spark/pull/20900/commits/9ea2595f0cecb0cd05e0e6b99baf538679332e8b

You can see the kind of things I was investigating to try and fix that 
case. Indeed my original PR was (ambitiously) titled something about enabling 
kw args for all pandas_udfs. This is actually very easy to do for *functions* 
on python3 (and maybe plain functions in py2 also, but I suspect that this is also rather tricky, as `getargspec` is pretty unhelpful when it comes to some of the kw-arg metadata one would need). However, it is rather harder for the
partial function case as one quickly gets into stacktraces from places like 
`python/pyspark/worker.py` where the semantics of the current strategy do not 
realize that a column from the arguments list may already be "accounted for" 
and one runs into duplicate arguments passed for `a`, e.g., as a result of 
this. 

My summary is that the change to allow kw for functions is simple (at least 
in py3 -- indeed my incremental commit referenced above does this), but for 
partial fns maybe not so much. I'm pretty confident I'm most of the way to 
accomplishing the former, but not the latter.

However, I have no substantial knowledge of the pyspark codebase so you 
will likely have better luck there, should you go down that route :)

**TL;DR**: I could work on a PR to allow keyword arguments for python3 
functions (not partials, and not py2), but that is likely too narrow a goal 
given the broader context.

One general question: how do we tend to think about the py2/3 split for api 
quirks/features? Must everything that is added for py3 also be functional in 
py2?


---




[GitHub] spark pull request #20894: [SPARK-23786][SQL] Checking column names of csv h...

2018-03-25 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20894#discussion_r176949718
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala ---
@@ -289,27 +294,52 @@ private[csv] object UnivocityParser {
    */
   def parseIterator(
       lines: Iterator[String],
-      shouldDropHeader: Boolean,
       parser: UnivocityParser,
       schema: StructType): Iterator[InternalRow] = {
     val options = parser.options
 
-    val linesWithoutHeader = if (shouldDropHeader) {
-      // Note that if there are only comments in the first block, the header would probably
-      // be not dropped.
-      CSVUtils.dropHeaderLine(lines, options)
-    } else {
-      lines
-    }
-
     val filteredLines: Iterator[String] =
-      CSVUtils.filterCommentAndEmpty(linesWithoutHeader, options)
+      CSVUtils.filterCommentAndEmpty(lines, options)
 
     val safeParser = new FailureSafeParser[String](
       input => Seq(parser.parse(input)),
       parser.options.parseMode,
       schema,
       parser.options.columnNameOfCorruptRecord)
+
     filteredLines.flatMap(safeParser.parse)
   }
+
+  def checkHeaderColumnNames(
+      parser: UnivocityParser,
+      schema: StructType,
+      columnNames: Array[String],
+      fileName: String): Unit = {
+    if (parser.options.checkHeader && columnNames != null) {
+      val fieldNames = schema.map(_.name)
+      val isMatched = fieldNames.zip(columnNames).forall { pair =>
+        val (nameInSchema, nameInHeader) = pair
+        nameInSchema == nameInHeader
--- End diff --

We do not declare case sensitivity of CSV inputs in our docs, and I have not found an explicit statement about case sensitivity in CSV format descriptions. It seems it is up to implementations how to handle such cases. For example, Apache Commons CSV allows configuring the behavior: https://commons.apache.org/proper/commons-csv/apidocs/index.html
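
If the comparison were to respect Spark's own case-sensitivity setting, a hedged variant of the check above (illustrative only, reusing `fieldNames` and `columnNames` from the diff) could look like:

```scala
val caseSensitive = org.apache.spark.sql.internal.SQLConf.get.caseSensitiveAnalysis
val isMatched = fieldNames.zip(columnNames).forall { case (s, h) =>
  if (caseSensitive) s == h else s.equalsIgnoreCase(h)
}
```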


---




[GitHub] spark issue #20774: [SPARK-23549][SQL] Cast to timestamp when comparing time...

2018-03-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20774
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20774: [SPARK-23549][SQL] Cast to timestamp when comparing time...

2018-03-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20774
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1745/
Test PASSed.


---




[GitHub] spark pull request #20787: [MINOR][DOCS] Documenting months_between directio...

2018-03-25 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20787#discussion_r176949408
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ---
@@ -1117,11 +1117,23 @@ case class AddMonths(startDate: Expression, numMonths: Expression)
 }
 
 /**
- * Returns number of months between dates date1 and date2.
+ * Returns number of months between dates `timestamp1` and `timestamp2`.
+ * If `timestamp1` is later than `timestamp2`, then the result is positive.
+ * If `timestamp1` and `timestamp2` are on the same day of month, or both
+ * are the last day of month, time of day will be ignored.
+ * Otherwise, the difference is calculated based on 31 days per month, and
+ * rounded to 8 digits.
  */
 // scalastyle:off line.size.limit
 @ExpressionDescription(
-  usage = "_FUNC_(timestamp1, timestamp2) - Returns number of months between `timestamp1` and `timestamp2`.",
+  usage = """
+    _FUNC_(timestamp1, timestamp2) -
+    If `timestamp1` is later than `timestamp2`, then the result is positive.
+    If `timestamp1` and `timestamp2` are on the same day of month, or both
+    are the last day of month, time of day will be ignored.
+    Otherwise, the difference is calculated based on 31 days per month, and
+    rounded to 8 digits.
+  """,
--- End diff --

```scala
  usage = """
    _FUNC_(timestamp1, timestamp2) - If `timestamp1` is later than `timestamp2`, then the result
      is positive. If `timestamp1` and `timestamp2` are on the same day of month, or both
      are the last day of month, time of day will be ignored. Otherwise, the difference is
      calculated based on 31 days per month, and rounded to 8 digits.
  """,
```

Let's keep the format consistent.


---




[GitHub] spark issue #20774: [SPARK-23549][SQL] Cast to timestamp when comparing time...

2018-03-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20774
  
**[Test build #88574 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88574/testReport)** for PR 20774 at commit [`5ca2341`](https://github.com/apache/spark/commit/5ca2341c39fb00a23dd67154e409c4a1408b7cd3).


---




[GitHub] spark issue #20900: [SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE `pandas_u...

2018-03-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20900
  
**[Test build #88573 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88573/testReport)** for PR 20900 at commit [`a3da39c`](https://github.com/apache/spark/commit/a3da39ca62f69fd4e3a4c417ed28613edd15924f).


---




[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers

2018-03-25 Thread MaxGekk
Github user MaxGekk commented on the issue:

https://github.com/apache/spark/pull/20894
  
@gatorsmile For example, Spark is case sensitive for JSON:
```
cat ./case_sesitive.json
{"FIELD1": 1}
```
If schema is specified with field name "field1":
```
val schema=new StructType().add("field1", IntegerType)
spark.read.schema(schema).json("case_sesitive.json").show
```
it cannot match field:
```
+------+
|field1|
+------+
|  null|
+------+
```
but if the schema field and the actual JSON field are in the same case:
```
val schema=new StructType().add("FIELD1", IntegerType)
spark.read.schema(schema).json("case_sesitive.json").show
```
```
+------+
|FIELD1|
+------+
|     1|
+------+
```


---




[GitHub] spark pull request #20267: [SPARK-23068][BUILD][RELEASE][WIP] doc build erro...

2018-03-25 Thread felixcheung
Github user felixcheung closed the pull request at:

https://github.com/apache/spark/pull/20267


---




[GitHub] spark pull request #20798: [SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE `p...

2018-03-25 Thread mstewart141
Github user mstewart141 closed the pull request at:

https://github.com/apache/spark/pull/20798


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20798: [SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE `pandas_u...

2018-03-25 Thread mstewart141
Github user mstewart141 commented on the issue:

https://github.com/apache/spark/pull/20798
  
see https://github.com/apache/spark/pull/20900


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-25 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r176948516
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -353,6 +353,13 @@ object SQLConf {
     .booleanConf
     .createWithDefault(true)

+  val PARQUET_FILTER_PUSHDOWN_DATE_ENABLED = buildConf("spark.sql.parquet.filterPushdown.date")
+    .doc("If true, enables Parquet filter push-down optimization for Date. " +
+      "This configuration only has an effect when 'spark.sql.parquet.filterPushdown' is enabled.")
+    .internal()
+    .booleanConf
+    .createWithDefault(false)
--- End diff --

Great! Thank you for confirmation.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20877: [SPARK-23765][SQL] Supports custom line separator for js...

2018-03-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20877
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1744/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20877: [SPARK-23765][SQL] Supports custom line separator for js...

2018-03-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20877
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20877: [SPARK-23765][SQL] Supports custom line separator for js...

2018-03-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20877
  
**[Test build #88572 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88572/testReport)**
 for PR 20877 at commit 
[`f5e7d34`](https://github.com/apache/spark/commit/f5e7d34d0e422789fdd979a6a17ee7f48b77d0be).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20877: [SPARK-23765][SQL] Supports custom line separator for js...

2018-03-25 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/20877
  
We can also change both if they haven’t been released yet.

On Sun, Mar 25, 2018 at 10:37 AM Maxim Gekk wrote:

> @gatorsmile The PR has already been submitted: #20885. Frankly
> speaking, I would prefer another name for the option, like we discussed
> before in MaxGekk#1, but a similar PR for the text datasource
> had already been merged: #20727. And I think it is more
> important to have the same option across all datasources. That's why I
> renamed *recordDelimiter* to *lineSep* in #20885
> / cc @rxin



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20877: [SPARK-23765][SQL] Supports custom line separator for js...

2018-03-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20877
  
Correct me if I am wrong, but neither renaming nor adding more flexible 
line-separator functionality blocks this PR, right?

Even if we go with renaming, we should do it for the text datasource too, 
which I believe is better done separately, and the more flexible line-separator 
functionality seems to need more feedback and discussion.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20877: [SPARK-23765][SQL] Supports custom line separator for js...

2018-03-25 Thread MaxGekk
Github user MaxGekk commented on the issue:

https://github.com/apache/spark/pull/20877
  
@gatorsmile The PR has already been submitted: 
https://github.com/apache/spark/pull/20885 . Frankly speaking, I would prefer 
another name for the option, like we discussed before in 
https://github.com/MaxGekk/spark-1/pull/1 , but a similar PR for the text 
datasource had already been merged: https://github.com/apache/spark/pull/20727 . 
And I think it is more important to have the same option across all 
datasources. That's why I renamed *recordDelimiter* to *lineSep* in 
https://github.com/apache/spark/pull/20885 / cc @rxin 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20877: [SPARK-23765][SQL] Supports custom line separator for js...

2018-03-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20877
  
He submitted this - https://github.com/apache/spark/pull/20885 and I 
believe we need more feedback and another review iteration.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20787: [MINOR][DOCS] Documenting months_between directio...

2018-03-25 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20787#discussion_r176946746
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
 ---
@@ -1117,11 +1117,23 @@ case class AddMonths(startDate: Expression, 
numMonths: Expression)
 }
 
 /**
- * Returns number of months between dates date1 and date2.
+ * Returns number of months between dates `timestamp1` and `timestamp2`.
+ * If `timestamp1` is later than `timestamp2`, then the result is positive.
+ * If `timestamp1` and `timestamp2` are on the same day of month, or both
+ * are the last day of month, time of day will be ignored.
+ * Otherwise, the difference is calculated based on 31 days per month, and
+ * rounded to 8 digits.
--- End diff --

Thanks for improving our doc! 

Could you also add the related tests in our test suite? 
https://github.com/apache/spark/blob/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala#L488-L511
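
A minimal sketch of what such tests could look like, assuming the suite's 
existing `date(...)` helper (returning microseconds) and the current 
`monthsBetween` in `DateTimeUtils`; the expected values follow the rules 
documented above but should be verified against the implementation:

```scala
test("monthsBetween: documented corner cases") {
  // Same day of month: time of day is ignored, so the result is a whole number.
  assert(monthsBetween(date(2018, 3, 25, 23, 0), date(2018, 1, 25, 1, 0)) === 2.0)
  // Otherwise: based on 31-day months, rounded to 8 digits.
  assert(monthsBetween(date(1997, 2, 28, 10, 30), date(1996, 10, 30)) === 3.94959677)
}
```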


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-25 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r176946273
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -353,6 +353,13 @@ object SQLConf {
     .booleanConf
     .createWithDefault(true)

+  val PARQUET_FILTER_PUSHDOWN_DATE_ENABLED = buildConf("spark.sql.parquet.filterPushdown.date")
+    .doc("If true, enables Parquet filter push-down optimization for Date. " +
+      "This configuration only has an effect when 'spark.sql.parquet.filterPushdown' is enabled.")
+    .internal()
+    .booleanConf
+    .createWithDefault(false)
--- End diff --

This is not a bug fix. We will not backport it to branch-2.3.
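
For context, once this lands, the new knob would be toggled like any other 
SQL conf (a usage sketch based on the config name in the diff; the path and 
column name are made up):

```scala
import org.apache.spark.sql.functions.col

// Both flags must be enabled for date predicates to be pushed down to Parquet:
spark.conf.set("spark.sql.parquet.filterPushdown", "true")
spark.conf.set("spark.sql.parquet.filterPushdown.date", "true")
spark.read.parquet("/path/to/table")
  .filter(col("d") > java.sql.Date.valueOf("2018-03-25"))
```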


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers

2018-03-25 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20894
  
I just checked the JIRA https://issues.apache.org/jira/browse/SPARK-23786; 
I think CSV should follow the other formats (e.g., Parquet), right?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20894: [SPARK-23786][SQL] Checking column names of csv h...

2018-03-25 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20894#discussion_r176946135
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
 ---
@@ -289,27 +294,52 @@ private[csv] object UnivocityParser {
   */
  def parseIterator(
      lines: Iterator[String],
-      shouldDropHeader: Boolean,
      parser: UnivocityParser,
      schema: StructType): Iterator[InternalRow] = {
    val options = parser.options

-    val linesWithoutHeader = if (shouldDropHeader) {
-      // Note that if there are only comments in the first block, the header
-      // would probably be not dropped.
-      CSVUtils.dropHeaderLine(lines, options)
-    } else {
-      lines
-    }
-
    val filteredLines: Iterator[String] =
-      CSVUtils.filterCommentAndEmpty(linesWithoutHeader, options)
+      CSVUtils.filterCommentAndEmpty(lines, options)

    val safeParser = new FailureSafeParser[String](
      input => Seq(parser.parse(input)),
      parser.options.parseMode,
      schema,
      parser.options.columnNameOfCorruptRecord)
+
    filteredLines.flatMap(safeParser.parse)
  }
+
+  def checkHeaderColumnNames(
+      parser: UnivocityParser,
+      schema: StructType,
+      columnNames: Array[String],
+      fileName: String
+  ): Unit = {
+    if (parser.options.checkHeader && columnNames != null) {
+      val fieldNames = schema.map(_.name)
+      val isMatched = fieldNames.zip(columnNames).forall { pair =>
+        val (nameInSchema, nameInHeader) = pair
+        nameInSchema == nameInHeader
--- End diff --

Do we care about case sensitivity here? 
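
If we do, one option is to honor `spark.sql.caseSensitive`, roughly like this 
(a sketch only; it assumes the parser can reach an `SQLConf` instance, e.g. 
via `SQLConf.get`):

```scala
val caseSensitive = SQLConf.get.caseSensitiveAnalysis
val isMatched = fieldNames.zip(columnNames).forall {
  case (nameInSchema, nameInHeader) =>
    if (caseSensitive) nameInSchema == nameInHeader
    else nameInSchema.equalsIgnoreCase(nameInHeader)
}
```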


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20894: [SPARK-23786][SQL] Checking column names of csv h...

2018-03-25 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20894#discussion_r176946115
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala
 ---
@@ -150,6 +150,12 @@ class CSVOptions(
 
   val isCommentSet = this.comment != '\u0000'
 
+  /**
+   * The option enables checks of headers in csv files. In particular, column names
+   * are matched to field names of provided schema.
+   */
+  val checkHeader = getBool("checkHeader", true)
--- End diff --

Since the default is true, will it break existing apps? This is a 
behavior change, so we need to document it in the migration guide. 
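
If it does break them, users would presumably opt out like this (a sketch 
based on the option name in this diff; `checkHeader` is not a released option):

```scala
// Disable the new header check to restore the previous behavior:
spark.read
  .option("header", "true")
  .option("checkHeader", "false")
  .schema(schema)
  .csv("path/to/file.csv")
```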


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20877: [SPARK-23765][SQL] Supports custom line separator for js...

2018-03-25 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20877
  
@MaxGekk Will you submit a PR for addressing the comment 
https://github.com/apache/spark/pull/20877#issuecomment-375622342 in the next 
few weeks? If so, we can hold this PR. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20877: [SPARK-23765][SQL] Supports custom line separator...

2018-03-25 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20877#discussion_r176945815
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala 
---
@@ -268,6 +268,8 @@ final class DataStreamReader private[sql](sparkSession: 
SparkSession) extends Lo
* `java.text.SimpleDateFormat`. This applies to timestamp type.
* `multiLine` (default `false`): parse one record, which may span multiple lines,
* per file
+   * `lineSep` (default covers all `\r`, `\r\n` and `\n`): defines the line separator
--- End diff --

Add a test case for testing the default covers `\r`, `\r\n` and `\n`?
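
Something along these lines, perhaps (a rough sketch assuming the 
`withTempPath` and `checkAnswer` helpers available in the SQL test suites):

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.sql.Row

test("lineSep default covers \\r, \\r\\n and \\n") {
  withTempPath { path =>
    // One record terminated by each of the three default separators.
    val content = "{\"a\": 1}\n{\"a\": 2}\r\n{\"a\": 3}\r"
    Files.write(path.toPath, content.getBytes(StandardCharsets.UTF_8))
    checkAnswer(spark.read.json(path.getCanonicalPath), Seq(Row(1), Row(2), Row(3)))
  }
}
```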


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19616: [SPARK-22404][YARN] Provide an option to use unmanaged A...

2018-03-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19616
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCD...

2018-03-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20343


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...

2018-03-25 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20343
  
Thanks! Merged to master


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers

2018-03-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20894
  
**[Test build #88571 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88571/testReport)**
 for PR 20894 at commit 
[`d6d370d`](https://github.com/apache/spark/commit/d6d370d98f3f12c9c53fd784c0b6e8f9d3d28f54).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20877: [SPARK-23765][SQL] Supports custom line separator for js...

2018-03-25 Thread MaxGekk
Github user MaxGekk commented on the issue:

https://github.com/apache/spark/pull/20877
  
LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20877: [SPARK-23765][SQL] Supports custom line separator for js...

2018-03-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20877
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88570/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20877: [SPARK-23765][SQL] Supports custom line separator for js...

2018-03-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20877
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20877: [SPARK-23765][SQL] Supports custom line separator for js...

2018-03-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20877
  
**[Test build #88570 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88570/testReport)**
 for PR 20877 at commit 
[`6cbf1ac`](https://github.com/apache/spark/commit/6cbf1ac2939160eb2b2496e3138a7c96d89877f9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20897: [MINOR][DOC] Fix a few markdown typos

2018-03-25 Thread Lemonjing
Github user Lemonjing commented on the issue:

https://github.com/apache/spark/pull/20897
  
@HyukjinKwon 

The description of ElementwiseProduct is clear on its own, and I think 
"`Qu8T948*1#`" is a mistake. If not, then "`Qu8T948*1#`" would be an input 
vector and 1 a scaling vector, which is not a good way to express a vector, 
and even then it contains a markdown syntax error. Besides, there is no 
example context here, such as "e.g.", "for example", or anything similar.

I have also checked the history and found further evidence: before Spark 
v1.6.0, this file did not contain "`Qu8T948*1#`", so it is likely a mistake 
introduced in a commit for v1.6.0 that has gone unnoticed and unfixed from 
v1.6.0 until now.

...

https://github.com/apache/spark/blob/v1.4.1/docs/mllib-feature-extraction.md no

https://github.com/apache/spark/blob/v1.5.0/docs/mllib-feature-extraction.md no

https://github.com/apache/spark/blob/v1.5.2/docs/mllib-feature-extraction.md no

https://github.com/apache/spark/blob/v1.6.0/docs/mllib-feature-extraction.md yes
1.6.0 ~ 2.3.0 yes

Thanks, and I hope you can confirm it.
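
For reference, a correct minimal example the doc could use instead (based on 
the public MLlib API; the vectors here are made up for illustration):

```scala
import org.apache.spark.mllib.feature.ElementwiseProduct
import org.apache.spark.mllib.linalg.Vectors

// Hadamard (element-wise) product of an input vector and a scaling vector:
// (1.0, 2.0, 3.0) * (0.0, 1.0, 2.0) = (0.0, 2.0, 6.0)
val scalingVec = Vectors.dense(0.0, 1.0, 2.0)
val transformer = new ElementwiseProduct(scalingVec)
val transformed = transformer.transform(Vectors.dense(1.0, 2.0, 3.0))
println(transformed)  // [0.0,2.0,6.0]
```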


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


