[GitHub] spark pull request #16351: [SPARK-18943][SQL] Avoid per-record type dispatch...

2016-12-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16351#discussion_r93814350
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVTypeCastSuite.scala ---
@@ -66,144 +66,141 @@ class CSVTypeCastSuite extends SparkFunSuite {
   }
 
   test("Nullable types are handled") {
-    assertNull(
-      CSVTypeCast.castTo("-", "_1", ByteType, nullable = true, CSVOptions("nullValue", "-")))
-    assertNull(
-      CSVTypeCast.castTo("-", "_1", ShortType, nullable = true, CSVOptions("nullValue", "-")))
-    assertNull(
-      CSVTypeCast.castTo("-", "_1", IntegerType, nullable = true, CSVOptions("nullValue", "-")))
-    assertNull(
-      CSVTypeCast.castTo("-", "_1", LongType, nullable = true, CSVOptions("nullValue", "-")))
-    assertNull(
-      CSVTypeCast.castTo("-", "_1", FloatType, nullable = true, CSVOptions("nullValue", "-")))
-    assertNull(
-      CSVTypeCast.castTo("-", "_1", DoubleType, nullable = true, CSVOptions("nullValue", "-")))
-    assertNull(
-      CSVTypeCast.castTo("-", "_1", BooleanType, nullable = true, CSVOptions("nullValue", "-")))
-    assertNull(
-      CSVTypeCast.castTo("-", "_1", DecimalType.DoubleDecimal, true, CSVOptions("nullValue", "-")))
-    assertNull(
-      CSVTypeCast.castTo("-", "_1", TimestampType, nullable = true, CSVOptions("nullValue", "-")))
-    assertNull(
-      CSVTypeCast.castTo("-", "_1", DateType, nullable = true, CSVOptions("nullValue", "-")))
-    assertNull(
-      CSVTypeCast.castTo("-", "_1", StringType, nullable = true, CSVOptions("nullValue", "-")))
-    assertNull(
-      CSVTypeCast.castTo(null, "_1", IntegerType, nullable = true, CSVOptions("nullValue", "-")))
-
-    // casting a null to not nullable field should throw an exception.
-    var message = intercept[RuntimeException] {
-      CSVTypeCast.castTo(null, "_1", IntegerType, nullable = false, CSVOptions("nullValue", "-"))
-    }.getMessage
-    assert(message.contains("null value found but field _1 is not nullable."))
-
-    message = intercept[RuntimeException] {
-      CSVTypeCast.castTo("-", "_1", StringType, nullable = false, CSVOptions("nullValue", "-"))
-    }.getMessage
-    assert(message.contains("null value found but field _1 is not nullable."))
+    val types = Seq(ByteType, ShortType, IntegerType, LongType, FloatType, DoubleType,
+      BooleanType, DecimalType.DoubleDecimal, TimestampType, DateType, StringType)
+
+    types.foreach { t =>
+      val converterOne =
+        CSVTypeCast.makeConverter("_1", t, nullable = true, CSVOptions("nullValue", "-"))
+      assertNull(converterOne.apply("-"))
+      assertNull(converterOne.apply(null))
+
+      // casting a null to not nullable field should throw an exception.
+      val converterTwo =
+        CSVTypeCast.makeConverter("_1", t, nullable = false, CSVOptions("nullValue", "-"))
+
+      var message = intercept[RuntimeException] {
+        converterTwo.apply("-")
+      }.getMessage
+      assert(message.contains("null value found but field _1 is not nullable."))
+
+      message = intercept[RuntimeException] {
+        converterTwo.apply(null)
+      }.getMessage
+      assert(message.contains("null value found but field _1 is not nullable."))
+    }
   }
 
   test("String type should also respect `nullValue`") {
--- End diff --

I see, some of the tests overlap. Let me try to clean them up and add some comments.
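
For readers following along, the pattern the new test exercises: instead of dispatching on the `DataType` once per record, `makeConverter` does the dispatch once per field and returns a function that is applied to every record. A simplified sketch of that idea (the conversions here are invented for illustration, not the real `CSVTypeCast` logic):

```scala
import org.apache.spark.sql.types._

// Dispatch on the type once, up front...
def makeConverter(dataType: DataType): String => Any = dataType match {
  case ByteType    => (s: String) => s.toByte
  case IntegerType => (s: String) => s.toInt
  case DoubleType  => (s: String) => s.toDouble
  case StringType  => (s: String) => s
  case _           => throw new RuntimeException(s"Unsupported type: $dataType")
}

// ...so the per-record hot loop only applies a function, with no type dispatch.
val toInt = makeConverter(IntegerType)
val parsed = Seq("1", "2", "3").map(toInt)
```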





[GitHub] spark pull request #16323: [SPARK-18911] [SQL] Define CatalogStatistics to i...

2016-12-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16323





[GitHub] spark issue #16323: [SPARK-18911] [SQL] Define CatalogStatistics to interact...

2016-12-23 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16323
  
thanks, merging to master!





[GitHub] spark issue #16389: [SPARK-18981][Core]The job hang problem when speculation...

2016-12-23 Thread mridulm
Github user mridulm commented on the issue:

https://github.com/apache/spark/pull/16389
  
Does it fail in master without the fix?





[GitHub] spark issue #16396: [SPARK-18994] clean up the local directories for applica...

2016-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16396
  
Can one of the admins verify this patch?





[GitHub] spark pull request #16383: [SPARK-18980][SQL] implement Aggregator with Type...

2016-12-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16383#discussion_r93813855
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala ---
@@ -143,15 +197,96 @@ case class TypedAggregateExpression(
     }
   }
 
-  override def toString: String = {
-    val input = inputDeserializer match {
-      case Some(UnresolvedDeserializer(deserializer, _)) => deserializer.dataType.simpleString
-      case Some(deserializer) => deserializer.dataType.simpleString
-      case _ => "unknown"
+  override def withInputInfo(
+      deser: Expression,
+      cls: Class[_],
+      schema: StructType): TypedAggregateExpression = {
+    copy(inputDeserializer = Some(deser), inputClass = Some(cls), inputSchema = Some(schema))
+  }
+}
+
+case class ComplexTypedAggregateExpression(
+    aggregator: Aggregator[Any, Any, Any],
+    inputDeserializer: Option[Expression],
+    inputClass: Option[Class[_]],
+    inputSchema: Option[StructType],
+    bufferSerializer: Seq[NamedExpression],
+    bufferDeserializer: Expression,
+    outputSerializer: Seq[Expression],
+    dataType: DataType,
+    nullable: Boolean,
+    mutableAggBufferOffset: Int = 0,
+    inputAggBufferOffset: Int = 0)
+  extends TypedImperativeAggregate[Any] with TypedAggregateExpression with NonSQLExpression {
+
+  override def deterministic: Boolean = true
+
+  override def children: Seq[Expression] = inputDeserializer.toSeq
+
+  override lazy val resolved: Boolean = inputDeserializer.isDefined && childrenResolved
+
+  override def references: AttributeSet = AttributeSet(inputDeserializer.toSeq)
+
+  override def createAggregationBuffer(): Any = aggregator.zero
+
+  private lazy val inputRowToObj = GenerateSafeProjection.generate(inputDeserializer.get :: Nil)
+
+  override def update(buffer: Any, input: InternalRow): Any = {
+    val inputObj = inputRowToObj(input).get(0, ObjectType(classOf[Any]))
+    if (inputObj != null) {
+      aggregator.reduce(buffer, inputObj)
+    } else {
+      buffer
+    }
+  }
+
+  override def merge(buffer: Any, input: Any): Any = {
+    aggregator.merge(buffer, input)
+  }
+
+  private lazy val resultObjToRow = dataType match {
+    case _: StructType =>
+      UnsafeProjection.create(CreateStruct(outputSerializer))
+    case _ =>
+      assert(outputSerializer.length == 1)
+      UnsafeProjection.create(outputSerializer.head)
+  }
+
+  override def eval(buffer: Any): Any = {
+    val resultObj = aggregator.finish(buffer)
+    if (resultObj == null) {
+      null
+    } else {
+      resultObjToRow(InternalRow(resultObj)).get(0, dataType)
     }
+  }
 
-    s"$nodeName($input)"
+  private lazy val bufferObjToRow = UnsafeProjection.create(bufferSerializer)
+
+  override def serialize(buffer: Any): Array[Byte] = {
+    bufferObjToRow(InternalRow(buffer)).getBytes
   }
 
-  override def nodeName: String = aggregator.getClass.getSimpleName.stripSuffix("$")
+  private lazy val bufferRow = new UnsafeRow(bufferSerializer.length)
+  private lazy val bufferRowToObject = GenerateSafeProjection.generate(bufferDeserializer :: Nil)
+
+  override def deserialize(storageFormat: Array[Byte]): Any = {
+    bufferRow.pointTo(storageFormat, storageFormat.length)
+    bufferRowToObject(bufferRow).get(0, ObjectType(classOf[Any]))
+  }
+
+  override def withNewMutableAggBufferOffset(
+      newMutableAggBufferOffset: Int): ComplexTypedAggregateExpression =
+    copy(mutableAggBufferOffset = newMutableAggBufferOffset)
+
+  override def withNewInputAggBufferOffset(
+      newInputAggBufferOffset: Int): ComplexTypedAggregateExpression =
+    copy(inputAggBufferOffset = newInputAggBufferOffset)
+
+  override def withInputInfo(
--- End diff --

how to implement `copy` in a trait? 
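
A minimal sketch of why this is awkward (invented class names, not the Spark types): `copy` is generated separately for each case class with that class's exact parameter list, so a shared trait has no common `copy` to call, and each concrete class must implement the method itself:

```scala
trait WithX {
  def withX(newX: Int): WithX
}

case class Foo(x: Int, y: String) extends WithX {
  // `copy` here is Foo's compiler-generated copy(x, y)
  override def withX(newX: Int): WithX = copy(x = newX)
}

case class Bar(x: Int, z: Double) extends WithX {
  // ...and this one is Bar's copy(x, z); the trait cannot name either.
  override def withX(newX: Int): WithX = copy(x = newX)
}
```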





[GitHub] spark pull request #16383: [SPARK-18980][SQL] implement Aggregator with Type...

2016-12-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16383#discussion_r93813849
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala ---
@@ -143,15 +197,96 @@ case class TypedAggregateExpression(
     }
   }
 
-  override def toString: String = {
-    val input = inputDeserializer match {
-      case Some(UnresolvedDeserializer(deserializer, _)) => deserializer.dataType.simpleString
-      case Some(deserializer) => deserializer.dataType.simpleString
-      case _ => "unknown"
+  override def withInputInfo(
+      deser: Expression,
+      cls: Class[_],
+      schema: StructType): TypedAggregateExpression = {
+    copy(inputDeserializer = Some(deser), inputClass = Some(cls), inputSchema = Some(schema))
+  }
+}
+
+case class ComplexTypedAggregateExpression(
+    aggregator: Aggregator[Any, Any, Any],
+    inputDeserializer: Option[Expression],
+    inputClass: Option[Class[_]],
+    inputSchema: Option[StructType],
+    bufferSerializer: Seq[NamedExpression],
+    bufferDeserializer: Expression,
+    outputSerializer: Seq[Expression],
+    dataType: DataType,
+    nullable: Boolean,
+    mutableAggBufferOffset: Int = 0,
+    inputAggBufferOffset: Int = 0)
+  extends TypedImperativeAggregate[Any] with TypedAggregateExpression with NonSQLExpression {
+
+  override def deterministic: Boolean = true
--- End diff --

Like UDFs, we can assume the `Aggregator` is always deterministic. I think in the future we should allow users to define nondeterministic UDFs (including `Aggregator`s).





[GitHub] spark pull request #16396: [SPARK-18994] clean up the local directories for ...

2016-12-23 Thread liujianhuiouc
GitHub user liujianhuiouc opened a pull request:

https://github.com/apache/spark/pull/16396

[SPARK-18994] Clean up the local directories for the application in a Future on another thread

## What changes were proposed in this pull request?

Clean up the application's directories asynchronously, in a `Future` block, so the caller is not blocked.
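
A minimal sketch of the idea (hypothetical helper names, not the actual Worker code): run the recursive deletion inside a `Future` so the caller's thread returns immediately:

```scala
import java.io.File
import scala.concurrent.{ExecutionContext, Future}

// Recursively delete one directory tree (Spark's Utils.deleteRecursively
// plays this role in the real code).
def deleteRecursively(f: File): Unit = {
  Option(f.listFiles()).foreach(_.foreach(deleteRecursively))
  f.delete()
}

// Kick off the cleanup on another thread and return immediately.
def cleanupAppDirsAsync(appDirs: Seq[File])(implicit ec: ExecutionContext): Future[Unit] =
  Future {
    appDirs.foreach(deleteRecursively)
  }
```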



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/liujianhuiouc/spark-1 spark-18994

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16396.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16396


commit 5d198bfcde1a789b42b2125582c0b551fc48896c
Author: liujianhui 
Date:   2016-12-24T06:56:04Z

[SPARK-18994] fix bug







[GitHub] spark issue #16308: [SPARK-18936][SQL] Infrastructure for session local time...

2016-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16308
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16308: [SPARK-18936][SQL] Infrastructure for session local time...

2016-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16308
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70565/
Test PASSed.





[GitHub] spark issue #16308: [SPARK-18936][SQL] Infrastructure for session local time...

2016-12-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16308
  
**[Test build #70565 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70565/testReport)** for PR 16308 at commit [`7917d0f`](https://github.com/apache/spark/commit/7917d0f100e47e8e3cce8440a38c3b16ef74732d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...

2016-12-23 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13909#discussion_r93813725
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
@@ -56,33 +58,81 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
   }
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-    val arrayClass = classOf[GenericArrayData].getName
-    val values = ctx.freshName("values")
-    ctx.addMutableState("Object[]", values, s"this.$values = null;")
-
-    ev.copy(code = s"""
-      this.$values = new Object[${children.size}];""" +
-      ctx.splitExpressions(
-        ctx.INPUT_ROW,
-        children.zipWithIndex.map { case (e, i) =>
-          val eval = e.genCode(ctx)
-          eval.code + s"""
-            if (${eval.isNull}) {
-              $values[$i] = null;
-            } else {
-              $values[$i] = ${eval.value};
-            }
-           """
-        }) +
-      s"""
-        final ArrayData ${ev.value} = new $arrayClass($values);
-        this.$values = null;
-      """, isNull = "false")
+    val array = ctx.freshName("array")
+
+    val et = dataType.elementType
+    val evals = children.map(e => e.genCode(ctx))
+    val isPrimitiveArray = ctx.isPrimitiveType(et)
+    val (preprocess, arrayData) =
+      GenArrayData.getCodeArrayData(ctx, et, children.size, isPrimitiveArray, array)
+
+    val assigns = if (isPrimitiveArray) {
+      val primitiveTypeName = ctx.primitiveTypeName(et)
+      evals.zipWithIndex.map { case (eval, i) =>
+        eval.code + s"""
+         if (${eval.isNull}) {
+           $arrayData.setNullAt($i);
+         } else {
+           $arrayData.set$primitiveTypeName($i, ${eval.value});
+         }
+       """
+      }
+    } else {
+      evals.zipWithIndex.map { case (eval, i) =>
+        eval.code + s"""
+         if (${eval.isNull}) {
+           $array[$i] = null;
+         } else {
+           $array[$i] = ${eval.value};
+         }
+       """
+      }
+    }
+    ev.copy(code =
--- End diff --

@cloud-fan I love this idea. When I had just implemented it, I hit the following problem: Janino throws an exception. IMHO, [this part](https://github.com/janino-compiler/janino/blob/janino_3.0.0/janino/src/org/codehaus/janino/UnitCompiler.java#L4331-L4348) should optimize the check away rather than throw an exception.
We have some options:
1. remove the `if (... instanceof ...)` for the projection in Spark
2. submit a PR to Janino now so it does not throw this exception, and wait to make this change until a new Janino release with that PR is available
3. submit a PR to Janino now so it does not throw this exception, and postpone this change until later
4. others

What do you think?
```java
...
/* 037 */     UTF8String value7 = (UTF8String) obj5;
/* 038 */     if (false) {
/* 039 */       array1[0] = null;
/* 040 */     } else {
/* 041 */       array1[0] = value7;
/* 042 */     }
...
/* 068 */     Object obj9 = ((Expression) references[9]).eval(null);
/* 069 */     UTF8String value11 = (UTF8String) obj9;
/* 070 */     if (false) {
/* 071 */       array1[4] = null;
/* 072 */     } else {
/* 073 */       array1[4] = value11;
/* 074 */     }
/* 075 */     org.apache.spark.sql.catalyst.util.GenericArrayData arrayData1 = new org.apache.spark.sql.catalyst.util.GenericArrayData(array1);
/* 076 */     // Remember the current cursor so that we can calculate how many bytes are
/* 077 */     // written later.
/* 078 */     final int tmpCursor2 = holder.cursor;
/* 079 */
/* 080 */     if (arrayData1 instanceof UnsafeArrayData) {
/* 081 */
/* 082 */       final int sizeInBytes1 = ((UnsafeArrayData) arrayData1).getSizeInBytes();
/* 083 */       // grow the global buffer before writing data.
/* 084 */       holder.grow(sizeInBytes1);
/* 085 */       ((UnsafeArrayData) arrayData1).writeToMemory(holder.buffer, holder.cursor);
/* 086 */       holder.cursor += sizeInBytes1;
/* 087 */
/* 088 */     } else {
...

org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 80, Column 26: "org.apache.spark.sql.catalyst.util.GenericArrayData" can never be an instance of "org.apache.spark.sql.catalyst.expressions.UnsafeArrayData"
	at org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:11004)
	at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4345)
	at org.codehaus.janino.UnitCompiler.access$7400(UnitCompiler.java:206)
	at org.codehaus.janino.UnitCompiler$12.visitInstanceof(UnitCompiler.java:3773)
	at ...
```
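
To reproduce the Janino behavior outside Spark, here is a minimal standalone sketch (an assumption: it only needs Janino 3.0.x on the classpath; the class and object names are invented). A provably-false `instanceof` makes `cook` fail with the same kind of `CompileException` as above:

```scala
import org.codehaus.janino.SimpleCompiler

object JaninoInstanceofRepro {
  def main(args: Array[String]): Unit = {
    val compiler = new SimpleCompiler()
    // ArrayList and LinkedList are unrelated classes, so the instanceof below
    // can never be true; Janino reports "... can never be an instance of ..."
    // instead of folding the test to `false`.
    compiler.cook(
      """public class Gen {
        |  public static boolean test() {
        |    java.util.ArrayList l = new java.util.ArrayList();
        |    return l instanceof java.util.LinkedList;
        |  }
        |}
        |""".stripMargin)
  }
}
```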

[GitHub] spark issue #16323: [SPARK-18911] [SQL] Define CatalogStatistics to interact...

2016-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16323
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16323: [SPARK-18911] [SQL] Define CatalogStatistics to interact...

2016-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16323
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70564/
Test PASSed.





[GitHub] spark issue #16323: [SPARK-18911] [SQL] Define CatalogStatistics to interact...

2016-12-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16323
  
**[Test build #70564 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70564/testReport)** for PR 16323 at commit [`978bb11`](https://github.com/apache/spark/commit/978bb11d2bdbbf099473525b2baf714154640890).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16395: [SPARK-17075][SQL][WIP] implemented filter estimation

2016-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16395
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70562/
Test PASSed.





[GitHub] spark issue #16395: [SPARK-17075][SQL][WIP] implemented filter estimation

2016-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16395
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16395: [SPARK-17075][SQL][WIP] implemented filter estimation

2016-12-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16395
  
**[Test build #70562 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70562/testReport)** for PR 16395 at commit [`56d1579`](https://github.com/apache/spark/commit/56d15790bd1bc5e5f7224212d3422814a2e1cfae).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class NumericRange(min: JDecimal, max: JDecimal) extends Range`





[GitHub] spark issue #16392: [SPARK-18992] [SQL] Move spark.sql.hive.thriftServer.sin...

2016-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16392
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16392: [SPARK-18992] [SQL] Move spark.sql.hive.thriftServer.sin...

2016-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16392
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70563/
Test FAILed.





[GitHub] spark issue #16392: [SPARK-18992] [SQL] Move spark.sql.hive.thriftServer.sin...

2016-12-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16392
  
**[Test build #70563 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70563/testReport)** for PR 16392 at commit [`5221494`](https://github.com/apache/spark/commit/52214945fdc7705f65ce6522c2d05b1a79e69c78).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16383: [SPARK-18980][SQL] implement Aggregator with Type...

2016-12-23 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16383#discussion_r93812875
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala ---
@@ -143,15 +197,96 @@ case class TypedAggregateExpression(
     }
   }
 
-  override def toString: String = {
-    val input = inputDeserializer match {
-      case Some(UnresolvedDeserializer(deserializer, _)) => deserializer.dataType.simpleString
-      case Some(deserializer) => deserializer.dataType.simpleString
-      case _ => "unknown"
+  override def withInputInfo(
+      deser: Expression,
+      cls: Class[_],
+      schema: StructType): TypedAggregateExpression = {
+    copy(inputDeserializer = Some(deser), inputClass = Some(cls), inputSchema = Some(schema))
+  }
+}
+
+case class ComplexTypedAggregateExpression(
+    aggregator: Aggregator[Any, Any, Any],
+    inputDeserializer: Option[Expression],
+    inputClass: Option[Class[_]],
+    inputSchema: Option[StructType],
+    bufferSerializer: Seq[NamedExpression],
+    bufferDeserializer: Expression,
+    outputSerializer: Seq[Expression],
+    dataType: DataType,
+    nullable: Boolean,
+    mutableAggBufferOffset: Int = 0,
+    inputAggBufferOffset: Int = 0)
+  extends TypedImperativeAggregate[Any] with TypedAggregateExpression with NonSQLExpression {
+
+  override def deterministic: Boolean = true
+
+  override def children: Seq[Expression] = inputDeserializer.toSeq
+
+  override lazy val resolved: Boolean = inputDeserializer.isDefined && childrenResolved
+
+  override def references: AttributeSet = AttributeSet(inputDeserializer.toSeq)
+
+  override def createAggregationBuffer(): Any = aggregator.zero
+
+  private lazy val inputRowToObj = GenerateSafeProjection.generate(inputDeserializer.get :: Nil)
+
+  override def update(buffer: Any, input: InternalRow): Any = {
+    val inputObj = inputRowToObj(input).get(0, ObjectType(classOf[Any]))
+    if (inputObj != null) {
+      aggregator.reduce(buffer, inputObj)
+    } else {
+      buffer
+    }
+  }
+
+  override def merge(buffer: Any, input: Any): Any = {
+    aggregator.merge(buffer, input)
+  }
+
+  private lazy val resultObjToRow = dataType match {
+    case _: StructType =>
+      UnsafeProjection.create(CreateStruct(outputSerializer))
+    case _ =>
+      assert(outputSerializer.length == 1)
+      UnsafeProjection.create(outputSerializer.head)
+  }
+
+  override def eval(buffer: Any): Any = {
+    val resultObj = aggregator.finish(buffer)
+    if (resultObj == null) {
+      null
+    } else {
+      resultObjToRow(InternalRow(resultObj)).get(0, dataType)
     }
+  }
 
-    s"$nodeName($input)"
+  private lazy val bufferObjToRow = UnsafeProjection.create(bufferSerializer)
+
+  override def serialize(buffer: Any): Array[Byte] = {
+    bufferObjToRow(InternalRow(buffer)).getBytes
   }
 
-  override def nodeName: String = aggregator.getClass.getSimpleName.stripSuffix("$")
+  private lazy val bufferRow = new UnsafeRow(bufferSerializer.length)
+  private lazy val bufferRowToObject = GenerateSafeProjection.generate(bufferDeserializer :: Nil)
+
+  override def deserialize(storageFormat: Array[Byte]): Any = {
+    bufferRow.pointTo(storageFormat, storageFormat.length)
+    bufferRowToObject(bufferRow).get(0, ObjectType(classOf[Any]))
+  }
+
+  override def withNewMutableAggBufferOffset(
+      newMutableAggBufferOffset: Int): ComplexTypedAggregateExpression =
+    copy(mutableAggBufferOffset = newMutableAggBufferOffset)
+
+  override def withNewInputAggBufferOffset(
+      newInputAggBufferOffset: Int): ComplexTypedAggregateExpression =
+    copy(inputAggBufferOffset = newInputAggBufferOffset)
+
+  override def withInputInfo(
--- End diff --

This looks the same as `SimpleTypedAggregateExpression.withInputInfo`. Since the return type is `TypedAggregateExpression`, can we implement it just once, in `TypedAggregateExpression`?





[GitHub] spark pull request #16383: [SPARK-18980][SQL] implement Aggregator with Type...

2016-12-23 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16383#discussion_r93812727
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala ---
@@ -143,15 +197,96 @@ case class TypedAggregateExpression(
     }
   }
 
-  override def toString: String = {
-    val input = inputDeserializer match {
-      case Some(UnresolvedDeserializer(deserializer, _)) => deserializer.dataType.simpleString
-      case Some(deserializer) => deserializer.dataType.simpleString
-      case _ => "unknown"
+  override def withInputInfo(
+      deser: Expression,
+      cls: Class[_],
+      schema: StructType): TypedAggregateExpression = {
+    copy(inputDeserializer = Some(deser), inputClass = Some(cls), inputSchema = Some(schema))
+  }
+}
+
+case class ComplexTypedAggregateExpression(
+    aggregator: Aggregator[Any, Any, Any],
+    inputDeserializer: Option[Expression],
+    inputClass: Option[Class[_]],
+    inputSchema: Option[StructType],
+    bufferSerializer: Seq[NamedExpression],
+    bufferDeserializer: Expression,
+    outputSerializer: Seq[Expression],
+    dataType: DataType,
+    nullable: Boolean,
+    mutableAggBufferOffset: Int = 0,
+    inputAggBufferOffset: Int = 0)
+  extends TypedImperativeAggregate[Any] with TypedAggregateExpression with NonSQLExpression {
+
+  override def deterministic: Boolean = true
--- End diff --

I have a question about `deterministic` here. How the data is actually processed is delegated to the `Aggregator`. I think it is easy for an `Aggregator` to produce a non-deterministic result, e.g., an `Aggregator` that uses an `ArrayBuffer` as its buffer.

Do you think we should let the `Aggregator` decide whether it is a deterministic expression or not?
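
For example, a sketch of the concern (invented names, not from the PR): an `Aggregator` that keeps only the first few values it sees produces a result that depends on partitioning and merge order:

```scala
import scala.collection.mutable.ArrayBuffer
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

// Keeps the first three values it happens to see; since partition and merge
// order are not fixed, the final result can differ between runs.
object FirstThreeSeen extends Aggregator[String, ArrayBuffer[String], String] {
  def zero: ArrayBuffer[String] = ArrayBuffer.empty
  def reduce(b: ArrayBuffer[String], a: String): ArrayBuffer[String] =
    if (b.size < 3) b += a else b
  def merge(b1: ArrayBuffer[String], b2: ArrayBuffer[String]): ArrayBuffer[String] =
    (b1 ++ b2).take(3)
  def finish(b: ArrayBuffer[String]): String = b.mkString(",")
  def bufferEncoder: Encoder[ArrayBuffer[String]] = Encoders.kryo[ArrayBuffer[String]]
  def outputEncoder: Encoder[String] = Encoders.STRING
}
```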





[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...

2016-12-23 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/14452
  
@cloud-fan Thanks for the comment. I agree. I will prepare a design doc soon.





[GitHub] spark issue #16395: [SPARK-17075][SQL][WIP] implemented filter estimation

2016-12-23 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/16395
  
cc @srinathshankar 





[GitHub] spark issue #16308: [SPARK-18936][SQL] Infrastructure for session local time...

2016-12-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16308
  
**[Test build #70565 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70565/testReport)** for PR 16308 at commit [`7917d0f`](https://github.com/apache/spark/commit/7917d0f100e47e8e3cce8440a38c3b16ef74732d).





[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...

2016-12-23 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/14452
  
After reading the PR description, I think this improvement is quite complex 
and worth a design doc. We should explain how the subquery execution works now 
and how you are going to change it.





[GitHub] spark issue #16323: [SPARK-18911] [SQL] Define CatalogStatistics to interact...

2016-12-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16323
  
**[Test build #70564 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70564/testReport)** for PR 16323 at commit [`978bb11`](https://github.com/apache/spark/commit/978bb11d2bdbbf099473525b2baf714154640890).





[GitHub] spark issue #16323: [SPARK-18911] [SQL] Define CatalogStatistics to interact...

2016-12-23 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16323
  
retest this please





[GitHub] spark issue #16233: [SPARK-18801][SQL] Add `View` operator to help resolve a...

2016-12-23 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16233
  
I am also in favor of the last option, @yhuai shall we go with it?





[GitHub] spark issue #16392: [SPARK-18992] [SQL] Move spark.sql.hive.thriftServer.sin...

2016-12-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16392
  
**[Test build #70563 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70563/testReport)** for PR 16392 at commit [`5221494`](https://github.com/apache/spark/commit/52214945fdc7705f65ce6522c2d05b1a79e69c78).





[GitHub] spark issue #16323: [SPARK-18911] [SQL] Define CatalogStatistics to interact...

2016-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16323
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70561/
Test FAILed.





[GitHub] spark issue #16323: [SPARK-18911] [SQL] Define CatalogStatistics to interact...

2016-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16323
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16323: [SPARK-18911] [SQL] Define CatalogStatistics to interact...

2016-12-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16323
  
**[Test build #70561 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70561/testReport)** for PR 16323 at commit [`978bb11`](https://github.com/apache/spark/commit/978bb11d2bdbbf099473525b2baf714154640890).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16392: [SPARK-18992] [SQL] Move spark.sql.hive.thriftServer.sin...

2016-12-23 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16392
  
retest this please





[GitHub] spark issue #16395: [SPARK-17075][SQL][WIP] implemented filter estimation

2016-12-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16395
  
**[Test build #70562 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70562/testReport)** for PR 16395 at commit [`56d1579`](https://github.com/apache/spark/commit/56d15790bd1bc5e5f7224212d3422814a2e1cfae).





[GitHub] spark issue #16395: [SPARK-17075][SQL][WIP] implemented filter estimation

2016-12-23 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16395
  
ok to test





[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...

2016-12-23 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/13909
  
LGTM, I left some comments about code style, thanks for working on it!

BTW, can you update the benchmark result? I think we have finalized the 
solution now.





[GitHub] spark issue #16392: [SPARK-18992] [SQL] Move spark.sql.hive.thriftServer.sin...

2016-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16392
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70560/
Test FAILed.





[GitHub] spark issue #16392: [SPARK-18992] [SQL] Move spark.sql.hive.thriftServer.sin...

2016-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16392
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16392: [SPARK-18992] [SQL] Move spark.sql.hive.thriftServer.sin...

2016-12-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16392
  
**[Test build #70560 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70560/testReport)** for PR 16392 at commit [`5221494`](https://github.com/apache/spark/commit/52214945fdc7705f65ce6522c2d05b1a79e69c78).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...

2016-12-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13909#discussion_r93812301
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
@@ -133,49 +183,72 @@ case class CreateMap(children: Seq[Expression]) extends Expression {
   }
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-    val arrayClass = classOf[GenericArrayData].getName
     val mapClass = classOf[ArrayBasedMapData].getName
     val keyArray = ctx.freshName("keyArray")
     val valueArray = ctx.freshName("valueArray")
-    ctx.addMutableState("Object[]", keyArray, s"this.$keyArray = null;")
-    ctx.addMutableState("Object[]", valueArray, s"this.$valueArray = null;")
 
-    val keyData = s"new $arrayClass($keyArray)"
-    val valueData = s"new $arrayClass($valueArray)"
-    ev.copy(code = s"""
-      $keyArray = new Object[${keys.size}];
-      $valueArray = new Object[${values.size}];""" +
-      ctx.splitExpressions(
-        ctx.INPUT_ROW,
-        keys.zipWithIndex.map { case (key, i) =>
-          val eval = key.genCode(ctx)
-          s"""
-            ${eval.code}
-            if (${eval.isNull}) {
-              throw new RuntimeException("Cannot use null as map key!");
-            } else {
-              $keyArray[$i] = ${eval.value};
-            }
-          """
-        }) +
-      ctx.splitExpressions(
-        ctx.INPUT_ROW,
-        values.zipWithIndex.map { case (value, i) =>
-          val eval = value.genCode(ctx)
-          s"""
-            ${eval.code}
-            if (${eval.isNull}) {
-              $valueArray[$i] = null;
-            } else {
-              $valueArray[$i] = ${eval.value};
-            }
-          """
-        }) +
-      s"""
-        final MapData ${ev.value} = new $mapClass($keyData, $valueData);
-        this.$keyArray = null;
-        this.$valueArray = null;
-      """, isNull = "false")
+    val MapType(keyDt, valueDt, _) = dataType
+    val evalKeys = keys.map(e => e.genCode(ctx))
+    val isPrimitiveArrayKey = ctx.isPrimitiveType(keyDt)
+    val evalValues = values.map(e => e.genCode(ctx))
+    val isPrimitiveArrayValue = ctx.isPrimitiveType(valueDt)
+    val (preprocessKeyData, keyDataArray) =
+      GenArrayData.getCodeArrayData(ctx, keyDt, keys.size, isPrimitiveArrayKey, keyArray)
+    val (preprocessValueData, valueDataArray) =
+      GenArrayData.getCodeArrayData(ctx, valueDt, values.size, isPrimitiveArrayValue, valueArray)
+
+    val assignKeys = if (isPrimitiveArrayKey) {
+      val primitiveKeyTypeName = ctx.primitiveTypeName(keyDt)
+      evalKeys.zipWithIndex.map { case (eval, i) =>
+        eval.code + s"""
+         if (${eval.isNull}) {
+           $keyDataArray.setNullAt($i);
+         } else {
+           $keyDataArray.set$primitiveKeyTypeName($i, ${eval.value});
+         }
+       """
+      }
+    } else {
+      evalKeys.zipWithIndex.map { case (eval, i) =>
+        eval.code + s"""
+         if (${eval.isNull}) {
+           throw new RuntimeException("Cannot use null as map key!");
+         } else {
+           $keyArray[$i] = ${eval.value};
+         }
+       """
+      }
+    }
+
+    val assignValues = if (isPrimitiveArrayValue) {
+      val primitiveValueTypeName = ctx.primitiveTypeName(valueDt)
+      evalValues.zipWithIndex.map { case (eval, i) =>
+        eval.code + s"""
+         if (${eval.isNull}) {
+           $valueDataArray.setNullAt($i);
+         } else {
+           $valueDataArray.set$primitiveValueTypeName($i, ${eval.value});
+         }
+       """
+      }
+    } else {
+      evalValues.zipWithIndex.map { case (eval, i) =>
+        eval.code + s"""
+         if (${eval.isNull}) {
+           $valueArray[$i] = null;
+         } else {
+           $valueArray[$i] = ${eval.value};
+         }
+       """
+      }
+    }
+
+    ev.copy(code = s"final boolean ${ev.isNull} = false;" +
--- End diff --

nit:
```
val code =
  s"""
    final boolean ${ev.isNull} = false;
    $preprocessKeyData
    ${ctx.splitExpressions(ctx.INPUT_ROW, assignKeys)}
    ...
    final MapData ${ev.value} = new $mapClass($keyDataArray, $valueDataArray);
  """
ev.copy(code = code)
```



[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...

2016-12-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13909#discussion_r93812285
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
@@ -133,49 +183,72 @@ case class CreateMap(children: Seq[Expression]) extends Expression {
   }
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-    val arrayClass = classOf[GenericArrayData].getName
     val mapClass = classOf[ArrayBasedMapData].getName
     val keyArray = ctx.freshName("keyArray")
     val valueArray = ctx.freshName("valueArray")
-    ctx.addMutableState("Object[]", keyArray, s"this.$keyArray = null;")
-    ctx.addMutableState("Object[]", valueArray, s"this.$valueArray = null;")
 
-    val keyData = s"new $arrayClass($keyArray)"
-    val valueData = s"new $arrayClass($valueArray)"
-    ev.copy(code = s"""
-      $keyArray = new Object[${keys.size}];
-      $valueArray = new Object[${values.size}];""" +
-      ctx.splitExpressions(
-        ctx.INPUT_ROW,
-        keys.zipWithIndex.map { case (key, i) =>
-          val eval = key.genCode(ctx)
-          s"""
-            ${eval.code}
-            if (${eval.isNull}) {
-              throw new RuntimeException("Cannot use null as map key!");
-            } else {
-              $keyArray[$i] = ${eval.value};
-            }
-          """
-        }) +
-      ctx.splitExpressions(
-        ctx.INPUT_ROW,
-        values.zipWithIndex.map { case (value, i) =>
-          val eval = value.genCode(ctx)
-          s"""
-            ${eval.code}
-            if (${eval.isNull}) {
-              $valueArray[$i] = null;
-            } else {
-              $valueArray[$i] = ${eval.value};
-            }
-          """
-        }) +
-      s"""
-        final MapData ${ev.value} = new $mapClass($keyData, $valueData);
-        this.$keyArray = null;
-        this.$valueArray = null;
-      """, isNull = "false")
+    val MapType(keyDt, valueDt, _) = dataType
+    val evalKeys = keys.map(e => e.genCode(ctx))
+    val isPrimitiveArrayKey = ctx.isPrimitiveType(keyDt)
+    val evalValues = values.map(e => e.genCode(ctx))
+    val isPrimitiveArrayValue = ctx.isPrimitiveType(valueDt)
+    val (preprocessKeyData, keyDataArray) =
+      GenArrayData.getCodeArrayData(ctx, keyDt, keys.size, isPrimitiveArrayKey, keyArray)
+    val (preprocessValueData, valueDataArray) =
+      GenArrayData.getCodeArrayData(ctx, valueDt, values.size, isPrimitiveArrayValue, valueArray)
+
+    val assignKeys = if (isPrimitiveArrayKey) {
+      val primitiveKeyTypeName = ctx.primitiveTypeName(keyDt)
+      evalKeys.zipWithIndex.map { case (eval, i) =>
+        eval.code + s"""
+         if (${eval.isNull}) {
+           $keyDataArray.setNullAt($i);
+         } else {
+           $keyDataArray.set$primitiveKeyTypeName($i, ${eval.value});
+         }
+       """
+      }
+    } else {
+      evalKeys.zipWithIndex.map { case (eval, i) =>
+        eval.code + s"""
+         if (${eval.isNull}) {
+           throw new RuntimeException("Cannot use null as map key!");
+         } else {
+           $keyArray[$i] = ${eval.value};
+         }
+       """
+      }
+    }
+
+    val assignValues = if (isPrimitiveArrayValue) {
--- End diff --

It seems the assignment code can also be put in the util function; we would need to pass more parameters, though, and an `allowNull: Boolean` flag.
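
For illustration, such a shared helper might look roughly like this (a sketch only, not the PR's code; the name `genCodeToAssignElements` and its parameter list are hypothetical, and `ExprCode` is Catalyst's codegen result holder):
```
import org.apache.spark.sql.catalyst.expressions.codegen.ExprCode

// Hypothetical shared helper for the CreateArray/CreateMap assignment loops.
// `allowNull` decides whether a null input is stored as a null element
// (array elements, map values) or rejected with an exception (map keys).
def genCodeToAssignElements(
    evals: Seq[ExprCode],
    isPrimitive: Boolean,
    primitiveTypeName: String,  // only used when isPrimitive is true
    arrayData: String,          // name of the ArrayData variable
    array: String,              // name of the backing Object[] variable
    allowNull: Boolean): Seq[String] = {
  evals.zipWithIndex.map { case (eval, i) =>
    val onNull = if (!allowNull) {
      s"""throw new RuntimeException("Cannot use null as map key!");"""
    } else if (isPrimitive) {
      s"$arrayData.setNullAt($i);"
    } else {
      s"$array[$i] = null;"
    }
    val onValue = if (isPrimitive) {
      s"$arrayData.set$primitiveTypeName($i, ${eval.value});"
    } else {
      s"$array[$i] = ${eval.value};"
    }
    eval.code + s"""
      if (${eval.isNull}) {
        $onNull
      } else {
        $onValue
      }
    """
  }
}
```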





[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...

2016-12-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13909#discussion_r93812270
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -133,49 +183,72 @@ case class CreateMap(children: Seq[Expression]) 
extends Expression {
   }
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-val arrayClass = classOf[GenericArrayData].getName
 val mapClass = classOf[ArrayBasedMapData].getName
 val keyArray = ctx.freshName("keyArray")
 val valueArray = ctx.freshName("valueArray")
-ctx.addMutableState("Object[]", keyArray, s"this.$keyArray = null;")
-ctx.addMutableState("Object[]", valueArray, s"this.$valueArray = 
null;")
 
-val keyData = s"new $arrayClass($keyArray)"
-val valueData = s"new $arrayClass($valueArray)"
-ev.copy(code = s"""
-  $keyArray = new Object[${keys.size}];
-  $valueArray = new Object[${values.size}];""" +
-  ctx.splitExpressions(
-ctx.INPUT_ROW,
-keys.zipWithIndex.map { case (key, i) =>
-  val eval = key.genCode(ctx)
-  s"""
-${eval.code}
-if (${eval.isNull}) {
-  throw new RuntimeException("Cannot use null as map key!");
-} else {
-  $keyArray[$i] = ${eval.value};
-}
-  """
-}) +
-  ctx.splitExpressions(
-ctx.INPUT_ROW,
-values.zipWithIndex.map { case (value, i) =>
-  val eval = value.genCode(ctx)
-  s"""
-${eval.code}
-if (${eval.isNull}) {
-  $valueArray[$i] = null;
-} else {
-  $valueArray[$i] = ${eval.value};
-}
-  """
-}) +
-  s"""
-final MapData ${ev.value} = new $mapClass($keyData, $valueData);
-this.$keyArray = null;
-this.$valueArray = null;
-  """, isNull = "false")
+val MapType(keyDt, valueDt, _) = dataType
+val evalKeys = keys.map(e => e.genCode(ctx))
+val isPrimitiveArrayKey = ctx.isPrimitiveType(keyDt)
+val evalValues = values.map(e => e.genCode(ctx))
+val isPrimitiveArrayValue = ctx.isPrimitiveType(valueDt)
+val (preprocessKeyData, keyDataArray) =
+  GenArrayData.getCodeArrayData(ctx, keyDt, keys.size, 
isPrimitiveArrayKey, keyArray)
+val (preprocessValueData, valueDataArray) =
+  GenArrayData.getCodeArrayData(ctx, valueDt, values.size, 
isPrimitiveArrayValue, valueArray)
+
+val assignKeys = if (isPrimitiveArrayKey) {
+  val primitiveKeyTypeName = ctx.primitiveTypeName(keyDt)
+  evalKeys.zipWithIndex.map { case (eval, i) =>
+eval.code + s"""
+ if (${eval.isNull}) {
+   $keyDataArray.setNullAt($i);
--- End diff --

shall we throw an exception here?





[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...

2016-12-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13909#discussion_r93812259
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -133,49 +183,72 @@ case class CreateMap(children: Seq[Expression]) 
extends Expression {
   }
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-val arrayClass = classOf[GenericArrayData].getName
 val mapClass = classOf[ArrayBasedMapData].getName
 val keyArray = ctx.freshName("keyArray")
 val valueArray = ctx.freshName("valueArray")
-ctx.addMutableState("Object[]", keyArray, s"this.$keyArray = null;")
-ctx.addMutableState("Object[]", valueArray, s"this.$valueArray = 
null;")
 
-val keyData = s"new $arrayClass($keyArray)"
-val valueData = s"new $arrayClass($valueArray)"
-ev.copy(code = s"""
-  $keyArray = new Object[${keys.size}];
-  $valueArray = new Object[${values.size}];""" +
-  ctx.splitExpressions(
-ctx.INPUT_ROW,
-keys.zipWithIndex.map { case (key, i) =>
-  val eval = key.genCode(ctx)
-  s"""
-${eval.code}
-if (${eval.isNull}) {
-  throw new RuntimeException("Cannot use null as map key!");
-} else {
-  $keyArray[$i] = ${eval.value};
-}
-  """
-}) +
-  ctx.splitExpressions(
-ctx.INPUT_ROW,
-values.zipWithIndex.map { case (value, i) =>
-  val eval = value.genCode(ctx)
-  s"""
-${eval.code}
-if (${eval.isNull}) {
-  $valueArray[$i] = null;
-} else {
-  $valueArray[$i] = ${eval.value};
-}
-  """
-}) +
-  s"""
-final MapData ${ev.value} = new $mapClass($keyData, $valueData);
-this.$keyArray = null;
-this.$valueArray = null;
-  """, isNull = "false")
+val MapType(keyDt, valueDt, _) = dataType
+val evalKeys = keys.map(e => e.genCode(ctx))
+val isPrimitiveArrayKey = ctx.isPrimitiveType(keyDt)
--- End diff --

`isPrimitiveArrayKey` looks weird; do you mean `isPrimitiveKey`?





[GitHub] spark issue #16395: [SPARK-17075][SQL][WIP] implemented filter estimation

2016-12-23 Thread ron8hu
Github user ron8hu commented on the issue:

https://github.com/apache/spark/pull/16395
  
cc @wzhfy @rxin @hvanhovell @cloud-fan 





[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...

2016-12-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13909#discussion_r93812253
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -56,33 +58,81 @@ case class CreateArray(children: Seq[Expression]) 
extends Expression {
   }
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-val arrayClass = classOf[GenericArrayData].getName
-val values = ctx.freshName("values")
-ctx.addMutableState("Object[]", values, s"this.$values = null;")
-
-ev.copy(code = s"""
-  this.$values = new Object[${children.size}];""" +
-  ctx.splitExpressions(
-ctx.INPUT_ROW,
-children.zipWithIndex.map { case (e, i) =>
-  val eval = e.genCode(ctx)
-  eval.code + s"""
-if (${eval.isNull}) {
-  $values[$i] = null;
-} else {
-  $values[$i] = ${eval.value};
-}
-   """
-}) +
-  s"""
-final ArrayData ${ev.value} = new $arrayClass($values);
-this.$values = null;
-  """, isNull = "false")
+val array = ctx.freshName("array")
+
+val et = dataType.elementType
+val evals = children.map(e => e.genCode(ctx))
+val isPrimitiveArray = ctx.isPrimitiveType(et)
+val (preprocess, arrayData) =
+  GenArrayData.getCodeArrayData(ctx, et, children.size, 
isPrimitiveArray, array)
+
+val assigns = if (isPrimitiveArray) {
+  val primitiveTypeName = ctx.primitiveTypeName(et)
+  evals.zipWithIndex.map { case (eval, i) =>
+eval.code + s"""
+ if (${eval.isNull}) {
+   $arrayData.setNullAt($i);
+ } else {
+   $arrayData.set$primitiveTypeName($i, ${eval.value});
+ }
+   """
+  }
+} else {
+  evals.zipWithIndex.map { case (eval, i) =>
+eval.code + s"""
+ if (${eval.isNull}) {
+   $array[$i] = null;
+ } else {
+   $array[$i] = ${eval.value};
+ }
+   """
+  }
+}
+ev.copy(code =
+  preprocess +
+  ctx.splitExpressions(ctx.INPUT_ROW, assigns) +
+  s"\nfinal ArrayData ${ev.value} = $arrayData;\n",
+  isNull = "false")
   }
 
   override def prettyName: String = "array"
 }
 
+private [sql] object GenArrayData {
+  // This function returns Java code pieces based on DataType and 
isPrimitive
+  // for allocation of ArrayData class
+  def getCodeArrayData(
+  ctx: CodegenContext,
+  dt: DataType,
+  size: Int,
+  isPrimitive : Boolean,
+  array: String): (String, String) = {
+if (!isPrimitive) {
+  val arrayClass = classOf[GenericArrayData].getName
+  ctx.addMutableState("Object[]", array,
+s"this.$array = new Object[${size}];")
+  ("", s"new $arrayClass($array)")
+} else {
+  val baseArray = ctx.freshName("baseArray")
--- End diff --

This is counter-intuitive: the `array` parameter should be the name of the underlying array, but here you create another name for the underlying array and use `array` as the name of the resulting `ArrayData`.





[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...

2016-12-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13909#discussion_r93812234
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -56,33 +58,81 @@ case class CreateArray(children: Seq[Expression]) 
extends Expression {
   }
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-val arrayClass = classOf[GenericArrayData].getName
-val values = ctx.freshName("values")
-ctx.addMutableState("Object[]", values, s"this.$values = null;")
-
-ev.copy(code = s"""
-  this.$values = new Object[${children.size}];""" +
-  ctx.splitExpressions(
-ctx.INPUT_ROW,
-children.zipWithIndex.map { case (e, i) =>
-  val eval = e.genCode(ctx)
-  eval.code + s"""
-if (${eval.isNull}) {
-  $values[$i] = null;
-} else {
-  $values[$i] = ${eval.value};
-}
-   """
-}) +
-  s"""
-final ArrayData ${ev.value} = new $arrayClass($values);
-this.$values = null;
-  """, isNull = "false")
+val array = ctx.freshName("array")
+
+val et = dataType.elementType
+val evals = children.map(e => e.genCode(ctx))
+val isPrimitiveArray = ctx.isPrimitiveType(et)
+val (preprocess, arrayData) =
+  GenArrayData.getCodeArrayData(ctx, et, children.size, 
isPrimitiveArray, array)
+
+val assigns = if (isPrimitiveArray) {
+  val primitiveTypeName = ctx.primitiveTypeName(et)
+  evals.zipWithIndex.map { case (eval, i) =>
+eval.code + s"""
+ if (${eval.isNull}) {
+   $arrayData.setNullAt($i);
+ } else {
+   $arrayData.set$primitiveTypeName($i, ${eval.value});
+ }
+   """
+  }
+} else {
+  evals.zipWithIndex.map { case (eval, i) =>
+eval.code + s"""
+ if (${eval.isNull}) {
+   $array[$i] = null;
+ } else {
+   $array[$i] = ${eval.value};
+ }
+   """
+  }
+}
+ev.copy(code =
+  preprocess +
+  ctx.splitExpressions(ctx.INPUT_ROW, assigns) +
+  s"\nfinal ArrayData ${ev.value} = $arrayData;\n",
+  isNull = "false")
   }
 
   override def prettyName: String = "array"
 }
 
+private [sql] object GenArrayData {
+  // This function returns Java code pieces based on DataType and 
isPrimitive
+  // for allocation of ArrayData class
--- End diff --

Turn this into a doc comment, and add descriptions for the parameters and the return type.
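
For example, a doc comment along these lines (wording illustrative, not the merged text):
```
/**
 * Returns Java code pieces that allocate an `ArrayData` for the given
 * element type: backed by a primitive array when `isPrimitive` is true,
 * otherwise by an `Object[]` wrapped in a `GenericArrayData`.
 *
 * @param ctx the codegen context
 * @param dt the data type of the array elements
 * @param size the number of elements in the array
 * @param isPrimitive whether the elements are of a primitive type
 * @param array the fresh variable name used for the backing array
 * @return a pair of (preprocessing code, an expression that evaluates to
 *         the `ArrayData`)
 */
```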





[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...

2016-12-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13909#discussion_r93812230
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -56,33 +58,81 @@ case class CreateArray(children: Seq[Expression]) 
extends Expression {
   }
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-val arrayClass = classOf[GenericArrayData].getName
-val values = ctx.freshName("values")
-ctx.addMutableState("Object[]", values, s"this.$values = null;")
-
-ev.copy(code = s"""
-  this.$values = new Object[${children.size}];""" +
-  ctx.splitExpressions(
-ctx.INPUT_ROW,
-children.zipWithIndex.map { case (e, i) =>
-  val eval = e.genCode(ctx)
-  eval.code + s"""
-if (${eval.isNull}) {
-  $values[$i] = null;
-} else {
-  $values[$i] = ${eval.value};
-}
-   """
-}) +
-  s"""
-final ArrayData ${ev.value} = new $arrayClass($values);
-this.$values = null;
-  """, isNull = "false")
+val array = ctx.freshName("array")
+
+val et = dataType.elementType
+val evals = children.map(e => e.genCode(ctx))
+val isPrimitiveArray = ctx.isPrimitiveType(et)
+val (preprocess, arrayData) =
+  GenArrayData.getCodeArrayData(ctx, et, children.size, 
isPrimitiveArray, array)
+
+val assigns = if (isPrimitiveArray) {
+  val primitiveTypeName = ctx.primitiveTypeName(et)
+  evals.zipWithIndex.map { case (eval, i) =>
+eval.code + s"""
+ if (${eval.isNull}) {
+   $arrayData.setNullAt($i);
+ } else {
+   $arrayData.set$primitiveTypeName($i, ${eval.value});
+ }
+   """
+  }
+} else {
+  evals.zipWithIndex.map { case (eval, i) =>
+eval.code + s"""
+ if (${eval.isNull}) {
+   $array[$i] = null;
+ } else {
+   $array[$i] = ${eval.value};
+ }
+   """
+  }
+}
+ev.copy(code =
+  preprocess +
+  ctx.splitExpressions(ctx.INPUT_ROW, assigns) +
+  s"\nfinal ArrayData ${ev.value} = $arrayData;\n",
+  isNull = "false")
   }
 
   override def prettyName: String = "array"
 }
 
+private [sql] object GenArrayData {
+  // This function returns Java code pieces based on DataType and 
isPrimitive
+  // for allocation of ArrayData class
+  def getCodeArrayData(
+  ctx: CodegenContext,
+  dt: DataType,
+  size: Int,
+  isPrimitive : Boolean,
+  array: String): (String, String) = {
--- End diff --

nit: `arrayName`





[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...

2016-12-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13909#discussion_r93812219
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -56,33 +58,81 @@ case class CreateArray(children: Seq[Expression]) 
extends Expression {
   }
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-val arrayClass = classOf[GenericArrayData].getName
-val values = ctx.freshName("values")
-ctx.addMutableState("Object[]", values, s"this.$values = null;")
-
-ev.copy(code = s"""
-  this.$values = new Object[${children.size}];""" +
-  ctx.splitExpressions(
-ctx.INPUT_ROW,
-children.zipWithIndex.map { case (e, i) =>
-  val eval = e.genCode(ctx)
-  eval.code + s"""
-if (${eval.isNull}) {
-  $values[$i] = null;
-} else {
-  $values[$i] = ${eval.value};
-}
-   """
-}) +
-  s"""
-final ArrayData ${ev.value} = new $arrayClass($values);
-this.$values = null;
-  """, isNull = "false")
+val array = ctx.freshName("array")
+
+val et = dataType.elementType
+val evals = children.map(e => e.genCode(ctx))
+val isPrimitiveArray = ctx.isPrimitiveType(et)
+val (preprocess, arrayData) =
+  GenArrayData.getCodeArrayData(ctx, et, children.size, 
isPrimitiveArray, array)
+
+val assigns = if (isPrimitiveArray) {
+  val primitiveTypeName = ctx.primitiveTypeName(et)
+  evals.zipWithIndex.map { case (eval, i) =>
+eval.code + s"""
+ if (${eval.isNull}) {
+   $arrayData.setNullAt($i);
+ } else {
+   $arrayData.set$primitiveTypeName($i, ${eval.value});
+ }
+   """
+  }
+} else {
+  evals.zipWithIndex.map { case (eval, i) =>
+eval.code + s"""
+ if (${eval.isNull}) {
+   $array[$i] = null;
+ } else {
+   $array[$i] = ${eval.value};
+ }
+   """
+  }
+}
+ev.copy(code =
+  preprocess +
+  ctx.splitExpressions(ctx.INPUT_ROW, assigns) +
+  s"\nfinal ArrayData ${ev.value} = $arrayData;\n",
+  isNull = "false")
   }
 
   override def prettyName: String = "array"
 }
 
+private [sql] object GenArrayData {
+  // This function returns Java code pieces based on DataType and 
isPrimitive
+  // for allocation of ArrayData class
+  def getCodeArrayData(
+  ctx: CodegenContext,
+  dt: DataType,
--- End diff --

nit: `elementType`





[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...

2016-12-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13909#discussion_r93812212
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -56,33 +58,81 @@ case class CreateArray(children: Seq[Expression]) 
extends Expression {
   }
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-val arrayClass = classOf[GenericArrayData].getName
-val values = ctx.freshName("values")
-ctx.addMutableState("Object[]", values, s"this.$values = null;")
-
-ev.copy(code = s"""
-  this.$values = new Object[${children.size}];""" +
-  ctx.splitExpressions(
-ctx.INPUT_ROW,
-children.zipWithIndex.map { case (e, i) =>
-  val eval = e.genCode(ctx)
-  eval.code + s"""
-if (${eval.isNull}) {
-  $values[$i] = null;
-} else {
-  $values[$i] = ${eval.value};
-}
-   """
-}) +
-  s"""
-final ArrayData ${ev.value} = new $arrayClass($values);
-this.$values = null;
-  """, isNull = "false")
+val array = ctx.freshName("array")
+
+val et = dataType.elementType
+val evals = children.map(e => e.genCode(ctx))
+val isPrimitiveArray = ctx.isPrimitiveType(et)
+val (preprocess, arrayData) =
+  GenArrayData.getCodeArrayData(ctx, et, children.size, 
isPrimitiveArray, array)
+
+val assigns = if (isPrimitiveArray) {
+  val primitiveTypeName = ctx.primitiveTypeName(et)
+  evals.zipWithIndex.map { case (eval, i) =>
+eval.code + s"""
+ if (${eval.isNull}) {
+   $arrayData.setNullAt($i);
+ } else {
+   $arrayData.set$primitiveTypeName($i, ${eval.value});
+ }
+   """
+  }
+} else {
+  evals.zipWithIndex.map { case (eval, i) =>
+eval.code + s"""
+ if (${eval.isNull}) {
+   $array[$i] = null;
+ } else {
+   $array[$i] = ${eval.value};
+ }
+   """
+  }
+}
+ev.copy(code =
+  preprocess +
+  ctx.splitExpressions(ctx.INPUT_ROW, assigns) +
+  s"\nfinal ArrayData ${ev.value} = $arrayData;\n",
+  isNull = "false")
   }
 
   override def prettyName: String = "array"
 }
 
+private [sql] object GenArrayData {
+  // This function returns Java code pieces based on DataType and 
isPrimitive
+  // for allocation of ArrayData class
+  def getCodeArrayData(
--- End diff --

nit: `genCodeToCreateArrayData`
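
Folding in the rename nits from this and the sibling comments, the signature might become (sketch only):
```
def genCodeToCreateArrayData(
    ctx: CodegenContext,
    elementType: DataType,
    numElements: Int,
    isPrimitive: Boolean,
    arrayName: String): (String, String) = {
  ???  // body as in the PR, with the parameters renamed accordingly
}
```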





[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...

2016-12-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13909#discussion_r93812217
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -56,33 +58,81 @@ case class CreateArray(children: Seq[Expression]) 
extends Expression {
   }
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-val arrayClass = classOf[GenericArrayData].getName
-val values = ctx.freshName("values")
-ctx.addMutableState("Object[]", values, s"this.$values = null;")
-
-ev.copy(code = s"""
-  this.$values = new Object[${children.size}];""" +
-  ctx.splitExpressions(
-ctx.INPUT_ROW,
-children.zipWithIndex.map { case (e, i) =>
-  val eval = e.genCode(ctx)
-  eval.code + s"""
-if (${eval.isNull}) {
-  $values[$i] = null;
-} else {
-  $values[$i] = ${eval.value};
-}
-   """
-}) +
-  s"""
-final ArrayData ${ev.value} = new $arrayClass($values);
-this.$values = null;
-  """, isNull = "false")
+val array = ctx.freshName("array")
+
+val et = dataType.elementType
+val evals = children.map(e => e.genCode(ctx))
+val isPrimitiveArray = ctx.isPrimitiveType(et)
+val (preprocess, arrayData) =
+  GenArrayData.getCodeArrayData(ctx, et, children.size, 
isPrimitiveArray, array)
+
+val assigns = if (isPrimitiveArray) {
+  val primitiveTypeName = ctx.primitiveTypeName(et)
+  evals.zipWithIndex.map { case (eval, i) =>
+eval.code + s"""
+ if (${eval.isNull}) {
+   $arrayData.setNullAt($i);
+ } else {
+   $arrayData.set$primitiveTypeName($i, ${eval.value});
+ }
+   """
+  }
+} else {
+  evals.zipWithIndex.map { case (eval, i) =>
+eval.code + s"""
+ if (${eval.isNull}) {
+   $array[$i] = null;
+ } else {
+   $array[$i] = ${eval.value};
+ }
+   """
+  }
+}
+ev.copy(code =
+  preprocess +
+  ctx.splitExpressions(ctx.INPUT_ROW, assigns) +
+  s"\nfinal ArrayData ${ev.value} = $arrayData;\n",
+  isNull = "false")
   }
 
   override def prettyName: String = "array"
 }
 
+private [sql] object GenArrayData {
+  // This function returns Java code pieces based on DataType and 
isPrimitive
+  // for allocation of ArrayData class
+  def getCodeArrayData(
+  ctx: CodegenContext,
+  dt: DataType,
+  size: Int,
--- End diff --

nit: `numElements`





[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...

2016-12-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13909#discussion_r93812201
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -56,33 +58,81 @@ case class CreateArray(children: Seq[Expression]) 
extends Expression {
   }
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-val arrayClass = classOf[GenericArrayData].getName
-val values = ctx.freshName("values")
-ctx.addMutableState("Object[]", values, s"this.$values = null;")
-
-ev.copy(code = s"""
-  this.$values = new Object[${children.size}];""" +
-  ctx.splitExpressions(
-ctx.INPUT_ROW,
-children.zipWithIndex.map { case (e, i) =>
-  val eval = e.genCode(ctx)
-  eval.code + s"""
-if (${eval.isNull}) {
-  $values[$i] = null;
-} else {
-  $values[$i] = ${eval.value};
-}
-   """
-}) +
-  s"""
-final ArrayData ${ev.value} = new $arrayClass($values);
-this.$values = null;
-  """, isNull = "false")
+val array = ctx.freshName("array")
--- End diff --

this is only used in the non-primitive branch?





[GitHub] spark issue #16395: implemented first version of filter estimation

2016-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16395
  
Can one of the admins verify this patch?





[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...

2016-12-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13909#discussion_r93812197
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -56,33 +58,81 @@ case class CreateArray(children: Seq[Expression]) 
extends Expression {
   }
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-val arrayClass = classOf[GenericArrayData].getName
-val values = ctx.freshName("values")
-ctx.addMutableState("Object[]", values, s"this.$values = null;")
-
-ev.copy(code = s"""
-  this.$values = new Object[${children.size}];""" +
-  ctx.splitExpressions(
-ctx.INPUT_ROW,
-children.zipWithIndex.map { case (e, i) =>
-  val eval = e.genCode(ctx)
-  eval.code + s"""
-if (${eval.isNull}) {
-  $values[$i] = null;
-} else {
-  $values[$i] = ${eval.value};
-}
-   """
-}) +
-  s"""
-final ArrayData ${ev.value} = new $arrayClass($values);
-this.$values = null;
-  """, isNull = "false")
+val array = ctx.freshName("array")
+
+val et = dataType.elementType
+val evals = children.map(e => e.genCode(ctx))
+val isPrimitiveArray = ctx.isPrimitiveType(et)
+val (preprocess, arrayData) =
+  GenArrayData.getCodeArrayData(ctx, et, children.size, 
isPrimitiveArray, array)
+
+val assigns = if (isPrimitiveArray) {
+  val primitiveTypeName = ctx.primitiveTypeName(et)
+  evals.zipWithIndex.map { case (eval, i) =>
+eval.code + s"""
+ if (${eval.isNull}) {
+   $arrayData.setNullAt($i);
+ } else {
+   $arrayData.set$primitiveTypeName($i, ${eval.value});
+ }
+   """
+  }
+} else {
+  evals.zipWithIndex.map { case (eval, i) =>
+eval.code + s"""
+ if (${eval.isNull}) {
+   $array[$i] = null;
+ } else {
+   $array[$i] = ${eval.value};
+ }
+   """
+  }
+}
+ev.copy(code =
--- End diff --

how about:
```
ev.copy(
  code = preprocess + ctx.splitExpressions(ctx.INPUT_ROW, assigns),
  value = arrayData,
  isNull = "false")
```





[GitHub] spark pull request #16395: implemented first version of filter estimation

2016-12-23 Thread ron8hu
GitHub user ron8hu opened a pull request:

https://github.com/apache/spark/pull/16395

implemented first version of filter estimation

## What changes were proposed in this pull request?

This is a WIP PR. In this version, we set up the framework to traverse a predicate and evaluate the equality (=) expression.
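
For background only (a standard textbook heuristic, not necessarily what this PR implements): the selectivity of an equality predicate is commonly estimated from the column's number of distinct values, assuming a uniform distribution.
```
// Standard heuristic: P(col = const) ≈ 1 / ndv(col), assuming uniformity.
// Falls back to no reduction when the distinct-value count is unknown.
def equalitySelectivity(numDistinctValues: Long): Double =
  if (numDistinctValues > 0) 1.0 / numDistinctValues else 1.0
```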

## How was this patch tested?

We just have a simple test case for now. More tests need to be added.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ron8hu/spark filterSelectivity

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16395.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16395


commit 56d15790bd1bc5e5f7224212d3422814a2e1cfae
Author: Ron Hu 
Date:   2016-12-23T22:44:57Z

implemented first version of filter estimation







[GitHub] spark issue #16389: [SPARK-18981][Core]The job hang problem when speculation...

2016-12-23 Thread zhaorongsheng
Github user zhaorongsheng commented on the issue:

https://github.com/apache/spark/pull/16389
  
Jenkins, retest this please





[GitHub] spark pull request #16391: [SPARK-18990][SQL] make DatasetBenchmark fairer f...

2016-12-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16391#discussion_r93812042
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DatasetBenchmark.scala ---
@@ -133,24 +134,29 @@ object DatasetBenchmark {
   def aggregate(spark: SparkSession, numRows: Long): Benchmark = {
 import spark.implicits._
 
-val df = spark.range(1, numRows).select($"id".as("l"), 
$"id".cast(StringType).as("s"))
+val rdd = spark.sparkContext.range(0, numRows)
+val ds = spark.range(0, numRows)
+val df = ds.toDF("l")
+
 val benchmark = new Benchmark("aggregate", numRows)
 
-val rdd = spark.sparkContext.range(1, numRows).map(l => Data(l, 
l.toString))
 benchmark.addCase("RDD sum") { iter =>
-  rdd.aggregate(0L)(_ + _.l, _ + _)
+  rdd.map(l => (l % 10, l)).reduceByKey(_ + _).foreach(_ => Unit)
--- End diff --

aggregate without grouping is not a common use case
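
For comparison, a grouped aggregation on the DataFrame side of this benchmark might look like the following (illustrative sketch, not the patch's code):
```
import org.apache.spark.sql.functions._

// Group by the key expression the RDD case uses (l % 10), then sum per group.
df.groupBy(col("l") % 10).agg(sum("l")).foreach(_ => ())
```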





[GitHub] spark pull request #16240: [SPARK-16792][SQL] Dataset containing a Case Clas...

2016-12-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16240#discussion_r93811936
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLImplicits.scala 
---
@@ -100,31 +97,36 @@ abstract class SQLImplicits {
   // Seqs
 
   /** @since 1.6.1 */
-  implicit def newIntSeqEncoder: Encoder[Seq[Int]] = ExpressionEncoder()
+  implicit def newIntSeqEncoder[T <: Seq[Int] : TypeTag]: Encoder[T] = 
ExpressionEncoder()
 
   /** @since 1.6.1 */
-  implicit def newLongSeqEncoder: Encoder[Seq[Long]] = ExpressionEncoder()
+  implicit def newLongSeqEncoder[T <: Seq[Long] : TypeTag]: Encoder[T] = 
ExpressionEncoder()
 
   /** @since 1.6.1 */
-  implicit def newDoubleSeqEncoder: Encoder[Seq[Double]] = 
ExpressionEncoder()
+  implicit def newDoubleSeqEncoder[T <: Seq[Double] : TypeTag]: Encoder[T] 
= ExpressionEncoder()
 
   /** @since 1.6.1 */
-  implicit def newFloatSeqEncoder: Encoder[Seq[Float]] = 
ExpressionEncoder()
+  implicit def newFloatSeqEncoder[T <: Seq[Float] : TypeTag]: Encoder[T] = 
ExpressionEncoder()
 
   /** @since 1.6.1 */
-  implicit def newByteSeqEncoder: Encoder[Seq[Byte]] = ExpressionEncoder()
+  implicit def newByteSeqEncoder[T <: Seq[Byte] : TypeTag]: Encoder[T] = 
ExpressionEncoder()
 
   /** @since 1.6.1 */
-  implicit def newShortSeqEncoder: Encoder[Seq[Short]] = 
ExpressionEncoder()
+  implicit def newShortSeqEncoder[T <: Seq[Short] : TypeTag]: Encoder[T] = 
ExpressionEncoder()
 
   /** @since 1.6.1 */
-  implicit def newBooleanSeqEncoder: Encoder[Seq[Boolean]] = 
ExpressionEncoder()
+  implicit def newBooleanSeqEncoder[T <: Seq[Boolean] : TypeTag]: 
Encoder[T] = ExpressionEncoder()
 
   /** @since 1.6.1 */
-  implicit def newStringSeqEncoder: Encoder[Seq[String]] = 
ExpressionEncoder()
+  implicit def newStringSeqEncoder[T <: Seq[String] : TypeTag]: Encoder[T] 
= ExpressionEncoder()
 
   /** @since 1.6.1 */
-  implicit def newProductSeqEncoder[A <: Product : TypeTag]: 
Encoder[Seq[A]] = ExpressionEncoder()
+  implicit def newProductSeqEncoder[A <: Product, T <: Seq[A] : TypeTag]: 
Encoder[T] =
--- End diff --

Can we just use `newProductSeqEncoder[T <: Seq[Product] : TypeTag]: Encoder[T]` here? Then we wouldn't need to work around the implicit resolution.
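
A sketch of that proposal, assuming the surrounding `SQLImplicits` context (not the merged code):
```
// One type parameter instead of two: implicit search can then infer T
// directly from the expected Encoder[T], with no extra workaround needed.
implicit def newProductSeqEncoder[T <: Seq[Product] : TypeTag]: Encoder[T] =
  ExpressionEncoder()
```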





[GitHub] spark issue #16393: [SPARK-18993] [Build] Revert Split test-tags into test-J...

2016-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16393
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16393: [SPARK-18993] [Build] Revert Split test-tags into test-J...

2016-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16393
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70559/
Test PASSed.





[GitHub] spark pull request #16308: [SPARK-18936][SQL] Infrastructure for session loc...

2016-12-23 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16308#discussion_r93811889
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
 ---
@@ -288,7 +289,7 @@ object FileFormatWriter extends Logging {
 val escaped = ScalaUDF(
   ExternalCatalogUtils.escapePathName _,
   StringType,
-  Seq(Cast(c, StringType)),
+  Seq(Cast(c, StringType, DateTimeUtils.defaultTimeZone().getID)),
--- End diff --

This seems related to partition values. We'll send follow-up PRs about partition values ([SPARK-18939](https://issues.apache.org/jira/browse/SPARK-18939)); let's discuss there.





[GitHub] spark issue #16393: [SPARK-18993] [Build] Revert Split test-tags into test-J...

2016-12-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16393
  
**[Test build #70559 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70559/testReport)** for PR 16393 at commit [`685f28b`](https://github.com/apache/spark/commit/685f28bb7c42148144fe9b8a57c9b29af8dc0e90).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16337: [SPARK-18871][SQL] New test cases for IN/NOT IN s...

2016-12-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16337#discussion_r93811860
  
--- Diff: 
sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-group-by.sql
 ---
@@ -0,0 +1,117 @@
+-- A test suite for GROUP BY in parent side, subquery, and both predicate 
subquery
+-- It includes correlated cases.
+
+-- tables and data types
+
+CREATE DATABASE indb;
--- End diff --

You created a database, but you did not use it. 

Maybe you do not need it.





[GitHub] spark pull request #16308: [SPARK-18936][SQL] Infrastructure for session loc...

2016-12-23 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16308#discussion_r93811817
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala ---
@@ -146,18 +146,20 @@ class QueryExecution(val sparkSession: SparkSession, 
val logical: LogicalPlan) {
 
 /** Implementation following Hive's TimestampWritable.toString */
 def formatTimestamp(timestamp: Timestamp): String = {
+  val timestampFormat =
+
DateTimeUtils.getThreadLocalTimestampFormat(DateTimeUtils.defaultTimeZone())
--- End diff --

Good catch. I'll modify it.





[GitHub] spark issue #16389: [SPARK-18981][Core]The job hang problem when speculation...

2016-12-23 Thread zhaorongsheng
Github user zhaorongsheng commented on the issue:

https://github.com/apache/spark/pull/16389
  
Hi @mridulm, I have modified the tests. Please check them.
Thanks~





[GitHub] spark pull request #16337: [SPARK-18871][SQL] New test cases for IN/NOT IN s...

2016-12-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16337#discussion_r93811755
  
--- Diff: 
sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-group-by.sql
 ---
@@ -0,0 +1,117 @@
+-- A test suite for GROUP BY in parent side, subquery, and both predicate 
subquery
+-- It includes correlated cases.
+
+-- tables and data types
+
+CREATE DATABASE indb;
+CREATE TABLE t1(t1a String, t1b Short, t1c Int, t1d Long, t1e float, t1f 
double, t1g DECIMAL, t1h TIMESTAMP, t1i Date)
+using parquet;
+CREATE TABLE t2(t2a String, t2b Short, t2c Int, t2d Long, t2e float, t2f 
double, t2g DECIMAL, t2h TIMESTAMP, t2i Date)
+using parquet;
+CREATE TABLE t3(t3a String, t3b Short, t3c Int, t3d Long, t3e float, t3f 
double, t3g DECIMAL, t3h TIMESTAMP, t3i Date)
+using parquet;
+
+-- insert to tables
+INSERT INTO t1 VALUES
+ ('t1a', 6, 8, 10, 15, 20, 20.00, timestamp(date("2014-04-04")), 
date("2014-04-04")),
+ ('t1b', 8, 16, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-04")),
+ ('t1a', 16, 12, 21, 15, 20, 20.00, timestamp(date("2014-06-04")), 
date("2014-06-04")),
+ ('t1a', 16, 12, 10, 15, 20, 20.00, timestamp(date("2014-07-04")), 
date("2014-07-04")),
+ ('t1c', 8, 16, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-05")),
+ ('t1d', null, 16, 22, 17, 25, 26.00, timestamp(date("2014-06-04")), null),
+ ('t1d', null, 16, 19, 17, 25, 26.00, timestamp(date("2014-07-04")), null),
+ ('t1e', 10, null, 25, 17, 25, 26.00, timestamp(date("2014-08-04")), 
date("2014-08-04")),
+ ('t1e', 10, null, 19, 17, 25, 26.00, timestamp(date("2014-09-04")), 
date("2014-09-04")),
+ ('t1d', 10, null, 12, 17, 25, 26.00, timestamp(date("2015-05-04")), 
date("2015-05-04")),
+ ('t1a', 6, 8, 10, 15, 20, 20.00, timestamp(date("2014-04-04")), 
date("2014-04-04")),
+ ('t1e', 10, null, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-0=4"));
+
+INSERT INTO t2 VALUES
+ ('t2a', 6, 12, 14, 15, 20, 20.00, timestamp(date("2014-04-04")), 
date("2014-04-04")),
+ ('t1b', 10, 12, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-04")),
+ ('t1b', 8, 16, 119, 17, 25, 26.00, timestamp(date("2015-05-04")), 
date("2015-05-04")),
+ ('t1c', 12, 16, 219, 17, 25, 26.00, timestamp(date("2016-05-04")), 
date("2016-05-04")),
+ ('t1b', null, 16, 319, 17, 25, 26.00, timestamp(date("2017-05-04")), 
null),
+ ('t2e', 8, null, 419, 17, 25, 26.00, timestamp(date("2014-06-04")), 
date("2014-06-04")),
+ ('t1f', 19, null, 519, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-04")),
+ ('t1b', 10, 12, 19, 17, 25, 26.00, timestamp(date("2014-06-04")), 
date("2014-06-04")),
+ ('t1b', 8, 16, 19, 17, 25, 26.00, timestamp(date("2014-07-04")), 
date("2014-07-04")),
+ ('t1c', 12, 16, 19, 17, 25, 26.00, timestamp(date("2014-08-04")), 
date("2014-08-05")),
+ ('t1e', 8, null, 19, 17, 25, 26.00, timestamp(date("2014-09-04")), 
date("2014-09-04")),
+ ('t1f', 19, null, 19, 17, 25, 26.00, timestamp(date("2014-10-04")), 
date("2014-10-04")),
+ ('t1b', null, 16, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), null);
+
+INSERT INTO t3 VALUES
+ ('t3a', 6, 12, 110, 15, 20, 20.00, timestamp(date("2014-04-04")), 
date("2014-04-04")),
+ ('t3a', 6, 12, 10, 15, 20, 20.00, timestamp(date("2014-05-04")), 
date("2014-05-04")),
+ ('t1b', 10, 12, 219, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-04")),
+ ('t1b', 10, 12, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-04")),
+ ('t1b', 8, 16, 319, 17, 25, 26.00, timestamp(date("2014-06-04")), 
date("2014-06-04")),
+ ('t1b', 8, 16, 19, 17, 25, 26.00, timestamp(date("2014-07-04")), 
date("2014-07-04")),
+ ('t3c', 17, 16, 519, 17, 25, 26.00, timestamp(date("2014-08-04")), 
date("2014-08-04")),
+ ('t3c', 17, 16, 19, 17, 25, 26.00, timestamp(date("2014-09-04")), 
date("2014-09-05")),
+ ('t1b', null, 16, 419, 17, 25, 26.00, timestamp(date("2014-10-04")), 
null),
+ ('t1b', null, 16, 19, 17, 25, 26.00, timestamp(date("2014-11-04")), null),
+ ('t3b', 8, null, 719, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-04")),
+ ('t3b', 8, null, 19, 17, 25, 26.00, timestamp(date("2015-05-04")), 
date("2015-05-04"));
+
+-- correlated IN subquery
+-- GROUP BY in parent side
+-- TC 01.01
+select t1a, avg(t1b) from t1 where t1a in (select t2a from t2) group by 
t1a;
+-- TC 01.02
+select t1a, max(t1b) from t1 where t1b in (select t2b from t2 where t1a = 
t2a) group by t1a, t1d;
+-- TC 01.03
+select t1a, t1b from t1 where t1c in (select t2c from t2 where t1a = t2a) 
group by t1a, t1b;
+-- TC 01.04
+select t1a, sum(distinct(t1b)) from t1 where t1c in (select t2c from t2 
where t1a = t2a) or
+t1c in (select t3c from t3 where t1a = 

[GitHub] spark pull request #16394: [SPARK-18981][Core]The job hang problem when spec...

2016-12-23 Thread zhaorongsheng
Github user zhaorongsheng closed the pull request at:

https://github.com/apache/spark/pull/16394





[GitHub] spark pull request #16337: [SPARK-18871][SQL] New test cases for IN/NOT IN s...

2016-12-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16337#discussion_r93811696
  
--- Diff: 
sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-group-by.sql
 ---
@@ -0,0 +1,117 @@
+-- A test suite for GROUP BY in parent side, subquery, and both predicate 
subquery
+-- It includes correlated cases.
+
+-- tables and data types
+
+CREATE DATABASE indb;
+CREATE TABLE t1(t1a String, t1b Short, t1c Int, t1d Long, t1e float, t1f 
double, t1g DECIMAL, t1h TIMESTAMP, t1i Date)
+using parquet;
+CREATE TABLE t2(t2a String, t2b Short, t2c Int, t2d Long, t2e float, t2f 
double, t2g DECIMAL, t2h TIMESTAMP, t2i Date)
+using parquet;
+CREATE TABLE t3(t3a String, t3b Short, t3c Int, t3d Long, t3e float, t3f 
double, t3g DECIMAL, t3h TIMESTAMP, t3i Date)
+using parquet;
+
+-- insert to tables
+INSERT INTO t1 VALUES
+ ('t1a', 6, 8, 10, 15, 20, 20.00, timestamp(date("2014-04-04")), 
date("2014-04-04")),
+ ('t1b', 8, 16, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-04")),
+ ('t1a', 16, 12, 21, 15, 20, 20.00, timestamp(date("2014-06-04")), 
date("2014-06-04")),
+ ('t1a', 16, 12, 10, 15, 20, 20.00, timestamp(date("2014-07-04")), 
date("2014-07-04")),
+ ('t1c', 8, 16, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-05")),
+ ('t1d', null, 16, 22, 17, 25, 26.00, timestamp(date("2014-06-04")), null),
+ ('t1d', null, 16, 19, 17, 25, 26.00, timestamp(date("2014-07-04")), null),
+ ('t1e', 10, null, 25, 17, 25, 26.00, timestamp(date("2014-08-04")), 
date("2014-08-04")),
+ ('t1e', 10, null, 19, 17, 25, 26.00, timestamp(date("2014-09-04")), 
date("2014-09-04")),
+ ('t1d', 10, null, 12, 17, 25, 26.00, timestamp(date("2015-05-04")), 
date("2015-05-04")),
+ ('t1a', 6, 8, 10, 15, 20, 20.00, timestamp(date("2014-04-04")), 
date("2014-04-04")),
+ ('t1e', 10, null, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-0=4"));
+
+INSERT INTO t2 VALUES
+ ('t2a', 6, 12, 14, 15, 20, 20.00, timestamp(date("2014-04-04")), 
date("2014-04-04")),
+ ('t1b', 10, 12, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-04")),
+ ('t1b', 8, 16, 119, 17, 25, 26.00, timestamp(date("2015-05-04")), 
date("2015-05-04")),
+ ('t1c', 12, 16, 219, 17, 25, 26.00, timestamp(date("2016-05-04")), 
date("2016-05-04")),
+ ('t1b', null, 16, 319, 17, 25, 26.00, timestamp(date("2017-05-04")), 
null),
+ ('t2e', 8, null, 419, 17, 25, 26.00, timestamp(date("2014-06-04")), 
date("2014-06-04")),
+ ('t1f', 19, null, 519, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-04")),
+ ('t1b', 10, 12, 19, 17, 25, 26.00, timestamp(date("2014-06-04")), 
date("2014-06-04")),
+ ('t1b', 8, 16, 19, 17, 25, 26.00, timestamp(date("2014-07-04")), 
date("2014-07-04")),
+ ('t1c', 12, 16, 19, 17, 25, 26.00, timestamp(date("2014-08-04")), 
date("2014-08-05")),
+ ('t1e', 8, null, 19, 17, 25, 26.00, timestamp(date("2014-09-04")), 
date("2014-09-04")),
+ ('t1f', 19, null, 19, 17, 25, 26.00, timestamp(date("2014-10-04")), 
date("2014-10-04")),
+ ('t1b', null, 16, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), null);
+
+INSERT INTO t3 VALUES
+ ('t3a', 6, 12, 110, 15, 20, 20.00, timestamp(date("2014-04-04")), 
date("2014-04-04")),
+ ('t3a', 6, 12, 10, 15, 20, 20.00, timestamp(date("2014-05-04")), 
date("2014-05-04")),
+ ('t1b', 10, 12, 219, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-04")),
+ ('t1b', 10, 12, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-04")),
+ ('t1b', 8, 16, 319, 17, 25, 26.00, timestamp(date("2014-06-04")), 
date("2014-06-04")),
+ ('t1b', 8, 16, 19, 17, 25, 26.00, timestamp(date("2014-07-04")), 
date("2014-07-04")),
+ ('t3c', 17, 16, 519, 17, 25, 26.00, timestamp(date("2014-08-04")), 
date("2014-08-04")),
+ ('t3c', 17, 16, 19, 17, 25, 26.00, timestamp(date("2014-09-04")), 
date("2014-09-05")),
+ ('t1b', null, 16, 419, 17, 25, 26.00, timestamp(date("2014-10-04")), 
null),
+ ('t1b', null, 16, 19, 17, 25, 26.00, timestamp(date("2014-11-04")), null),
+ ('t3b', 8, null, 719, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-04")),
+ ('t3b', 8, null, 19, 17, 25, 26.00, timestamp(date("2015-05-04")), 
date("2015-05-04"));
+
+-- correlated IN subquery
+-- GROUP BY in parent side
+-- TC 01.01
+select t1a, avg(t1b) from t1 where t1a in (select t2a from t2) group by 
t1a;
+-- TC 01.02
+select t1a, max(t1b) from t1 where t1b in (select t2b from t2 where t1a = 
t2a) group by t1a, t1d;
+-- TC 01.03
+select t1a, t1b from t1 where t1c in (select t2c from t2 where t1a = t2a) 
group by t1a, t1b;
+-- TC 01.04
+select t1a, sum(distinct(t1b)) from t1 where t1c in (select t2c from t2 
where t1a = t2a) or
+t1c in (select t3c from t3 where t1a = 

[GitHub] spark pull request #16337: [SPARK-18871][SQL] New test cases for IN/NOT IN s...

2016-12-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16337#discussion_r93811694
  
--- Diff: 
sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/simple-in.sql 
---
@@ -0,0 +1,92 @@
+-- A test suite for simple IN predicate subquery
+-- It includes correlated cases.
+
+-- tables and data types
+
+CREATE DATABASE indb;
+CREATE TABLE t1(t1a String, t1b Short, t1c Int, t1d Long, t1e float, t1f 
double, t1g DECIMAL, t1h TIMESTAMP, t1i Date)
+using parquet;
+CREATE TABLE t2(t2a String, t2b Short, t2c Int, t2d Long, t2e float, t2f 
double, t2g DECIMAL, t2h TIMESTAMP, t2i Date)
+using parquet;
+CREATE TABLE t3(t3a String, t3b Short, t3c Int, t3d Long, t3e float, t3f 
double, t3g DECIMAL, t3h TIMESTAMP, t3i Date)
+using parquet;
+
+-- insert to tables
+INSERT INTO t1 VALUES
+ ('t1a', 6, 8, 10, 15, 20, 20.00, timestamp(date("2014-04-04")), 
date("2014-04-04")),
+ ('t1b', 8, 16, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-04")),
+ ('t1a', 16, 12, 21, 15, 20, 20.00, timestamp(date("2014-06-04")), 
date("2014-06-04")),
+ ('t1a', 16, 12, 10, 15, 20, 20.00, timestamp(date("2014-07-04")), 
date("2014-07-04")),
+ ('t1c', 8, 16, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-05")),
+ ('t1d', null, 16, 22, 17, 25, 26.00, timestamp(date("2014-06-04")), null),
+ ('t1d', null, 16, 19, 17, 25, 26.00, timestamp(date("2014-07-04")), null),
+ ('t1e', 10, null, 25, 17, 25, 26.00, timestamp(date("2014-08-04")), 
date("2014-08-04")),
+ ('t1e', 10, null, 19, 17, 25, 26.00, timestamp(date("2014-09-04")), 
date("2014-09-04")),
+ ('t1d', 10, null, 12, 17, 25, 26.00, timestamp(date("2015-05-04")), 
date("2015-05-04")),
+ ('t1a', 6, 8, 10, 15, 20, 20.00, timestamp(date("2014-04-04")), 
date("2014-04-04")),
+ ('t1e', 10, null, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-0=4"));
+
+INSERT INTO t2 VALUES
+ ('t2a', 6, 12, 14, 15, 20, 20.00, timestamp(date("2014-04-04")), 
date("2014-04-04")),
+ ('t1b', 10, 12, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-04")),
+ ('t1b', 8, 16, 119, 17, 25, 26.00, timestamp(date("2015-05-04")), 
date("2015-05-04")),
+ ('t1c', 12, 16, 219, 17, 25, 26.00, timestamp(date("2016-05-04")), 
date("2016-05-04")),
+ ('t1b', null, 16, 319, 17, 25, 26.00, timestamp(date("2017-05-04")), 
null),
+ ('t2e', 8, null, 419, 17, 25, 26.00, timestamp(date("2014-06-04")), 
date("2014-06-04")),
+ ('t1f', 19, null, 519, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-04")),
+ ('t1b', 10, 12, 19, 17, 25, 26.00, timestamp(date("2014-06-04")), 
date("2014-06-04")),
+ ('t1b', 8, 16, 19, 17, 25, 26.00, timestamp(date("2014-07-04")), 
date("2014-07-04")),
+ ('t1c', 12, 16, 19, 17, 25, 26.00, timestamp(date("2014-08-04")), 
date("2014-08-05")),
+ ('t1e', 8, null, 19, 17, 25, 26.00, timestamp(date("2014-09-04")), 
date("2014-09-04")),
+ ('t1f', 19, null, 19, 17, 25, 26.00, timestamp(date("2014-10-04")), 
date("2014-10-04")),
+ ('t1b', null, 16, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), null);
+
+INSERT INTO t3 VALUES
+ ('t3a', 6, 12, 110, 15, 20, 20.00, timestamp(date("2014-04-04")), 
date("2014-04-04")),
+ ('t3a', 6, 12, 10, 15, 20, 20.00, timestamp(date("2014-05-04")), 
date("2014-05-04")),
+ ('t1b', 10, 12, 219, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-04")),
+ ('t1b', 10, 12, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-04")),
+ ('t1b', 8, 16, 319, 17, 25, 26.00, timestamp(date("2014-06-04")), 
date("2014-06-04")),
+ ('t1b', 8, 16, 19, 17, 25, 26.00, timestamp(date("2014-07-04")), 
date("2014-07-04")),
+ ('t3c', 17, 16, 519, 17, 25, 26.00, timestamp(date("2014-08-04")), 
date("2014-08-04")),
+ ('t3c', 17, 16, 19, 17, 25, 26.00, timestamp(date("2014-09-04")), 
date("2014-09-05")),
+ ('t1b', null, 16, 419, 17, 25, 26.00, timestamp(date("2014-10-04")), 
null),
+ ('t1b', null, 16, 19, 17, 25, 26.00, timestamp(date("2014-11-04")), null),
+ ('t3b', 8, null, 719, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-04")),
+ ('t3b', 8, null, 19, 17, 25, 26.00, timestamp(date("2015-05-04")), 
date("2015-05-04"));
+
+-- correlated IN subquery
+-- simple select
+-- TC 01.01
+select * from t1 where t1a in (select t2a from t2);
+-- TC 01.02
+select * from t1 where t1b in (select t2b from t2 where t1a = t2a);
+-- TC 01.03
+select t1a, t1b from t1 where t1c in (select t2b from t2 where t1a != t2a);
+-- TC 01.04
+select t1a, t1b from t1 where t1c in (select t2b from t2 where t1a = t2a 
or t1b > t2b);
+-- TC 01.05
+select t1a, t1b from t1 where t1c in (select t2b from t2 where t2i in 
(select t3i from t3 where t2c = t3c));
+-- TC 01.06
+select t1a, t1b 

[GitHub] spark pull request #16308: [SPARK-18936][SQL] Infrastructure for session loc...

2016-12-23 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16308#discussion_r93811689
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
 ---
@@ -78,11 +108,21 @@ case class CurrentTimestamp() extends LeafExpression 
with CodegenFallback {
  *
  * There is no code generation since this expression should be replaced 
with a literal.
  */
-case class CurrentBatchTimestamp(timestampMs: Long, dataType: DataType)
-  extends LeafExpression with Nondeterministic with CodegenFallback {
+case class CurrentBatchTimestamp(timestampMs: Long, dataType: DataType, 
timeZoneId: String = null)
+  extends LeafExpression with TimeZoneAwareExpression with 
Nondeterministic with CodegenFallback {
+
+  def this(timestampMs: Long, dataType: DataType) = this(timestampMs, 
dataType, null)
 
   override def nullable: Boolean = false
 
+  override def timeZoneResolved: Boolean = dataType != DateType || 
super.timeZoneResolved
--- End diff --

Yes, a timestamp always has the same value regardless of the time zone.
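
To illustrate, a minimal self-contained sketch (plain `java.time`, not Spark internals; the constants are arbitrary): the same instant compares equal across zones, while its local date does not.

```
import java.time.{Instant, ZoneId}

object TimestampVsZone extends App {
  val instant = Instant.ofEpochMilli(1482537600000L) // 2016-12-24T00:00:00Z
  val tokyo = instant.atZone(ZoneId.of("Asia/Tokyo"))
  val la = instant.atZone(ZoneId.of("America/Los_Angeles"))
  println(tokyo.toInstant == la.toInstant)     // true: the timestamp is zone-independent
  println(tokyo.toLocalDate == la.toLocalDate) // false: the local date depends on the zone
}
```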





[GitHub] spark pull request #16337: [SPARK-18871][SQL] New test cases for IN/NOT IN s...

2016-12-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16337#discussion_r93811681
  
--- Diff: 
sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/simple-in.sql 
---
@@ -0,0 +1,92 @@
+-- A test suite for simple IN predicate subquery
+-- It includes correlated cases.
+
+-- tables and data types
+
+CREATE DATABASE indb;
+CREATE TABLE t1(t1a String, t1b Short, t1c Int, t1d Long, t1e float, t1f 
double, t1g DECIMAL, t1h TIMESTAMP, t1i Date)
+using parquet;
+CREATE TABLE t2(t2a String, t2b Short, t2c Int, t2d Long, t2e float, t2f 
double, t2g DECIMAL, t2h TIMESTAMP, t2i Date)
+using parquet;
+CREATE TABLE t3(t3a String, t3b Short, t3c Int, t3d Long, t3e float, t3f 
double, t3g DECIMAL, t3h TIMESTAMP, t3i Date)
+using parquet;
+
+-- insert to tables
+INSERT INTO t1 VALUES
+ ('t1a', 6, 8, 10, 15, 20, 20.00, timestamp(date("2014-04-04")), 
date("2014-04-04")),
+ ('t1b', 8, 16, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-04")),
+ ('t1a', 16, 12, 21, 15, 20, 20.00, timestamp(date("2014-06-04")), 
date("2014-06-04")),
+ ('t1a', 16, 12, 10, 15, 20, 20.00, timestamp(date("2014-07-04")), 
date("2014-07-04")),
+ ('t1c', 8, 16, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-05")),
+ ('t1d', null, 16, 22, 17, 25, 26.00, timestamp(date("2014-06-04")), null),
+ ('t1d', null, 16, 19, 17, 25, 26.00, timestamp(date("2014-07-04")), null),
+ ('t1e', 10, null, 25, 17, 25, 26.00, timestamp(date("2014-08-04")), 
date("2014-08-04")),
+ ('t1e', 10, null, 19, 17, 25, 26.00, timestamp(date("2014-09-04")), 
date("2014-09-04")),
+ ('t1d', 10, null, 12, 17, 25, 26.00, timestamp(date("2015-05-04")), 
date("2015-05-04")),
+ ('t1a', 6, 8, 10, 15, 20, 20.00, timestamp(date("2014-04-04")), 
date("2014-04-04")),
+ ('t1e', 10, null, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-0=4"));
+
+INSERT INTO t2 VALUES
+ ('t2a', 6, 12, 14, 15, 20, 20.00, timestamp(date("2014-04-04")), 
date("2014-04-04")),
+ ('t1b', 10, 12, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-04")),
+ ('t1b', 8, 16, 119, 17, 25, 26.00, timestamp(date("2015-05-04")), 
date("2015-05-04")),
+ ('t1c', 12, 16, 219, 17, 25, 26.00, timestamp(date("2016-05-04")), 
date("2016-05-04")),
+ ('t1b', null, 16, 319, 17, 25, 26.00, timestamp(date("2017-05-04")), 
null),
+ ('t2e', 8, null, 419, 17, 25, 26.00, timestamp(date("2014-06-04")), 
date("2014-06-04")),
+ ('t1f', 19, null, 519, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-04")),
+ ('t1b', 10, 12, 19, 17, 25, 26.00, timestamp(date("2014-06-04")), 
date("2014-06-04")),
+ ('t1b', 8, 16, 19, 17, 25, 26.00, timestamp(date("2014-07-04")), 
date("2014-07-04")),
+ ('t1c', 12, 16, 19, 17, 25, 26.00, timestamp(date("2014-08-04")), 
date("2014-08-05")),
+ ('t1e', 8, null, 19, 17, 25, 26.00, timestamp(date("2014-09-04")), 
date("2014-09-04")),
+ ('t1f', 19, null, 19, 17, 25, 26.00, timestamp(date("2014-10-04")), 
date("2014-10-04")),
+ ('t1b', null, 16, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), null);
+
+INSERT INTO t3 VALUES
+ ('t3a', 6, 12, 110, 15, 20, 20.00, timestamp(date("2014-04-04")), 
date("2014-04-04")),
+ ('t3a', 6, 12, 10, 15, 20, 20.00, timestamp(date("2014-05-04")), 
date("2014-05-04")),
+ ('t1b', 10, 12, 219, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-04")),
+ ('t1b', 10, 12, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-04")),
+ ('t1b', 8, 16, 319, 17, 25, 26.00, timestamp(date("2014-06-04")), 
date("2014-06-04")),
+ ('t1b', 8, 16, 19, 17, 25, 26.00, timestamp(date("2014-07-04")), 
date("2014-07-04")),
+ ('t3c', 17, 16, 519, 17, 25, 26.00, timestamp(date("2014-08-04")), 
date("2014-08-04")),
+ ('t3c', 17, 16, 19, 17, 25, 26.00, timestamp(date("2014-09-04")), 
date("2014-09-05")),
+ ('t1b', null, 16, 419, 17, 25, 26.00, timestamp(date("2014-10-04")), 
null),
+ ('t1b', null, 16, 19, 17, 25, 26.00, timestamp(date("2014-11-04")), null),
+ ('t3b', 8, null, 719, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-04")),
+ ('t3b', 8, null, 19, 17, 25, 26.00, timestamp(date("2015-05-04")), 
date("2015-05-04"));
+
+-- correlated IN subquery
+-- simple select
+-- TC 01.01
+select * from t1 where t1a in (select t2a from t2);
+-- TC 01.02
+select * from t1 where t1b in (select t2b from t2 where t1a = t2a);
+-- TC 01.03
+select t1a, t1b from t1 where t1c in (select t2b from t2 where t1a != t2a);
+-- TC 01.04
+select t1a, t1b from t1 where t1c in (select t2b from t2 where t1a = t2a 
or t1b > t2b);
+-- TC 01.05
+select t1a, t1b from t1 where t1c in (select t2b from t2 where t2i in 
(select t3i from t3 where t2c = t3c));
+-- TC 01.06
+select t1a, t1b 

[GitHub] spark pull request #16337: [SPARK-18871][SQL] New test cases for IN/NOT IN s...

2016-12-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16337#discussion_r93811675
  
--- Diff: 
sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/simple-in.sql 
---
@@ -0,0 +1,92 @@
+-- A test suite for simple IN predicate subquery
+-- It includes correlated cases.
+
+-- tables and data types
+
+CREATE DATABASE indb;
+CREATE TABLE t1(t1a String, t1b Short, t1c Int, t1d Long, t1e float, t1f 
double, t1g DECIMAL, t1h TIMESTAMP, t1i Date)
+using parquet;
+CREATE TABLE t2(t2a String, t2b Short, t2c Int, t2d Long, t2e float, t2f 
double, t2g DECIMAL, t2h TIMESTAMP, t2i Date)
+using parquet;
+CREATE TABLE t3(t3a String, t3b Short, t3c Int, t3d Long, t3e float, t3f 
double, t3g DECIMAL, t3h TIMESTAMP, t3i Date)
+using parquet;
+
+-- insert to tables
+INSERT INTO t1 VALUES
+ ('t1a', 6, 8, 10, 15, 20, 20.00, timestamp(date("2014-04-04")), 
date("2014-04-04")),
+ ('t1b', 8, 16, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-04")),
+ ('t1a', 16, 12, 21, 15, 20, 20.00, timestamp(date("2014-06-04")), 
date("2014-06-04")),
+ ('t1a', 16, 12, 10, 15, 20, 20.00, timestamp(date("2014-07-04")), 
date("2014-07-04")),
+ ('t1c', 8, 16, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-05")),
+ ('t1d', null, 16, 22, 17, 25, 26.00, timestamp(date("2014-06-04")), null),
+ ('t1d', null, 16, 19, 17, 25, 26.00, timestamp(date("2014-07-04")), null),
+ ('t1e', 10, null, 25, 17, 25, 26.00, timestamp(date("2014-08-04")), 
date("2014-08-04")),
+ ('t1e', 10, null, 19, 17, 25, 26.00, timestamp(date("2014-09-04")), 
date("2014-09-04")),
+ ('t1d', 10, null, 12, 17, 25, 26.00, timestamp(date("2015-05-04")), 
date("2015-05-04")),
+ ('t1a', 6, 8, 10, 15, 20, 20.00, timestamp(date("2014-04-04")), 
date("2014-04-04")),
+ ('t1e', 10, null, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-0=4"));
+
+INSERT INTO t2 VALUES
+ ('t2a', 6, 12, 14, 15, 20, 20.00, timestamp(date("2014-04-04")), 
date("2014-04-04")),
+ ('t1b', 10, 12, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-04")),
+ ('t1b', 8, 16, 119, 17, 25, 26.00, timestamp(date("2015-05-04")), 
date("2015-05-04")),
+ ('t1c', 12, 16, 219, 17, 25, 26.00, timestamp(date("2016-05-04")), 
date("2016-05-04")),
+ ('t1b', null, 16, 319, 17, 25, 26.00, timestamp(date("2017-05-04")), 
null),
+ ('t2e', 8, null, 419, 17, 25, 26.00, timestamp(date("2014-06-04")), 
date("2014-06-04")),
+ ('t1f', 19, null, 519, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-04")),
+ ('t1b', 10, 12, 19, 17, 25, 26.00, timestamp(date("2014-06-04")), 
date("2014-06-04")),
+ ('t1b', 8, 16, 19, 17, 25, 26.00, timestamp(date("2014-07-04")), 
date("2014-07-04")),
+ ('t1c', 12, 16, 19, 17, 25, 26.00, timestamp(date("2014-08-04")), 
date("2014-08-05")),
+ ('t1e', 8, null, 19, 17, 25, 26.00, timestamp(date("2014-09-04")), 
date("2014-09-04")),
+ ('t1f', 19, null, 19, 17, 25, 26.00, timestamp(date("2014-10-04")), 
date("2014-10-04")),
+ ('t1b', null, 16, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), null);
+
+INSERT INTO t3 VALUES
+ ('t3a', 6, 12, 110, 15, 20, 20.00, timestamp(date("2014-04-04")), 
date("2014-04-04")),
+ ('t3a', 6, 12, 10, 15, 20, 20.00, timestamp(date("2014-05-04")), 
date("2014-05-04")),
+ ('t1b', 10, 12, 219, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-04")),
+ ('t1b', 10, 12, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-04")),
+ ('t1b', 8, 16, 319, 17, 25, 26.00, timestamp(date("2014-06-04")), 
date("2014-06-04")),
+ ('t1b', 8, 16, 19, 17, 25, 26.00, timestamp(date("2014-07-04")), 
date("2014-07-04")),
+ ('t3c', 17, 16, 519, 17, 25, 26.00, timestamp(date("2014-08-04")), 
date("2014-08-04")),
+ ('t3c', 17, 16, 19, 17, 25, 26.00, timestamp(date("2014-09-04")), 
date("2014-09-05")),
+ ('t1b', null, 16, 419, 17, 25, 26.00, timestamp(date("2014-10-04")), 
null),
+ ('t1b', null, 16, 19, 17, 25, 26.00, timestamp(date("2014-11-04")), null),
+ ('t3b', 8, null, 719, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-04")),
+ ('t3b', 8, null, 19, 17, 25, 26.00, timestamp(date("2015-05-04")), 
date("2015-05-04"));
+
+-- correlated IN subquery
+-- simple select
+-- TC 01.01
+select * from t1 where t1a in (select t2a from t2);
+-- TC 01.02
+select * from t1 where t1b in (select t2b from t2 where t1a = t2a);
+-- TC 01.03
+select t1a, t1b from t1 where t1c in (select t2b from t2 where t1a != t2a);
+-- TC 01.04
+select t1a, t1b from t1 where t1c in (select t2b from t2 where t1a = t2a 
or t1b > t2b);
+-- TC 01.05
+select t1a, t1b from t1 where t1c in (select t2b from t2 where t2i in 
(select t3i from t3 where t2c = t3c));
+-- TC 01.06
+select t1a, t1b 

[GitHub] spark issue #16394: [SPARK-18981][Core]The job hang problem when speculation...

2016-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16394
  
Can one of the admins verify this patch?





[GitHub] spark pull request #16394: [SPARK-18981][Core]The job hang problem when spec...

2016-12-23 Thread zhaorongsheng
GitHub user zhaorongsheng opened a pull request:

https://github.com/apache/spark/pull/16394

[SPARK-18981][Core]The job hang problem when speculation is on

## What changes were proposed in this pull request?

The root cause of this issue is that `ExecutorAllocationListener` receives the 
speculated task-end event after the stage-end event has already been handled, 
which resets `numRunningTasks` to 0. The task-end handler then executes 
`numRunningTasks -= 1`, so `numRunningTasks` becomes negative. When 
`maxNumExecutorsNeeded()` computes `maxNeeded`, the result may be 0 or negative, 
so `ExecutorAllocationManager` does not request containers and the job hangs.

This PR changes the method `onTaskEnd()` in `ExecutorAllocationListener`: it 
decrements `numRunningTasks` only when `stageIdToNumTasks` still contains the 
task's stageId. A sketch of the idea follows.
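
A minimal sketch (simplified bookkeeping, not the actual listener code): the running-task counter is decremented only while its stage is still tracked, so a late speculative task-end event cannot drive the count negative.

```
import scala.collection.mutable

class AllocationListenerSketch {
  private val stageIdToNumTasks = mutable.Map[Int, Int]()
  private var numRunningTasks = 0

  def onStageSubmitted(stageId: Int, numTasks: Int): Unit =
    stageIdToNumTasks(stageId) = numTasks

  def onStageCompleted(stageId: Int): Unit = {
    stageIdToNumTasks -= stageId
    if (stageIdToNumTasks.isEmpty) numRunningTasks = 0 // existing reset path
  }

  def onTaskStart(stageId: Int): Unit = numRunningTasks += 1

  def onTaskEnd(stageId: Int): Unit =
    // The proposed guard: ignore task-end events for stages that are no
    // longer tracked, instead of decrementing unconditionally.
    if (stageIdToNumTasks.contains(stageId)) numRunningTasks -= 1
}
```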

## How was this patch tested?

This patch was tested in the method `test("SPARK-18981...)` of 
ExecutorAllocationManagerSuite.scala.
The test creates two `TaskInfo`s, one of which is a speculated task. After the 
stage-end event, the speculated task-end event is posted to the listener.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zhaorongsheng/spark branch-18981-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16394.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16394


commit c0220aefb1689731144dafccb001860276ee8d22
Author: roncen.zhao 
Date:   2016-12-24T02:37:53Z

resolve the job hang problem when speculation is on







[GitHub] spark pull request #16337: [SPARK-18871][SQL] New test cases for IN/NOT IN s...

2016-12-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16337#discussion_r93811584
  
--- Diff: 
sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-group-by.sql
 ---
@@ -0,0 +1,117 @@
+-- A test suite for GROUP BY in parent side, subquery, and both predicate 
subquery
+-- It includes correlated cases.
+
+-- tables and data types
+
+CREATE DATABASE indb;
+CREATE TABLE t1(t1a String, t1b Short, t1c Int, t1d Long, t1e float, t1f 
double, t1g DECIMAL, t1h TIMESTAMP, t1i Date)
+using parquet;
+CREATE TABLE t2(t2a String, t2b Short, t2c Int, t2d Long, t2e float, t2f 
double, t2g DECIMAL, t2h TIMESTAMP, t2i Date)
+using parquet;
+CREATE TABLE t3(t3a String, t3b Short, t3c Int, t3d Long, t3e float, t3f 
double, t3g DECIMAL, t3h TIMESTAMP, t3i Date)
+using parquet;
+
+-- insert to tables
+INSERT INTO t1 VALUES
+ ('t1a', 6, 8, 10, 15, 20, 20.00, timestamp(date("2014-04-04")), 
date("2014-04-04")),
+ ('t1b', 8, 16, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-04")),
+ ('t1a', 16, 12, 21, 15, 20, 20.00, timestamp(date("2014-06-04")), 
date("2014-06-04")),
+ ('t1a', 16, 12, 10, 15, 20, 20.00, timestamp(date("2014-07-04")), 
date("2014-07-04")),
+ ('t1c', 8, 16, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-05")),
+ ('t1d', null, 16, 22, 17, 25, 26.00, timestamp(date("2014-06-04")), null),
+ ('t1d', null, 16, 19, 17, 25, 26.00, timestamp(date("2014-07-04")), null),
+ ('t1e', 10, null, 25, 17, 25, 26.00, timestamp(date("2014-08-04")), 
date("2014-08-04")),
+ ('t1e', 10, null, 19, 17, 25, 26.00, timestamp(date("2014-09-04")), 
date("2014-09-04")),
+ ('t1d', 10, null, 12, 17, 25, 26.00, timestamp(date("2015-05-04")), 
date("2015-05-04")),
+ ('t1a', 6, 8, 10, 15, 20, 20.00, timestamp(date("2014-04-04")), 
date("2014-04-04")),
+ ('t1e', 10, null, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-0=4"));
+
+INSERT INTO t2 VALUES
+ ('t2a', 6, 12, 14, 15, 20, 20.00, timestamp(date("2014-04-04")), 
date("2014-04-04")),
+ ('t1b', 10, 12, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-04")),
+ ('t1b', 8, 16, 119, 17, 25, 26.00, timestamp(date("2015-05-04")), 
date("2015-05-04")),
+ ('t1c', 12, 16, 219, 17, 25, 26.00, timestamp(date("2016-05-04")), 
date("2016-05-04")),
+ ('t1b', null, 16, 319, 17, 25, 26.00, timestamp(date("2017-05-04")), 
null),
+ ('t2e', 8, null, 419, 17, 25, 26.00, timestamp(date("2014-06-04")), 
date("2014-06-04")),
+ ('t1f', 19, null, 519, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-04")),
+ ('t1b', 10, 12, 19, 17, 25, 26.00, timestamp(date("2014-06-04")), 
date("2014-06-04")),
+ ('t1b', 8, 16, 19, 17, 25, 26.00, timestamp(date("2014-07-04")), 
date("2014-07-04")),
+ ('t1c', 12, 16, 19, 17, 25, 26.00, timestamp(date("2014-08-04")), 
date("2014-08-05")),
+ ('t1e', 8, null, 19, 17, 25, 26.00, timestamp(date("2014-09-04")), 
date("2014-09-04")),
+ ('t1f', 19, null, 19, 17, 25, 26.00, timestamp(date("2014-10-04")), 
date("2014-10-04")),
+ ('t1b', null, 16, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), null);
+
+INSERT INTO t3 VALUES
+ ('t3a', 6, 12, 110, 15, 20, 20.00, timestamp(date("2014-04-04")), 
date("2014-04-04")),
+ ('t3a', 6, 12, 10, 15, 20, 20.00, timestamp(date("2014-05-04")), 
date("2014-05-04")),
+ ('t1b', 10, 12, 219, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-04")),
+ ('t1b', 10, 12, 19, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-04")),
+ ('t1b', 8, 16, 319, 17, 25, 26.00, timestamp(date("2014-06-04")), 
date("2014-06-04")),
+ ('t1b', 8, 16, 19, 17, 25, 26.00, timestamp(date("2014-07-04")), 
date("2014-07-04")),
+ ('t3c', 17, 16, 519, 17, 25, 26.00, timestamp(date("2014-08-04")), 
date("2014-08-04")),
+ ('t3c', 17, 16, 19, 17, 25, 26.00, timestamp(date("2014-09-04")), 
date("2014-09-05")),
+ ('t1b', null, 16, 419, 17, 25, 26.00, timestamp(date("2014-10-04")), 
null),
+ ('t1b', null, 16, 19, 17, 25, 26.00, timestamp(date("2014-11-04")), null),
+ ('t3b', 8, null, 719, 17, 25, 26.00, timestamp(date("2014-05-04")), 
date("2014-05-04")),
+ ('t3b', 8, null, 19, 17, 25, 26.00, timestamp(date("2015-05-04")), 
date("2015-05-04"));
+
+-- correlated IN subquery
+-- GROUP BY in parent side
+-- TC 01.01
+select t1a, avg(t1b) from t1 where t1a in (select t2a from t2) group by 
t1a;
+-- TC 01.02
+select t1a, max(t1b) from t1 where t1b in (select t2b from t2 where t1a = 
t2a) group by t1a, t1d;
+-- TC 01.03
+select t1a, t1b from t1 where t1c in (select t2c from t2 where t1a = t2a) 
group by t1a, t1b;
+-- TC 01.04
+select t1a, sum(distinct(t1b)) from t1 where t1c in (select t2c from t2 
where t1a = t2a) or
+t1c in (select t3c from t3 where t1a = 

[GitHub] spark pull request #16308: [SPARK-18936][SQL] Infrastructure for session loc...

2016-12-23 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16308#discussion_r93811548
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
 ---
@@ -135,6 +167,15 @@ case class Cast(child: Expression, dataType: DataType) 
extends UnaryExpression w
 
   override def nullable: Boolean = Cast.forceNullable(child.dataType, 
dataType) || child.nullable
 
+  override def timeZoneResolved: Boolean =
--- End diff --

Yes, that's right.





[GitHub] spark pull request #16308: [SPARK-18936][SQL] Infrastructure for session loc...

2016-12-23 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16308#discussion_r93811540
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -104,6 +104,7 @@ class Analyzer(
   ResolveAggregateFunctions ::
   TimeWindowing ::
   ResolveInlineTables ::
+  ResolveTimeZone ::
--- End diff --

@hvanhovell Thank you for your suggestion.
I overrode `resolved` in the `TimeZoneAwareExpression`s, which required adding 
`ResolveTimeZone` to the `Resolution` batch. But I found we don't need to worry 
about resolution: whether an expression already has its time zone or not doesn't 
affect resolution, and the rule only needs to run once now.
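
For readers following along, a hypothetical sketch of the rule's shape (the trait and method names here are assumptions for illustration, not the actual Catalyst code): a single pass that fills in the session-local time zone wherever it is still missing, which is why it does not interact with resolution.

```
object ResolveTimeZoneSketch {
  trait Expression
  trait TimeZoneAware extends Expression {
    def timeZoneId: Option[String]
    def withTimeZone(tz: String): Expression
  }

  // Applied once per expression node; resolution is unaffected because
  // filling in the zone never changes an expression's type or children.
  def apply(e: Expression, sessionTz: String): Expression = e match {
    case t: TimeZoneAware if t.timeZoneId.isEmpty => t.withTimeZone(sessionTz)
    case other => other
  }
}
```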





[GitHub] spark issue #16323: [SPARK-18911] [SQL] Define CatalogStatistics to interact...

2016-12-23 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16323
  
LGTM, pending jenkins





[GitHub] spark issue #16323: [SPARK-18911] [SQL] Define CatalogStatistics to interact...

2016-12-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16323
  
**[Test build #70561 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70561/testReport)**
 for PR 16323 at commit 
[`978bb11`](https://github.com/apache/spark/commit/978bb11d2bdbbf099473525b2baf714154640890).





[GitHub] spark pull request #16323: [SPARK-18911] [SQL] Define CatalogStatistics to i...

2016-12-23 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16323#discussion_r93811358
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
@@ -260,4 +261,46 @@ abstract class StatisticsCollectionTestBase extends 
QueryTest with SQLTestUtils
   }
 }
   }
+
+  // This test will be run twice: with and without Hive support
+  test("conversion from CatalogStatistics to Statistics") {
+withTable("ds_tbl", "hive_tbl") {
+  // Test data source table
+  checkStatsConversion(tableName = "ds_tbl", isDatasourceTable = true)
+  // Test hive serde table
+  if (spark.conf.get(StaticSQLConf.CATALOG_IMPLEMENTATION) == "hive") {
+checkStatsConversion(tableName = "hive_tbl", isDatasourceTable = 
false)
+  }
+}
+  }
+
+  private def checkStatsConversion(tableName: String, isDatasourceTable: 
Boolean): Unit = {
+// Create an empty table and run analyze command on it.
+val col = "c1"
+val createTableSql = if (isDatasourceTable) {
+  s"CREATE TABLE $tableName ($col INT) USING PARQUET"
--- End diff --

ok, fixed





[GitHub] spark pull request #16323: [SPARK-18911] [SQL] Define CatalogStatistics to i...

2016-12-23 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16323#discussion_r93811353
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
@@ -260,4 +261,46 @@ abstract class StatisticsCollectionTestBase extends 
QueryTest with SQLTestUtils
   }
 }
   }
+
+  // This test will be run twice: with and without Hive support
+  test("conversion from CatalogStatistics to Statistics") {
+withTable("ds_tbl", "hive_tbl") {
+  // Test data source table
+  checkStatsConversion(tableName = "ds_tbl", isDatasourceTable = true)
+  // Test hive serde table
+  if (spark.conf.get(StaticSQLConf.CATALOG_IMPLEMENTATION) == "hive") {
+checkStatsConversion(tableName = "hive_tbl", isDatasourceTable = 
false)
+  }
+}
+  }
+
+  private def checkStatsConversion(tableName: String, isDatasourceTable: 
Boolean): Unit = {
+// Create an empty table and run analyze command on it.
+val col = "c1"
--- End diff --

fixed





[GitHub] spark issue #16392: [SPARK-18992] [SQL] Move spark.sql.hive.thriftServer.sin...

2016-12-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16392
  
**[Test build #70560 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70560/testReport)**
 for PR 16392 at commit 
[`5221494`](https://github.com/apache/spark/commit/52214945fdc7705f65ce6522c2d05b1a79e69c78).





[GitHub] spark pull request #16389: [SPARK-18981][Core]The job hang problem when spec...

2016-12-23 Thread zhaorongsheng
Github user zhaorongsheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/16389#discussion_r93811187
  
--- Diff: 
core/src/test/scala/org/apache/spark/ExecutorAllocationManagerSuite.scala ---
@@ -938,6 +938,33 @@ class ExecutorAllocationManagerSuite
 assert(removeTimes(manager) === Map.empty)
   }
 
+  test("SPARK-18981: maxNumExecutorsNeeded should properly handle 
speculated tasks") {
+sc = createSparkContext()
+val manager = sc.executorAllocationManager.get
+assert(maxNumExecutorsNeeded(manager) === 0)
+
+val stageInfo = createStageInfo(0, 1)
+sc.listenerBus.postToAll(SparkListenerStageSubmitted(stageInfo))
+assert(maxNumExecutorsNeeded(manager) === 1)
+
+val taskInfo = createTaskInfo(1, 1, "executor-1")
+val speculatedTaskInfo = createTaskInfo(2, 1, "executor-1")
+sc.listenerBus.postToAll(SparkListenerTaskStart(0, 0, taskInfo))
+assert(maxNumExecutorsNeeded(manager) === 1)
--- End diff --

Yes, the warning 'No stages are running, but numRunningTasks != 0' is printed, 
and at that point `numRunningTasks` is reset to 0. But after that the speculated 
task-end event arrives and `numRunningTasks` is decremented by 1, going negative.
The tests are wrong; I will fix them.
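
A minimal illustration of the accounting problem (hypothetical numbers, not Spark code):

```
object NegativeCountSketch extends App {
  var numRunningTasks = 0  // reset to 0 when the stage-end event was handled
  numRunningTasks -= 1     // the late speculative task-end event arrives afterwards
  val pendingTasks = 0
  val tasksPerExecutor = 1
  val maxNeeded = (pendingTasks + numRunningTasks) / tasksPerExecutor
  println(maxNeeded)       // -1: no executors are requested, so the job hangs
}
```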





[GitHub] spark pull request #16393: [SPARK-18993] [Build] Revert Split test-tags into...

2016-12-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16393#discussion_r93810309
  
--- Diff: common/tags/pom.xml ---
@@ -34,6 +34,14 @@
 tags
   
 
+  <dependencies>
+    <dependency>
+      <groupId>org.scalatest</groupId>
+      <artifactId>scalatest_${scala.binary.version}</artifactId>
+      <scope>compile</scope>
+    </dependency>
+  </dependencies>
--- End diff --

So far, I have used both mvn and sbt to do a full rebuild from the command line. 
I also clicked `Invalidate Caches/Restart` in IntelliJ and tried shutting down 
zinc. I still got the same error.

To bypass/resolve the compilation failure, I have to add the above dependency 
back in my local environment. After this, everything works well in IntelliJ.






[GitHub] spark issue #16393: [SPARK-18993] [Build] Revert Split test-tags into test-J...

2016-12-23 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16393
  
From the command line, everything works well. However, in IntelliJ, it does not.

@srowen Thanks for the quick reply. Just wondering, do you also hit this issue 
when you build Spark in IntelliJ?






[GitHub] spark issue #16393: [SPARK-18993] [Build] Revert Split test-tags into test-J...

2016-12-23 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16393
  
Do a full clean build, including clearing caches, and restarting zinc. This 
change shouldn't result in that error.





[GitHub] spark issue #16393: [SPARK-18993] [Build] Revert Split test-tags into test-J...

2016-12-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16393
  
**[Test build #70559 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70559/testReport)**
 for PR 16393 at commit 
[`685f28b`](https://github.com/apache/spark/commit/685f28bb7c42148144fe9b8a57c9b29af8dc0e90).





[GitHub] spark pull request #16393: [SPARK-18993] [Build] Revert Split test-tags into...

2016-12-23 Thread gatorsmile
GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/16393

[SPARK-18993] [Build] Revert Split test-tags into test-JAR 

### What changes were proposed in this pull request?
After https://github.com/apache/spark/pull/16311 was merged, I am unable to 
build Spark in IntelliJ. I get the following error:

```
Error:scalac: error while loading Object, Missing dependency 'object scala 
in compiler mirror', required by 
/Library/Java/JavaVirtualMachines/jdk1.8.0_74.jdk/Contents/Home/jre/lib/rt.jar(java/lang/Object.class)
Error:scalac: Error: object scala in compiler mirror not found.
scala.reflect.internal.MissingRequirementError: object scala in compiler 
mirror not found.
at 
scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:17)
at 
scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:18)
at 
scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:53)
at 
scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:66)
at 
scala.reflect.internal.Mirrors$RootsBase.getPackage(Mirrors.scala:173)
at 
scala.reflect.internal.Definitions$DefinitionsClass.ScalaPackage$lzycompute(Definitions.scala:161)
at 
scala.reflect.internal.Definitions$DefinitionsClass.ScalaPackage(Definitions.scala:161)
at 
scala.reflect.internal.Definitions$DefinitionsClass.ScalaPackageClass$lzycompute(Definitions.scala:162)
at 
scala.reflect.internal.Definitions$DefinitionsClass.ScalaPackageClass(Definitions.scala:162)
at 
scala.reflect.internal.Definitions$DefinitionsClass.init(Definitions.scala:1395)
at scala.tools.nsc.Global$Run.<init>(Global.scala:1215)
at xsbt.CachedCompiler0$$anon$2.<init>(CompilerInterface.scala:105)
at xsbt.CachedCompiler0.run(CompilerInterface.scala:105)
at xsbt.CachedCompiler0.run(CompilerInterface.scala:94)
at xsbt.CompilerInterface.run(CompilerInterface.scala:22)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sbt.compiler.AnalyzingCompiler.call(AnalyzingCompiler.scala:101)
at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:47)
at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:41)
at 
org.jetbrains.jps.incremental.scala.local.IdeaIncrementalCompiler.compile(IdeaIncrementalCompiler.scala:29)
at 
org.jetbrains.jps.incremental.scala.local.LocalServer.compile(LocalServer.scala:26)
at org.jetbrains.jps.incremental.scala.remote.Main$.make(Main.scala:67)
at 
org.jetbrains.jps.incremental.scala.remote.Main$.nailMain(Main.scala:24)
at org.jetbrains.jps.incremental.scala.remote.Main.nailMain(Main.scala)
at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.martiansoftware.nailgun.NGSession.run(NGSession.java:319)
```

### How was this patch tested?
Manually

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark IntelliJError

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16393.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16393


commit 685f28bb7c42148144fe9b8a57c9b29af8dc0e90
Author: gatorsmile 
Date:   2016-12-24T00:08:33Z

Revert: [SPARK-17807][CORE] split test-tags into test-JAR







[GitHub] spark pull request #16390: [SPARK-18991][Core]Change ContextCleaner.referenc...

2016-12-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16390





[GitHub] spark issue #16390: [SPARK-18991][Core]Change ContextCleaner.referenceBuffer...

2016-12-23 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/16390
  
Thanks! Merging to master and 2.1.





[GitHub] spark pull request #16240: [SPARK-16792][SQL] Dataset containing a Case Clas...

2016-12-23 Thread michalsenkyr
Github user michalsenkyr commented on a diff in the pull request:

https://github.com/apache/spark/pull/16240#discussion_r93807181
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -312,12 +312,46 @@ object ScalaReflection extends ScalaReflection {
   "array",
   ObjectType(classOf[Array[Any]]))
 
-StaticInvoke(
+val wrappedArray = StaticInvoke(
   scala.collection.mutable.WrappedArray.getClass,
   ObjectType(classOf[Seq[_]]),
   "make",
   array :: Nil)
 
+if (localTypeOf[scala.collection.mutable.WrappedArray[_]] <:< 
t.erasure) {
+  wrappedArray
+} else {
+  // Convert to another type using `to`
+  val cls = mirror.runtimeClass(t.typeSymbol.asClass)
+  import scala.collection.generic.CanBuildFrom
+  import scala.reflect.ClassTag
+  import scala.util.{Try, Success}
--- End diff --

Done. I tried looking up the code style you mentioned, but only found the 
[Databricks' Scala Code Style 
Guide](https://github.com/databricks/scala-style-guide#exception-handling-try-vs-try).
 And that is not mentioned in the Spark docs as far as I know.
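
For context, a small sketch of the `to` conversion technique the diff above relies on (Scala 2.11/2.12-era collections API; `CanBuildFrom` was removed in 2.13):

```
import scala.collection.generic.CanBuildFrom
import scala.collection.mutable.WrappedArray

object ToConversionSketch extends App {
  val wrapped: Seq[Int] = WrappedArray.make[Int](Array(1, 2, 3))

  // `to` rebuilds the sequence into the requested collection type using the
  // implicitly supplied CanBuildFrom.
  def convert[A, C[_]](xs: Seq[A])(implicit cbf: CanBuildFrom[Nothing, A, C[A]]): C[A] =
    xs.to[C]

  println(convert[Int, List](wrapped))   // List(1, 2, 3)
  println(convert[Int, Vector](wrapped)) // Vector(1, 2, 3)
}
```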





[GitHub] spark issue #16329: [SPARK-16046][DOCS] Aggregations in the Spark SQL progra...

2016-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16329
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70558/
Test PASSed.





[GitHub] spark issue #16329: [SPARK-16046][DOCS] Aggregations in the Spark SQL progra...

2016-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16329
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16329: [SPARK-16046][DOCS] Aggregations in the Spark SQL progra...

2016-12-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16329
  
**[Test build #70558 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70558/testReport)**
 for PR 16329 at commit 
[`0c55b94`](https://github.com/apache/spark/commit/0c55b944217ce475e38991c460c2d956a88b8b9e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16374: [SPARK-18925][STREAMING] Reduce memory usage of mapWithS...

2016-12-23 Thread vpchelko
Github user vpchelko commented on the issue:

https://github.com/apache/spark/pull/16374
  
cc @tdas





[GitHub] spark issue #16345: [SPARK-17755][Core]Use workerRef to send RegisterWorkerR...

2016-12-23 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/16345
  
Ah, I see there's already other code that deals with retries and timeouts. 
LGTM then.





[GitHub] spark issue #16329: [SPARK-16046][DOCS] Aggregations in the Spark SQL progra...

2016-12-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16329
  
**[Test build #70558 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70558/testReport)**
 for PR 16329 at commit 
[`0c55b94`](https://github.com/apache/spark/commit/0c55b944217ce475e38991c460c2d956a88b8b9e).





[GitHub] spark pull request #16240: [SPARK-16792][SQL] Dataset containing a Case Clas...

2016-12-23 Thread michalsenkyr
Github user michalsenkyr commented on a diff in the pull request:

https://github.com/apache/spark/pull/16240#discussion_r93805236
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLImplicits.scala 
---
@@ -100,31 +97,36 @@ abstract class SQLImplicits {
   // Seqs
 
   /** @since 1.6.1 */
-  implicit def newIntSeqEncoder: Encoder[Seq[Int]] = ExpressionEncoder()
+  implicit def newIntSeqEncoder[T <: Seq[Int] : TypeTag]: Encoder[T] = 
ExpressionEncoder()
 
   /** @since 1.6.1 */
-  implicit def newLongSeqEncoder: Encoder[Seq[Long]] = ExpressionEncoder()
+  implicit def newLongSeqEncoder[T <: Seq[Long] : TypeTag]: Encoder[T] = 
ExpressionEncoder()
 
   /** @since 1.6.1 */
-  implicit def newDoubleSeqEncoder: Encoder[Seq[Double]] = 
ExpressionEncoder()
+  implicit def newDoubleSeqEncoder[T <: Seq[Double] : TypeTag]: Encoder[T] 
= ExpressionEncoder()
 
   /** @since 1.6.1 */
-  implicit def newFloatSeqEncoder: Encoder[Seq[Float]] = 
ExpressionEncoder()
+  implicit def newFloatSeqEncoder[T <: Seq[Float] : TypeTag]: Encoder[T] = 
ExpressionEncoder()
 
   /** @since 1.6.1 */
-  implicit def newByteSeqEncoder: Encoder[Seq[Byte]] = ExpressionEncoder()
+  implicit def newByteSeqEncoder[T <: Seq[Byte] : TypeTag]: Encoder[T] = 
ExpressionEncoder()
 
   /** @since 1.6.1 */
-  implicit def newShortSeqEncoder: Encoder[Seq[Short]] = 
ExpressionEncoder()
+  implicit def newShortSeqEncoder[T <: Seq[Short] : TypeTag]: Encoder[T] = 
ExpressionEncoder()
 
   /** @since 1.6.1 */
-  implicit def newBooleanSeqEncoder: Encoder[Seq[Boolean]] = 
ExpressionEncoder()
+  implicit def newBooleanSeqEncoder[T <: Seq[Boolean] : TypeTag]: 
Encoder[T] = ExpressionEncoder()
 
   /** @since 1.6.1 */
-  implicit def newStringSeqEncoder: Encoder[Seq[String]] = 
ExpressionEncoder()
+  implicit def newStringSeqEncoder[T <: Seq[String] : TypeTag]: Encoder[T] 
= ExpressionEncoder()
 
   /** @since 1.6.1 */
-  implicit def newProductSeqEncoder[A <: Product : TypeTag]: 
Encoder[Seq[A]] = ExpressionEncoder()
+  implicit def newProductSeqEncoder[A <: Product : TypeTag, T <: Seq[A] : 
TypeTag]: Encoder[T] =
--- End diff --

This one is the same as all the other ones, just with Product subclasses. 
If you were concerned about the `TypeTag` on `A`, it was actually not needed as 
`T`'s tag already contains all the information. I just tested it to be sure and 
removed it.
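
A hedged usage sketch of what the widened signatures enable (assuming a local `SparkSession`; this snippet is illustrative, not part of the PR): with `Encoder[T]` derivable for any `T <: Seq[Int]`, a concrete subtype such as `List[Int]` can be used directly as a Dataset element type.

```
import org.apache.spark.sql.{Dataset, SparkSession}

object SeqEncoderSketch extends App {
  val spark = SparkSession.builder().master("local[*]").appName("sketch").getOrCreate()
  import spark.implicits._

  // Previously this required Encoder[Seq[Int]] and lost the concrete List
  // type; the subtype bound keeps the element type as List[Int].
  val ds: Dataset[List[Int]] = Seq(List(1, 2), List(3)).toDS()
  ds.show()
  spark.stop()
}
```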





[GitHub] spark issue #16392: [SPARK-18992] [SQL] Move spark.sql.hive.thriftServer.sin...

2016-12-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16392
  
**[Test build #70557 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70557/testReport)**
 for PR 16392 at commit 
[`cc459fd`](https://github.com/apache/spark/commit/cc459fd00b0bf05c4cc8ce54739694a1540f0d91).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.




