[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user adrian-wang commented on the issue: https://github.com/apache/spark/pull/23214 Maybe add some detailed test results in the description, and explain the reason for this in a code comment? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of `Colum...
Github user adrian-wang commented on the issue: https://github.com/apache/spark/pull/23152 retest this please
[GitHub] spark issue #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of `Colum...
Github user adrian-wang commented on the issue: https://github.com/apache/spark/pull/23152 @juliuszsompolski I have updated it accordingly, thanks!
[GitHub] spark pull request #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/23152#discussion_r237808846

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
@@ -2276,4 +2276,16 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
     }
   }
+
+  test("SPARK-26181 hasMinMaxStats method of ColumnStatsMap is not correct") {
+    withSQLConf(SQLConf.CBO_ENABLED.key -> "true") {
+      withTable("all_null") {
+        sql("create table all_null (attrInt int)")
+        sql("insert into all_null values (null)")
+        sql("analyze table all_null compute statistics for columns attrInt")
+        checkAnswer(sql("select * from all_null where attrInt < 1"), Nil)

--- End diff --

I checked and it failed without this patch...
[GitHub] spark pull request #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/23152#discussion_r237808676

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
@@ -2276,4 +2276,16 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
     }
   }
+
+  test("SPARK-26181 hasMinMaxStats method of ColumnStatsMap is not correct") {
+    withSQLConf(SQLConf.CBO_ENABLED.key -> "true") {
+      withTable("all_null") {
+        sql("create table all_null (attrInt int)")
+        sql("insert into all_null values (null)")
+        sql("analyze table all_null compute statistics for columns attrInt")
+        checkAnswer(sql("select * from all_null where attrInt < 1"), Nil)

--- End diff --

Actually, when I remove the code change from the source and rerun this test, an exception is thrown.
[GitHub] spark pull request #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/23152#discussion_r237735460

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala ---
@@ -879,13 +879,13 @@ case class ColumnStatsMap(originalMap: AttributeMap[ColumnStat]) {
   }

   def hasCountStats(a: Attribute): Boolean =
-    get(a).map(_.hasCountStats).getOrElse(false)
+    get(a).exists(_.hasCountStats)

   def hasDistinctCount(a: Attribute): Boolean =
-    get(a).map(_.distinctCount.isDefined).getOrElse(false)
+    get(a).exists(_.distinctCount.isDefined)

   def hasMinMaxStats(a: Attribute): Boolean =
-    get(a).map(_.hasCountStats).getOrElse(false)
+    get(a).exists(_.hasMinMaxStats)

--- End diff --

sure, thank you all
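The one-line copy-paste bug this diff fixes — `hasMinMaxStats` delegating to `hasCountStats` — can be sketched outside Spark. The Python below is a hypothetical, simplified model of `ColumnStat`/`ColumnStatsMap` (names and fields are illustrative, not Spark's actual API); an all-null column has count stats but no min/max, which is exactly the case the buggy version got wrong.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ColumnStat:
    # Illustrative fields only; Spark's ColumnStat carries more.
    distinct_count: Optional[int] = None
    null_count: Optional[int] = None
    min_value: Optional[int] = None
    max_value: Optional[int] = None

    @property
    def has_count_stats(self) -> bool:
        return self.distinct_count is not None and self.null_count is not None

    @property
    def has_min_max_stats(self) -> bool:
        return self.min_value is not None and self.max_value is not None

class ColumnStatsMap:
    def __init__(self, stats: dict):
        self._stats = stats  # attribute name -> ColumnStat

    def has_min_max_stats(self, attr: str) -> bool:
        stat = self._stats.get(attr)
        # Mirrors Scala's get(a).exists(_.hasMinMaxStats): False when absent.
        # The bug: this used to check has_count_stats instead.
        return stat is not None and stat.has_min_max_stats

# An all-null column: count stats exist, but min/max do not.
all_null = ColumnStat(distinct_count=0, null_count=1)
m = ColumnStatsMap({"attrInt": all_null})
print(m.has_min_max_stats("attrInt"))  # False; the buggy version returned True
```

The `all_null` case matches the regression test in this PR: after `analyze table` on a table containing only NULLs, the column has count stats but no min/max, so filter estimation must not assume min/max exist.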
[GitHub] spark pull request #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/23152#discussion_r237380128

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/FilterEstimationSuite.scala ---
@@ -821,6 +822,32 @@ class FilterEstimationSuite extends StatsEstimationTestBase {
     expectedRowCount = 3)
   }

+  test("ColumnStatsMap tests") {

--- End diff --

There's no public call to the `evaluateBinary` method, actually.
[GitHub] spark pull request #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/23152#discussion_r237359357

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/FilterEstimationSuite.scala ---
@@ -821,6 +822,32 @@ class FilterEstimationSuite extends StatsEstimationTestBase {
     expectedRowCount = 3)
   }

+  test("ColumnStatsMap tests") {

--- End diff --

Hi @viirya, thanks for your review. Here `FilterEstimation` works because we only call `FilterEstimation.estimate` to check it, which first checks whether `rowCount.isEmpty` and short-circuits the estimation. If `rowCount` is not empty, we should always have a `min` and `max` in the stats, unless they are malformed.
[GitHub] spark issue #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of `Colum...
Github user adrian-wang commented on the issue: https://github.com/apache/spark/pull/23152 @srowen
[GitHub] spark issue #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of `Colum...
Github user adrian-wang commented on the issue: https://github.com/apache/spark/pull/23152 retest this please.
[GitHub] spark pull request #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of...
GitHub user adrian-wang opened a pull request: https://github.com/apache/spark/pull/23152 [SPARK-26181][SQL] the `hasMinMaxStats` method of `ColumnStatsMap` is not correct

## What changes were proposed in this pull request?

Currently, `hasMinMaxStats` returns the same result as `hasCountStats`, which is obviously not as expected.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/adrian-wang/spark minmaxstats

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23152.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23152

commit 1c1edccc43ac9cdb6da53ad93ac998b06bf8b6dd
Author: Daoyuan Wang
Date: 2018-11-27T06:18:34Z

    fix ColumnStatsMap.hasMinMaxStats
[GitHub] spark pull request #21282: [SPARK-23934][SQL] Adding map_from_entries functi...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/21282#discussion_r187291692

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -118,6 +120,229 @@ case class MapValues(child: Expression)
   override def prettyName: String = "map_values"
 }

+/**
+ * Returns a map created from the given array of entries.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(arrayOfEntries) - Returns a map created from the given array of entries.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(struct(1, 'a'), struct(2, 'b')));
+       {1:"a",2:"b"}
+  """,
+  since = "2.4.0")
+case class MapFromEntries(child: Expression) extends UnaryExpression
+{

--- End diff --

Nit: style
[GitHub] spark pull request #21227: Backport [SPARK-24133][SQL] Check for integer ove...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/21227#discussion_r185801594

--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java ---
@@ -81,7 +81,9 @@ public void close() {
   }

   public void reserve(int requiredCapacity) {
-    if (requiredCapacity > capacity) {
+    if (requiredCapacity < 0) {
+      throwUnsupportedException(requiredCapacity, null);

--- End diff --

thanks for the explanation!
[GitHub] spark pull request #21227: [SPARK-24133][SQL] Check for integer overflows wh...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/21227#discussion_r185768453

--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java ---
@@ -81,7 +81,9 @@ public void close() {
   }

   public void reserve(int requiredCapacity) {
-    if (requiredCapacity > capacity) {
+    if (requiredCapacity < 0) {
+      throwUnsupportedException(requiredCapacity, null);

--- End diff --

how about using `ArithmeticException` instead of null?
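For context on why `reserve` needs a `requiredCapacity < 0` guard: callers grow capacity roughly by doubling, and in Java's 32-bit arithmetic the doubled value can wrap negative. A minimal Python sketch, simulating Java `int` wraparound (the `reserve` helper is illustrative, not the actual `WritableColumnVector` logic):

```python
INT_MAX = 2**31 - 1

def to_int32(x: int) -> int:
    """Wrap a Python int to Java 32-bit int semantics."""
    x &= 0xFFFFFFFF
    return x - 2**32 if x >= 2**31 else x

def reserve(capacity: int, required: int) -> int:
    if required < 0:
        # Corresponds to the new throwUnsupportedException(requiredCapacity, ...) path.
        raise OverflowError(f"cannot reserve {required} elements")
    if required > capacity:
        capacity = max(required, to_int32(capacity * 2))
    return capacity

# Doubling a capacity just above INT_MAX / 2 wraps negative in Java ints:
print(to_int32((INT_MAX // 2 + 2) * 2))  # negative: the overflow the guard catches
```

Without the guard, a wrapped-negative `requiredCapacity` would silently pass the old `requiredCapacity > capacity` check as false and leave the vector under-allocated.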
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r94798265

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
@@ -340,3 +341,91 @@ object CaseKeyWhen {
     CaseWhen(cases, elseValue)
   }
 }
+
+/**
+ * A function that returns the index of str in (str1, str2, ...) list or 0 if not found.
+ * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.",
+  extended = """
+    Examples:
+      > SELECT _FUNC_(10, 9, 3, 10, 4);
+       3
+  """)
+case class Field(children: Seq[Expression]) extends Expression {
+
+  override def nullable: Boolean = false
+  override def foldable: Boolean = children.forall(_.foldable)
+
+  private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    if (children.length <= 1) {
+      TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments")
+    } else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) {
+      TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType")
+    } else
+      TypeCheckResult.TypeCheckSuccess
+  }
+
+  override def dataType: DataType = IntegerType
+
+  override def eval(input: InternalRow): Any = {
+    val target = children.head.eval(input)
+    val targetDataType = children.head.dataType
+    def findEqual(target: Any, params: Seq[Expression], index: Int): Int = {
+      params.toList match {
+        case Nil => 0
+        case head::tail if targetDataType == head.dataType
+          && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index
+        case _ => findEqual(target, params.tail, index + 1)
+      }
+    }
+    if(target == null)
+      0
+    else
+      findEqual(target, children.tail, 1)
+  }
+
+  protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    val evalChildren = children.map(_.genCode(ctx))
+    val target = evalChildren(0)
+    val targetDataType = children(0).dataType
+    val rest = evalChildren.drop(1)
+    val restDataType = children.drop(1).map(_.dataType)
+
+    def updateEval(evalWithIndex: Tuple2[Tuple2[ExprCode, DataType], Int]): String = {
+      val ((eval, dataType), index) = evalWithIndex
+      s"""
+        ${eval.code}
+        if (${dataType.equals(targetDataType)}
+          && ${ctx.genEqual(targetDataType, eval.value, target.value)}) {
+          ${ev.value} = ${index};
+        }
+      """
+    }
+
+    def genIfElseStructure(code1: String, code2: String): String = {
+      s"""
+        ${code1}
+        else {
+          ${code2}
+        }
+      """
+    }
+
+    def dataTypeEqualsTarget(evalWithIndex: Tuple2[Tuple2[ExprCode, DataType], Int]): Boolean = {

--- End diff --

`def dataTypeEqualsTarget(evalWithIndex: ((ExprCode, DataType), Int)): Boolean`

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r94797482

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
@@ -340,3 +341,91 @@ object CaseKeyWhen {
     CaseWhen(cases, elseValue)
   }
 }
+
+/**
+ * A function that returns the index of str in (str1, str2, ...) list or 0 if not found.
+ * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.",
+  extended = """
+    Examples:
+      > SELECT _FUNC_(10, 9, 3, 10, 4);
+       3
+  """)
+case class Field(children: Seq[Expression]) extends Expression {
+
+  override def nullable: Boolean = false
+  override def foldable: Boolean = children.forall(_.foldable)
+
+  private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    if (children.length <= 1) {
+      TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments")
+    } else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) {
+      TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType")
+    } else
+      TypeCheckResult.TypeCheckSuccess
+  }
+
+  override def dataType: DataType = IntegerType
+
+  override def eval(input: InternalRow): Any = {
+    val target = children.head.eval(input)
+    val targetDataType = children.head.dataType
+    def findEqual(target: Any, params: Seq[Expression], index: Int): Int = {
+      params.toList match {
+        case Nil => 0
+        case head::tail if targetDataType == head.dataType
+          && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index
+        case _ => findEqual(target, params.tail, index + 1)
+      }
+    }
+    if(target == null)
+      0
+    else
+      findEqual(target, children.tail, 1)

--- End diff --

Nit: one more space before `findEqual`
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r94796481

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
@@ -340,3 +341,91 @@ object CaseKeyWhen {
     CaseWhen(cases, elseValue)
   }
 }
+
+/**
+ * A function that returns the index of str in (str1, str2, ...) list or 0 if not found.

--- End diff --

Nit: delete a space before *
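For reference, the FIELD semantics under review in this thread — the 1-based index of the first argument among the rest, or 0 when not found (and 0 for a NULL target) — fit in a few lines. This Python version is a hypothetical model of the eval path only, ignoring the AtomicType checks and the codegen path:

```python
def field(target, *candidates):
    """1-based index of target among candidates; 0 if absent or target is None."""
    if target is None:
        return 0
    for i, c in enumerate(candidates, start=1):
        # NULL candidates never match, mirroring the head.eval(input) != null guard.
        if c is not None and c == target:
            return i
    return 0

print(field(10, 9, 3, 10, 4))  # 3, matching the SELECT _FUNC_(10, 9, 3, 10, 4) example
print(field(None, 1, 2))       # 0
```

Note the Scala original also requires each candidate's data type to equal the target's before comparing; this sketch assumes same-typed arguments.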
[GitHub] spark pull request #12391: [SPARK-14631][SQL]drop database cascade should dr...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/12391#discussion_r92030896

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala ---
@@ -778,6 +778,24 @@ abstract class ExternalCatalogSuite extends SparkFunSuite with BeforeAndAfterEac
       "db2", "tbl1", Seq(part1.spec), ignoreIfNotExists = false, purge = false, retainData = false)
     assert(fs.exists(partPath))
   }
+
+  test("drop database cascade with function defined") {

--- End diff --

@gatorsmile Looks like it is not a problem now... I'll close this, thank you!
[GitHub] spark pull request #12391: [SPARK-14631][SQL]drop database cascade should dr...
Github user adrian-wang closed the pull request at: https://github.com/apache/spark/pull/12391
[GitHub] spark issue #12391: [SPARK-14631][SQL]drop database cascade should drop func...
Github user adrian-wang commented on the issue: https://github.com/apache/spark/pull/12391 I'll update this ASAP
[GitHub] spark issue #15011: [SPARK-17122][SQL]support drop current database
Github user adrian-wang commented on the issue: https://github.com/apache/spark/pull/15011 Thanks for your kind review @cloud-fan @gatorsmile, this helps us enable BigBench on Spark 2.x
[GitHub] spark issue #15011: [SPARK-17122][SQL]support drop current database
Github user adrian-wang commented on the issue: https://github.com/apache/spark/pull/15011 @hvanhovell I have checked with Hive and MySQL; they both support dropping the current database. Asking the user to switch to another database before dropping the current one is not enough: if multiple users are connected to the same metastore, then even if you are not using a certain database, someone else may be. What's more, if you want to drop a database but lack the privilege to access databases created by other users, you will always leave one empty database behind. In Spark's implementation, we ensure the database exists before we do anything, so dropping the current database is OK. This is also the approach other systems adopt.
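A minimal sketch of the behavior argued for above, assuming a toy catalog (not Spark's actual `SessionCatalog` API): dropping the current database succeeds, and the session then falls back to `default`, as this PR proposes.

```python
class ToyCatalog:
    """Hypothetical, much-simplified stand-in for a session catalog."""

    def __init__(self):
        self.databases = {"default", "mydb"}
        self.current = "mydb"

    def drop_database(self, name: str):
        if name not in self.databases:
            raise ValueError(f"database {name} does not exist")
        if name == "default":
            raise ValueError("cannot drop the default database")
        self.databases.remove(name)
        # The key behavior change: dropping the current database is allowed,
        # and the session falls back to "default".
        if self.current == name:
            self.current = "default"

catalog = ToyCatalog()
catalog.drop_database("mydb")
print(catalog.current)  # default
```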
[GitHub] spark issue #15011: [SPARK-17122][SQL]support drop current database
Github user adrian-wang commented on the issue: https://github.com/apache/spark/pull/15011 @jameszhouyi @chenghao-intel
[GitHub] spark pull request #15011: [SPARK-17122][SQL]support drop current database
GitHub user adrian-wang opened a pull request: https://github.com/apache/spark/pull/15011 [SPARK-17122][SQL] support drop current database

## What changes were proposed in this pull request?

In Spark 1.6 and earlier, we could drop the database we are using. In Spark 2.0, the native implementation prevents us from dropping the current database, which may break some old queries. This PR re-enables the feature and sets the current database to `default` instead.

## How was this patch tested?

One new unit test in `SessionCatalogSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/adrian-wang/spark dropcurrent

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15011.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15011

commit 551b27252c0f5fb0de6d0958407cebac0b19ba64
Author: Daoyuan Wang <daoyuan.w...@intel.com>
Date: 2016-09-08T06:56:32Z

    support drop current database
[GitHub] spark pull request #14991: [SPARK-17427][SQL] function SIZE should return -1...
GitHub user adrian-wang opened a pull request: https://github.com/apache/spark/pull/14991 [SPARK-17427][SQL] function SIZE should return -1 when parameter is null

## What changes were proposed in this pull request?

`select size(null)` returns -1 in Hive. In order to be compatible, we should also return `-1`.

## How was this patch tested?

Unit tests in `CollectionFunctionsSuite` and `DataFrameFunctionsSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/adrian-wang/spark size

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14991.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14991

commit 1ccbe6bd41b1e60ea62a157771d4b3ca37f8678f
Author: Daoyuan Wang <daoyuan.w...@intel.com>
Date: 2016-09-07T04:52:58Z

    size(null)=-1
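The proposed null handling is tiny; here is a hedged Python sketch of the semantics (illustrative only, not Spark's actual `Size` expression): `size` of NULL yields -1 for Hive compatibility instead of propagating NULL.

```python
def size(collection):
    """size() semantics from SPARK-17427: -1 for NULL input, length otherwise."""
    return -1 if collection is None else len(collection)

print(size(None))       # -1, matching Hive's size(null)
print(size([1, 2, 3]))  # 3
print(size({"a": 1}))   # 1 (maps count entries)
```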
[GitHub] spark issue #14366: [SPARK-16732][SQL] Remove unused codes in subexpressionE...
Github user adrian-wang commented on the issue: https://github.com/apache/spark/pull/14366 good catch!
[GitHub] spark pull request #14280: [SPARK-16515][SQL][FOLLOW-UP] Fix test `script` o...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/14280#discussion_r71477889

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
@@ -64,14 +67,19 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
   import spark.implicits._

   test("script") {
-    val df = Seq(("x1", "y1", "z1"), ("x2", "y2", "z2")).toDF("c1", "c2", "c3")
-    df.createOrReplaceTempView("script_table")
-    val query1 = sql(
-      """
-        |SELECT col1 FROM (from(SELECT c1, c2, c3 FROM script_table) tempt_table
-        |REDUCE c1, c2, c3 USING 'bash src/test/resources/test_script.sh' AS
-        |(col1 STRING, col2 STRING)) script_test_table""".stripMargin)
-    checkAnswer(query1, Row("x1_y1") :: Row("x2_y2") :: Nil)
+    if (testCommandAvailable("bash") && testCommandAvailable("echo | awk '{print $0}'")) {
+      val df = Seq(("x1", "y1", "z1"), ("x2", "y2", "z2")).toDF("c1", "c2", "c3")
+      df.createOrReplaceTempView("script_table")
+      val query1 = sql(
+        """
+          |SELECT col1 FROM (from(SELECT c1, c2, c3 FROM script_table) tempt_table
+          |REDUCE c1, c2, c3 USING 'bash src/test/resources/test_script.sh' AS
+          |(col1 STRING, col2 STRING)) script_test_table""".stripMargin)
+      checkAnswer(query1, Row("x1_y1") :: Row("x2_y2") :: Nil)
+    }
+    else {

--- End diff --

Nit: unnecessary line break.
[GitHub] spark issue #14169: [SPARK-16515][SQL]set default record reader and writer f...
Github user adrian-wang commented on the issue: https://github.com/apache/spark/pull/14169 @rxin Only those script transformation cases which use LazySimpleSerde would be affected.
[GitHub] spark issue #14169: [SPARK-16515][SQL]set default record reader and writer f...
Github user adrian-wang commented on the issue: https://github.com/apache/spark/pull/14169 @rxin In Spark 2.0, those conf values starting with "hive.", which have default values in HiveConf, can no longer resolve their defaults.
[GitHub] spark issue #14169: [WIP][SPARK-16515][SQL]set default record reader and wri...
Github user adrian-wang commented on the issue: https://github.com/apache/spark/pull/14169 I have updated my code and switched to using bash as the test case. Hope it will work on Jenkins.
[GitHub] spark issue #14169: [WIP][SPARK-16515][SQL]set default record reader and wri...
Github user adrian-wang commented on the issue: https://github.com/apache/spark/pull/14169 This is strange, because I can pass this specific test locally.
[GitHub] spark pull request #14169: [SPARK-16515][SQL]set default record reader and w...
GitHub user adrian-wang opened a pull request: https://github.com/apache/spark/pull/14169 [SPARK-16515][SQL] set default record reader and writer for script transformation

## What changes were proposed in this pull request?

In `ScriptInputOutputSchema`, we read the default `RecordReader` and `RecordWriter` from conf. Since Spark 2.0 has deleted those config keys from the Hive conf, we have to set the default reader/writer class names ourselves.

## How was this patch tested?

Added a test case in SQLQuerySuite.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/adrian-wang/spark script

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14169.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14169

commit 671a2ba1e1e19fe2d6115c42b59644e619138c84
Author: Daoyuan Wang <daoyuan.w...@intel.com>
Date: 2016-07-12T13:29:08Z

    set default record reader and writer for script transformation
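A hypothetical sketch of the configuration fallback described in this PR: when the Hive config keys are no longer present in the conf, fall back to defaults hard-coded on the Spark side. The key names and Hive class names below follow Hive's conventions but should be treated as illustrative, not a verbatim copy of the patch.

```python
# Assumed defaults, modeled on Hive's TextRecordReader/TextRecordWriter.
DEFAULTS = {
    "hive.script.recordreader": "org.apache.hadoop.hive.ql.exec.TextRecordReader",
    "hive.script.recordwriter": "org.apache.hadoop.hive.ql.exec.TextRecordWriter",
}

def record_reader_class(conf: dict) -> str:
    # Before the fix: conf.get(...) returned nothing once the hive.* keys
    # were dropped. After: fall back to a code-side default.
    return conf.get("hive.script.recordreader", DEFAULTS["hive.script.recordreader"])

print(record_reader_class({}))  # falls back to the TextRecordReader default
print(record_reader_class({"hive.script.recordreader": "com.example.MyReader"}))
```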
[GitHub] spark pull request #14089: [SPARK-16415][SQL] fix catalog string error
GitHub user adrian-wang opened a pull request: https://github.com/apache/spark/pull/14089 [SPARK-16415][SQL] fix catalog string error

## What changes were proposed in this pull request?

In #13537 we truncate `simpleString` if it is a long `StructType`. But sometimes we need `catalogString` to reconstruct `TypeInfo`, for example in the description of [SPARK-16415](https://issues.apache.org/jira/browse/SPARK-16415). So we need to keep the implementation of `catalogString` unaffected by the truncation.

## How was this patch tested?

Added a test case.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/adrian-wang/spark catalogstring

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14089.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14089

commit dc81b385a68c842715b3377f15f7b3009e45f0ce
Author: Daoyuan Wang <daoyuan.w...@intel.com>
Date: 2016-07-06T03:18:07Z

    fix catalog string

commit eb1018108a879a06701c3dff539ef8d10ab2b118
Author: Daoyuan Wang <daoyuan.w...@intel.com>
Date: 2016-07-07T11:41:39Z

    add a unit test
[GitHub] spark pull request #13783: [HOTFIX][SPARK-15613][SQL]Set test runtime timezo...
Github user adrian-wang closed the pull request at: https://github.com/apache/spark/pull/13783
[GitHub] spark issue #13783: [HOTFIX][SPARK-15613][SQL]Set test runtime timezone for ...
Github user adrian-wang commented on the issue: https://github.com/apache/spark/pull/13783 Closing, thx!
[GitHub] spark pull request #13784: [SPARK-16078] [SQL] from_utc_timestamp/to_utc_tim...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/13784#discussion_r67795207 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala --- @@ -886,23 +886,45 @@ object DateTimeUtils { } /** + * Convert the timestamp `ts` from one timezone to another. + * + * TODO: Because of DST, the conversion between UTC and human time is not exactly one-to-one + * mapping, the conversion here may return wrong result, we should make the timestamp + * timezone-aware. + */ + def convertTz(ts: SQLTimestamp, fromZone: TimeZone, toZone: TimeZone): SQLTimestamp = { +// We always use local timezone to parse or format a timestamp +val localZone = threadLocalLocalTimeZone.get() +val utcTs = if (fromZone.getID == localZone.getID) { + ts +} else { + // get the human time using local time zone, that actually is in fromZone. + val localTs = ts + localZone.getOffset(ts / 1000L) * 1000L // in fromZone + localTs - getOffsetFromLocalMillis(localTs / 1000L, fromZone) * 1000L +} +if (toZone.getID == localZone.getID) { + utcTs +} else { + val localTs2 = utcTs + toZone.getOffset(utcTs / 1000L) * 1000L // in toZone + // treat it as local timezone, convert to UTC (we could get the expected human time back) + localTs2 - getOffsetFromLocalMillis(localTs2 / 1000L, localZone) * 1000L +} + } + + /** * Returns a timestamp of given timezone from utc timestamp, with the same string * representation in their timezone. */ def fromUTCTime(time: SQLTimestamp, timeZone: String): SQLTimestamp = { -val tz = TimeZone.getTimeZone(timeZone) -val offset = tz.getOffset(time / 1000L) -time + offset * 1000L +convertTz(time, TimeZoneGMT, TimeZone.getTimeZone(timeZone)) --- End diff -- For `fromUTCTime`, this would result in a little bit overhead. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
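The `fromUTCTime` rewrite discussed in the diff above boils down to shifting an epoch-millis value by the target zone's offset so its wall-clock rendering matches that zone. A minimal sketch of just that case (hypothetical standalone code, not Spark's actual implementation, and subject to the same DST caveat the TODO in the diff mentions):

```java
import java.util.TimeZone;

public class TzShift {
    // Render a UTC epoch-millis value as wall-clock time in `tz` by adding
    // the zone offset at that instant. Around DST transitions the offset-based
    // mapping is not one-to-one, which is the caveat noted in the diff's TODO.
    public static long fromUtcTime(long utcMillis, String tz) {
        TimeZone zone = TimeZone.getTimeZone(tz);
        return utcMillis + zone.getOffset(utcMillis);
    }

    public static void main(String[] args) {
        // 1970-01-01 00:00:00 UTC renders as 08:00:00 wall-clock in GMT+8.
        System.out.println(fromUtcTime(0L, "GMT+08:00")); // prints 28800000
    }
}
```

The reviewer's point about overhead is that routing this simple one-offset shift through the general two-zone `convertTz` path adds extra offset lookups that the old direct implementation avoided.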
[GitHub] spark pull request #13783: [HOTFIX][SPARK-15613][SQL]Set test runtime timezo...
GitHub user adrian-wang opened a pull request: https://github.com/apache/spark/pull/13783 [HOTFIX][SPARK-15613][SQL]Set test runtime timezone for DateTimeUtilSuite ## What changes were proposed in this pull request? Without a default timezone set in DateTimeUtilsSuite, using the `Timestamp.valueOf` method could cause problems in a time zone different from PST. This PR fixes the failure reported in #13652 by @robbinspg and @lw-lin ## How was this patch tested? Existing tests cover this. cc @davies You can merge this pull request into a Git repository by running: $ git pull https://github.com/adrian-wang/spark timezone Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13783.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13783 commit f5cea66b3bada0afa07b73691753bdc9c983e86d Author: Daoyuan Wang <daoyuan.w...@intel.com> Date: 2016-06-20T18:54:39Z set test runtime timezone for DateTimeUtilSuite
[GitHub] spark issue #13652: [SPARK-15613] [SQL] Fix incorrect days to millis convers...
Github user adrian-wang commented on the issue: https://github.com/apache/spark/pull/13652 @lw-lin @robbinspg I submitted a PR to fix the failure: #13783 . Thanks for the information!
[GitHub] spark pull request #13652: [SPARK-15613] [SQL] Fix incorrect days to millis ...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/13652#discussion_r67289086 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala --- @@ -851,6 +851,29 @@ object DateTimeUtils { } /** + * Lookup the offset for given millis seconds since 1970-01-01 00:00:00 in a timezone. --- End diff -- since 1970-01-01 00:00:00 UTC or local timezone?
[GitHub] spark pull request: [SPARK-15397] [SQL] fix string udf locate as h...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/13186#issuecomment-220278544 retest this please.
[GitHub] spark pull request: [SPARK-15397] [SQL] fix string udf locate as h...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/13186#issuecomment-220250927 retest this please.
[GitHub] spark pull request: [SPARK-15397] [SQL] fix string udf locate as h...
GitHub user adrian-wang opened a pull request: https://github.com/apache/spark/pull/13186 [SPARK-15397] [SQL] fix string udf locate as hive ## What changes were proposed in this pull request? In hive, `locate("aa", "aaa", 0)` yields 0, `locate("aa", "aaa", 1)` yields 1 and `locate("aa", "aaa", 2)` yields 2, while in Spark, `locate("aa", "aaa", 0)` yields 1, `locate("aa", "aaa", 1)` yields 2 and `locate("aa", "aaa", 2)` yields 0. This stems from a different interpretation of the third parameter of the `locate` UDF: it is a 1-based starting index, so passing 0 always returns 0. ## How was this patch tested? Tested with modified `StringExpressionsSuite` and `StringFunctionsSuite` You can merge this pull request into a Git repository by running: $ git pull https://github.com/adrian-wang/spark locate Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13186.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13186 commit 23b43d4c837d762461dd56a62b85cb998919e0ef Author: Daoyuan Wang <daoyuan.w...@intel.com> Date: 2016-05-18T11:30:07Z fix string udf locate as hive
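The Hive semantics described above (a 1-based starting index, with 0 always mapping to 0) can be sketched as a hypothetical standalone function — not Spark's or Hive's actual implementation:

```java
public class LocateDemo {
    // Hive-style locate: `start` is a 1-based index into `str`;
    // start <= 0 always yields 0, and "not found" also yields 0.
    public static int locate(String substr, String str, int start) {
        if (start <= 0) {
            return 0;
        }
        int idx = str.indexOf(substr, start - 1); // convert to 0-based
        return idx + 1; // back to 1-based; -1 (not found) becomes 0
    }

    public static void main(String[] args) {
        // Matches the Hive results quoted in the PR description.
        System.out.println(locate("aa", "aaa", 0)); // prints 0
        System.out.println(locate("aa", "aaa", 1)); // prints 1
        System.out.println(locate("aa", "aaa", 2)); // prints 2
    }
}
```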
[GitHub] spark pull request: [SPARK-14021][SQL] custom context support for ...
Github user adrian-wang closed the pull request at: https://github.com/apache/spark/pull/11843
[GitHub] spark pull request: [SPARK-14786] Remove hive-cli dependency from ...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/12551#issuecomment-212696187 LGTM
[GitHub] spark pull request: [SPARK-14631][SQL]drop database cascade should...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/12391#discussion_r59838034 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogSuite.scala --- @@ -17,20 +17,22 @@ package org.apache.spark.sql.hive +import java.io.File + import org.apache.hadoop.conf.Configuration import org.apache.hadoop.util.VersionInfo import org.apache.spark.SparkConf +import org.apache.spark.sql.catalyst.FunctionIdentifier import org.apache.spark.sql.catalyst.catalog._ import org.apache.spark.sql.hive.client.{HiveClient, IsolatedClientLoader} -import org.apache.spark.util.Utils /** * Test suite for the [[HiveExternalCatalog]]. */ class HiveExternalCatalogSuite extends CatalogTestCases { - private val client: HiveClient = { + private lazy val client: HiveClient = { --- End diff -- maybe `lazy` is better?
[GitHub] spark pull request: [SPARK-14631][SQL][WIP]drop database cascade s...
GitHub user adrian-wang opened a pull request: https://github.com/apache/spark/pull/12391 [SPARK-14631][SQL][WIP]drop database cascade should drop functions for HiveExternalCatalog ## What changes were proposed in this pull request? As in HIVE-12304, Hive's drop database cascade did not drop functions as well (even after that fix, calling `client.dropDatabase` would still not drop functions). We need to fix this when calling `dropDatabase` in `HiveExternalCatalog`. Jira: https://issues.apache.org/jira/browse/SPARK-14631 See also: https://issues.apache.org/jira/browse/HIVE-12304 ## How was this patch tested? Unit test in HiveExternalCatalogSuite. You can merge this pull request into a Git repository by running: $ git pull https://github.com/adrian-wang/spark cascadedropdb Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12391.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12391 commit 66eeea0cce58ab65d06bee1220bbd61226505716 Author: Daoyuan Wang <daoyuan.w...@intel.com> Date: 2016-04-14T11:44:09Z drop database cascade should drop functions for HiveExternalCatalog
[GitHub] spark pull request: [SPARK-14631][SQL][WIP]drop database cascade s...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/12391#issuecomment-209895482 @chenghao-intel
[GitHub] spark pull request: [SPARK-14021][SQL][WIP] custom context support...
GitHub user adrian-wang opened a pull request: https://github.com/apache/spark/pull/11843 [SPARK-14021][SQL][WIP] custom context support for SparkSQLEnv ## What changes were proposed in this pull request? This creates a custom context for the commands `bin/spark-sql` and `sbin/start-thriftserver`. Any context derived from `HiveContext` is acceptable. Users need to configure the class name of the custom context via `spark.sql.context.class` and make sure the class is on the classpath. This provides a more elegant way for infrastructure teams to apply custom configurations and changes. ## How was this patch tested? Added a unit test in `CliSuite` to track this. cc @chenghao-intel You can merge this pull request into a Git repository by running: $ git pull https://github.com/adrian-wang/spark customcontext Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11843.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #11843 commit 654397e0f8ea3774ccce2c482784a986b212527b Author: Daoyuan Wang <daoyuan.w...@intel.com> Date: 2016-03-19T08:26:20Z SPARK-14021 custom context support
[GitHub] spark pull request: [SPARK-12789] [SQL] Support Order By Ordinal i...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/11815#discussion_r56745951 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -435,6 +435,11 @@ object SQLConf { defaultValue = Some(true), doc = "When false, we will treat bucketed table as normal table") + val ORDER_BY_ORDINAL = booleanConf("spark.sql.orderByOrdinal", --- End diff -- Sounds good to me.
[GitHub] spark pull request: [SPARK-12789] [SQL] Support Order By Ordinal i...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/11815#discussion_r56745738 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -435,6 +435,11 @@ object SQLConf { defaultValue = Some(true), doc = "When false, we will treat bucketed table as normal table") + val ORDER_BY_ORDINAL = booleanConf("spark.sql.orderByOrdinal", --- End diff -- Hive has had a similar configuration value since 0.11, `hive.groupby.orderby.position.alias` (default `false`). See: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#LanguageManualSortBy-SyntaxofOrderBy I think we also need to read this config from ConfVar.
[GitHub] spark pull request: [SPARK-12855][MINOR] remove spark.sql.dialect ...
GitHub user adrian-wang opened a pull request: https://github.com/apache/spark/pull/11758 [SPARK-12855][MINOR] remove spark.sql.dialect from doc and test ## What changes were proposed in this pull request? Since the developer API for a pluggable parser was removed in #10801, the docs should be updated accordingly. ## How was this patch tested? This patch does not affect the real code path. You can merge this pull request into a Git repository by running: $ git pull https://github.com/adrian-wang/spark spark12855 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11758.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #11758 commit b0ffbf6fd81e30a6df9d08b020fc1215e6dcc7d4 Author: Daoyuan Wang <daoyuan.w...@intel.com> Date: 2016-03-16T09:01:56Z [MINOR] remove spark.sql.dialect from doc and test
[GitHub] spark pull request: [MINOR][TEST][SQL] Remove wrong "expected" par...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/11718#issuecomment-196712224 LGTM
[GitHub] spark pull request: [SPARK-13427][SQL] Support USING clause in JOI...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/11297#discussion_r56101451 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1169,6 +1170,7 @@ object PushPredicateThroughJoin extends Rule[LogicalPlan] with PredicateHelper { Join(newLeft, newRight, LeftOuter, newJoinCond) case FullOuter => f case NaturalJoin(_) => sys.error("Untransformed NaturalJoin node") +case UsingJoin(_, _) => sys.error("Untransformed Using join node") --- End diff -- JoinType is sealed, so we need to put something in this pattern match
[GitHub] spark pull request: [SPARK-13648] Add Hive Cli to classes for isol...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/11495#issuecomment-193670192 Why do we need `hive-cli` here? This is not version-specific.
[GitHub] spark pull request: [SPARK-11624][SPARK-11972][SQL]fix commands th...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/9589#discussion_r53545337 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -104,29 +104,39 @@ private[hive] class HiveClientImpl( } val ret = try { - val initialConf = new HiveConf(classOf[SessionState]) - // HiveConf is a Hadoop Configuration, which has a field of classLoader and - // the initial value will be the current thread's context class loader - // (i.e. initClassLoader at here). - // We call initialConf.setClassLoader(initClassLoader) at here to make - // this action explicit. - initialConf.setClassLoader(initClassLoader) - config.foreach { case (k, v) => -if (k.toLowerCase.contains("password")) { - logDebug(s"Hive Config: $k=xxx") -} else { - logDebug(s"Hive Config: $k=$v") + // originState will be created if not exists, will never be null + val originalState = SessionState.get() + if (originalState.isInstanceOf[CliSessionState]) { --- End diff -- the `SessionState.get()` method would create an instance of `SessionState` if one does not exist.
[GitHub] spark pull request: [SPARK-11624][SPARK-11972][SQL]fix commands th...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/9589#issuecomment-185523881 @marmbrus
[GitHub] spark pull request: [WIP][SPARK-13332][SQL] Decimal datatype suppo...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/11212#discussion_r53115148 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java --- @@ -170,6 +170,7 @@ public void write(int ordinal, double value) { } public void write(int ordinal, Decimal input, int precision, int scale) { +input = input.clone(); --- End diff -- Here we'll call `changePrecision` on `input`, which would affect the original data. I agree that this is a bad idea; maybe we need to propose a separate PR to work around this.
[GitHub] spark pull request: [WIP][SPARK-13332][SQL] Decimal datatype suppo...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/11212#discussion_r52968764 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala --- @@ -523,11 +523,45 @@ case class Atan2(left: Expression, right: Expression) case class Pow(left: Expression, right: Expression) extends BinaryMathExpression(math.pow, "POWER") { - override def genCode(ctx: CodegenContext, ev: ExprCode): String = { -defineCodeGen(ctx, ev, (c1, c2) => s"java.lang.Math.pow($c1, $c2)") - } -} + override def inputTypes: Seq[AbstractDataType] = Seq(NumericType, NumericType) + + override def dataType: DataType = (left.dataType, right.dataType) match { +case (dt: DecimalType, ByteType | ShortType | IntegerType) => dt +case _ => DoubleType + } + + protected override def nullSafeEval(input1: Any, input2: Any): Any = +(left.dataType, right.dataType) match { + case (dt: DecimalType, ByteType) => +input1.asInstanceOf[Decimal].pow(input2.asInstanceOf[Byte]) + case (dt: DecimalType, ShortType) => +input1.asInstanceOf[Decimal].pow(input2.asInstanceOf[Short]) + case (dt: DecimalType, IntegerType) => +input1.asInstanceOf[Decimal].pow(input2.asInstanceOf[Int]) + case (dt: DecimalType, FloatType) => +math.pow(input1.asInstanceOf[Decimal].toDouble, input2.asInstanceOf[Float]) + case (dt: DecimalType, DoubleType) => +math.pow(input1.asInstanceOf[Decimal].toDouble, input2.asInstanceOf[Double]) + case (dt1: DecimalType, dt2: DecimalType) => +math.pow(input1.asInstanceOf[Decimal].toDouble, input2.asInstanceOf[Decimal].toDouble) --- End diff -- Shall we cast the result of `math.pow` back to `DecimalType` for these three cases? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
[GitHub] spark pull request: [WIP][SPARK-13332][SQL] Decimal datatype suppo...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/11212#discussion_r52968376 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java --- @@ -170,6 +170,7 @@ public void write(int ordinal, double value) { } public void write(int ordinal, Decimal input, int precision, int scale) { +input = input.clone(); --- End diff -- Better add a comment that explains why we need to clone before writing.
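The reason for the `input.clone()` in the diff above is that Spark's `Decimal` is mutable: changing its precision in the writer would also change the caller's object. A minimal sketch of the hazard and the fix, using a hypothetical mutable wrapper (not Spark's actual `Decimal` class):

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class CloneBeforeWrite {
    // Hypothetical mutable stand-in for Spark's Decimal, which changePrecision
    // mutates in place.
    static class MutableDecimal {
        BigDecimal value;
        MutableDecimal(BigDecimal v) { value = v; }
        // Mutates this object, like Decimal.changePrecision does.
        void changeScale(int scale) { value = value.setScale(scale, RoundingMode.HALF_UP); }
        MutableDecimal copy() { return new MutableDecimal(value); }
    }

    // Writing through a defensive copy leaves the caller's object untouched --
    // the behavior the `input = input.clone()` line in the diff restores.
    static BigDecimal writeScaled(MutableDecimal input, int scale) {
        MutableDecimal copy = input.copy();
        copy.changeScale(scale);
        return copy.value;
    }

    public static void main(String[] args) {
        MutableDecimal d = new MutableDecimal(new BigDecimal("1.234"));
        System.out.println(writeScaled(d, 1)); // prints 1.2
        System.out.println(d.value);           // prints 1.234 -- original unaffected
    }
}
```

Without the copy, the scale change would leak back into `d`, which is exactly the "affect the original data" problem raised in the review.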
[GitHub] spark pull request: [WIP][SPARK-13332][SQL] Decimal datatype suppo...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/11212#discussion_r52968287 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathFunctionsSuite.scala --- @@ -351,6 +350,20 @@ class MathFunctionsSuite extends SparkFunSuite with ExpressionEvalHelper { } test("pow") { +testBinary(Pow, (d: Decimal, n: Byte) => d.pow(n), + (-5 to 5).map(v => (Decimal(v * 1.0), v.toByte))) --- End diff -- maybe `v.toDouble` is better
[GitHub] spark pull request: [WIP][SQL] Decimal datatype support for pow
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/11212#discussion_r52968214 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathFunctionsSuite.scala --- @@ -103,8 +103,7 @@ class MathFunctionsSuite extends SparkFunSuite with ExpressionEvalHelper { } } else { domain.foreach { case (v1, v2) => -checkEvaluation(c(Literal(v1), Literal(v2)), f(v1 + 0.0, v2 + 0.0), EmptyRow) -checkEvaluation(c(Literal(v2), Literal(v1)), f(v2 + 0.0, v1 + 0.0), EmptyRow) +checkEvaluation(c(Literal(v1), Literal(v2)), f(v1, v2), EmptyRow) --- End diff -- keep the test of `f(v2, v1)`
[GitHub] spark pull request: [SPARK-11624][SPARK-11972][SQL]fix commands th...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/9589#issuecomment-183811059 @marmbrus We have instantiated and started an instance of `CliSessionState`, and when we initialize `SparkSQLEnv`, we will create a `SessionState`.
[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/11071#issuecomment-183791219 @srowen you are right.
[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/11071#discussion_r51851087 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala --- @@ -55,10 +56,19 @@ object DateTimeUtils { // this is year -17999, calculation: 50 * daysIn400Year final val YearZero = -17999 final val toYearZero = to2001 + 7304850 - final val TimeZoneGMT = TimeZone.getTimeZone("GMT") @transient lazy val defaultTimeZone = TimeZone.getDefault + // Reuse the TimeZone object as it is expensive to create in each method call. + final val timeZones = new ConcurrentHashMap[String, TimeZone] --- End diff -- This map could be quite big, because the string varies. Actually `ZoneInfoFile` does provide a cache for different `ID`s. Let's find out whether the boost you mentioned comes from reusing `TimeZone` or `Calendar` instances.
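The caching idea under discussion can be sketched outside Spark as follows; this is a minimal illustration (not Spark's actual code), memoizing `TimeZone` instances so that repeated lookups skip the synchronized `TimeZone.getTimeZone` call:

```scala
import java.util.TimeZone
import java.util.concurrent.ConcurrentHashMap

// Minimal sketch of the proposed cache: store one TimeZone per ID so that
// cache hits avoid the synchronized TimeZone.getTimeZone lookup entirely.
object TimeZoneCache {
  private val cache = new ConcurrentHashMap[String, TimeZone]

  // computeIfAbsent only calls the (synchronized) factory on a miss.
  def get(id: String): TimeZone =
    cache.computeIfAbsent(id, (k: String) => TimeZone.getTimeZone(k))
}
```

Note that `TimeZone.getTimeZone` returns a fresh clone on every call, while the cache returns the same instance, which is part of where the speedup would come from.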
[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/11071#discussion_r51852765 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala --- @@ -55,10 +56,19 @@ object DateTimeUtils { // this is year -17999, calculation: 50 * daysIn400Year final val YearZero = -17999 final val toYearZero = to2001 + 7304850 - final val TimeZoneGMT = TimeZone.getTimeZone("GMT") @transient lazy val defaultTimeZone = TimeZone.getDefault + // Reuse the TimeZone object as it is expensive to create in each method call. + final val timeZones = new ConcurrentHashMap[String, TimeZone] --- End diff -- By using this map we can skip a lot of calls to `getTimeZone`, which is a synchronized method, so a `ConcurrentHashMap` can indeed help performance. Do we need to add a `transient` here?
[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/11071#discussion_r51853631 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala --- @@ -55,10 +56,19 @@ object DateTimeUtils { // this is year -17999, calculation: 50 * daysIn400Year final val YearZero = -17999 final val toYearZero = to2001 + 7304850 - final val TimeZoneGMT = TimeZone.getTimeZone("GMT") @transient lazy val defaultTimeZone = TimeZone.getDefault + // Reuse the TimeZone object as it is expensive to create in each method call. + final val timeZones = new ConcurrentHashMap[String, TimeZone] --- End diff -- Actually, we only need to change all calls of `getTimeZone(String ID)` to `getTimeZone(String ID, boolean fallback)` (with `true` as the fallback) to work around the synchronization here.
[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/11071#issuecomment-180201265 The map key could be like "UTC+01:00", "America/Los_Angeles", "PST", etc.; they are already cached in `getTimeZone`, but the method itself is a synchronized one.
[GitHub] spark pull request: [SPARK-12828][SQL] Natural join follow-up
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/11070#issuecomment-179664171 LGTM, thx!
[GitHub] spark pull request: [SPARK-12828][SQL]add natural join support
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/10762#issuecomment-179642503 @cloud-fan any more comments?
[GitHub] spark pull request: [SPARK-13056][SQL] map column would throw NPE ...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/10964#discussion_r51392316 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2046,6 +2046,15 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { ) } + test("SPARK-13056: Null in map value causes NPE") { +val df = Seq(1 -> Map("abc" -> "somestring", "cba" -> null)).toDF("key", "value") +withTempTable("maptest") { + df.registerTempTable("maptest") + checkAnswer(sql("SELECT value['abc'] FROM maptest where key = 1"), Row("somestring")) --- End diff -- Yes, you are right
[GitHub] spark pull request: [SPARK-13056][SQL] map column would throw NPE ...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/10964#issuecomment-177877914 retest this please.
[GitHub] spark pull request: [SPARK-13056][SQL] map column would throw NPE ...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/10964#discussion_r51381272 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2056,4 +2056,11 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { ) } } + + test("SPARK-13056: Null in map value causes NPE") { +val df = Seq(1 -> Map("abc" -> "somestring", "cba" -> null)).toDF("key", "value") +df.registerTempTable("maptest") +checkAnswer(sql("SELECT value['abc'] FROM maptest"), Row("somestring")) +checkAnswer(sql("SELECT value['cba'] FROM maptest"), Row(null)) --- End diff -- @tejasapatil OK, actually I used a UDF to generate the map and got your exception; later I changed to this following @cloud-fan's suggestion but didn't verify it myself. I'll modify the test case here, thanks!
[GitHub] spark pull request: [SPARK-13056][SQL] map column would throw NPE ...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/10964#discussion_r51240308 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala --- @@ -1463,4 +1463,11 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton { |FROM (SELECT '{"f1": "value1", "f2": 12}' json, 'hello' as str) test """.stripMargin), Row("value1", "12", BigDecimal("3.14"), "hello")) } + + test("SPARK-13056: Null in map value causes NPE") { +val df = Seq(1 -> Map("abc" -> "somestring", "cba" -> null)).toDF("key", "value") +df.registerTempTable("maptest") --- End diff -- That suite mainly tests DataFrame API functionality, I think.
[GitHub] spark pull request: [SPARK-12828][SQL]add natural join support
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/10762#discussion_r51295569 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala --- @@ -171,13 +171,21 @@ case class Join( def selfJoinResolved: Boolean = left.outputSet.intersect(right.outputSet).isEmpty - // Joins are only resolved if they don't introduce ambiguous expression ids. + // if not a natural join, it is resolved. if it is a natural join, we still need to + // eliminate natural before we mark it resolved, but the node should be ready for + // resolution only if everything else is resolved here. lazy val resolvedExceptNatural: Boolean = { childrenResolved && expressions.forall(_.resolved) && selfJoinResolved && condition.forall(_.dataType == BooleanType) } + + // Joins are only resolved if they don't introduce ambiguous expression ids. --- End diff -- This is the original comment from the old `Join`.
[GitHub] spark pull request: [SPARK-12828][SQL]add natural join support
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/10762#discussion_r51090849 --- Diff: sql/catalyst/src/main/antlr3/org/apache/spark/sql/catalyst/parser/SparkSqlLexer.g --- @@ -328,6 +328,8 @@ KW_WEEK: 'WEEK'|'WEEKS'; KW_MILLISECOND: 'MILLISECOND'|'MILLISECONDS'; KW_MICROSECOND: 'MICROSECOND'|'MICROSECONDS'; +KW_NATURAL: 'NATURAL'; --- End diff -- I think it should be reserved. See: MySQL: https://dev.mysql.com/doc/refman/5.5/en/keywords.html PostgreSQL: http://www.postgresql.org/docs/7.3/static/sql-keywords-appendix.html and it's also reserved in SQL92 and SQL99 (as declared in the PostgreSQL docs).
[GitHub] spark pull request: [SPARK-12828][SQL]add natural join support
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/10762#discussion_r51094651 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -104,6 +105,9 @@ trait CheckAnalysis { s"filter expression '${f.condition.prettyString}' " + s"of type ${f.condition.dataType.simpleString} is not a boolean.") + case j @ Join(_, _, NaturalJoin(_), _) => +failAnalysis(s"natural join not resolved.") --- End diff -- This will never happen unless there are still unresolved natural joins left after 100 analyzer iterations.
[GitHub] spark pull request: [SPARK-12828][SQL]add natural join support
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/10762#discussion_r51106564 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -919,6 +919,7 @@ object PushPredicateThroughJoin extends Rule[LogicalPlan] with PredicateHelper { (rightFilterConditions ++ commonFilterCondition). reduceLeftOption(And).map(Filter(_, newJoin)).getOrElse(newJoin) case FullOuter => f // DO Nothing for Full Outer Join +case NaturalJoin(_) => sys.error("Untransformed NaturalJoin node") --- End diff -- Yes, but `JoinType` is a sealed abstract class, so we have to put something here.
[GitHub] spark pull request: [SPARK-13056][SQL] map column would throw NPE ...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/10964#discussion_r51226292 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala --- @@ -218,7 +218,7 @@ case class GetArrayItem(child: Expression, ordinal: Expression) protected override def nullSafeEval(value: Any, ordinal: Any): Any = { val baseValue = value.asInstanceOf[ArrayData] val index = ordinal.asInstanceOf[Number].intValue() -if (index >= baseValue.numElements() || index < 0) { +if (index >= baseValue.numElements() || index < 0 || baseValue.isNullAt(index)) { null } else { baseValue.get(index, dataType) --- End diff -- Just realized it is ok to let it return null here, without the condition above.
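The null-handling logic in the hunk above can be modeled outside Spark with plain arrays; this is a simplified, hypothetical sketch (not `GetArrayItem` itself), showing that both out-of-bounds indices and present-but-null elements yield null instead of an NPE:

```scala
// Simplified model of the nullSafeEval logic under discussion: negative or
// out-of-range indices and null elements all evaluate to null, never an NPE.
def getArrayItem(arr: Array[String], index: Int): String =
  if (index < 0 || index >= arr.length || arr(index) == null) null
  else arr(index)
```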
[GitHub] spark pull request: [SPARK-12828][SQL]add natural join support
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/10762#issuecomment-176532100 Actually, MySQL and Oracle do not support normal full outer join either. PostgreSQL does support natural full outer join: http://www.postgresql.org/docs/9.1/static/queries-table-expressions.html I think we should support natural full outer join since we support full outer join.
[GitHub] spark pull request: [SPARK-12828][SQL]add natural join support
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/10762#discussion_r51226827 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala --- @@ -248,4 +249,29 @@ class AnalysisSuite extends AnalysisTest { val plan = relation.select(CaseWhen(Seq((Literal(true), 'a.attr)), 'b).as("val")) assertAnalysisSuccess(plan) } + + test("check resolve of natural join") { --- End diff -- OK, I'll update it soon.
[GitHub] spark pull request: [SPARK-12828][SQL]add natural join support
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/10762#issuecomment-176006912 Sure, I'll update this PR today.
[GitHub] spark pull request: [SPARK-13056][SQL] map column would throw NPE ...
GitHub user adrian-wang opened a pull request: https://github.com/apache/spark/pull/10964 [SPARK-13056][SQL] map column would throw NPE if value is null Jira: https://issues.apache.org/jira/browse/SPARK-13056 Create a map like `{"a": "somestring", "b": null}`, then a query like `SELECT col["b"] FROM t1;` throws an NPE. You can merge this pull request into a Git repository by running: $ git pull https://github.com/adrian-wang/spark npewriter Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10964.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10964 commit 5b2d942c3f9825eca0b286c42f31a51e6bda0cdd Author: Daoyuan Wang <daoyuan.w...@intel.com> Date: 2016-01-28T05:49:45Z map column would throw NPE if value is null
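The expected semantics can be sketched with a plain Scala map, outside Spark entirely (a tiny model, not the actual fix): looking up a key whose value is null must return null rather than throwing.

```scala
// Model of the desired behavior: a present-but-null value yields null.
// Map#get returns Some(null) in that case, and orNull unwraps Some(null),
// Some(v), and None to null, v, and null respectively, with no NPE.
def mapValue(m: Map[String, String], key: String): String =
  m.get(key).orNull

val t1 = Map("a" -> "somestring", "b" -> null)
```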
[GitHub] spark pull request: [SPARK-12828][SQL]add natural join support
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/10762#issuecomment-173804613 When the parser calls the constructor, how can we get the schema of the tables? We need the schema to build the project list and join conditions.
[GitHub] spark pull request: [SPARK-12828][SQL]add natural join support
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/10762#issuecomment-173465114 retest this please.
[GitHub] spark pull request: [SPARK-12789]Support order by index
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/10731#discussion_r50063958 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -445,6 +445,26 @@ class Analyzer( val newOrdering = resolveSortOrders(ordering, child, throws = false) Sort(newOrdering, global, child) + // Resolve the order index to be a specific column + case s @ Sort(ordering, global, child) if child.resolved && s.resolved => +def indexToColumn(index: Int, direction: SortDirection) = { + val orderNodes = child.output + if (index > 0 && index <= orderNodes.size) { +SortOrder(orderNodes(index - 1), direction) + } else { +throw new UnresolvedException(s, + s"""Order by position: $index does not exist \n + |The Select List is indexed from 1 to ${orderNodes.size}""".stripMargin) + } +} +val newOrdering = ordering map { --- End diff -- If there are nested attributes, an integer literal could be used to reference nested fields, so the integer literal should only be treated as a position when it is at the top level of the order list.
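The `indexToColumn` helper in the hunk above can be sketched without Catalyst types; here column names stand in for attribute references (a simplification, not the real analyzer rule):

```scala
// Stripped-down model of indexToColumn: resolve a 1-based ORDER BY position
// against the child's output columns, rejecting out-of-range positions.
def indexToColumn(output: Seq[String], index: Int): String = {
  require(index > 0 && index <= output.size,
    s"Order by position $index does not exist; the select list is indexed from 1 to ${output.size}")
  output(index - 1)
}
```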
[GitHub] spark pull request: [SPARK-12789]Support order by index
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/10731#issuecomment-172424340 `order by 2` should refer to the second column, I think.
[GitHub] spark pull request: [SPARK-12828][SQL]add natural join support
GitHub user adrian-wang opened a pull request: https://github.com/apache/spark/pull/10762 [SPARK-12828][SQL]add natural join support Jira: https://issues.apache.org/jira/browse/SPARK-12828 You can merge this pull request into a Git repository by running: $ git pull https://github.com/adrian-wang/spark naturaljoin Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10762.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10762 commit 91a4a8739673740792c0e2c95ad7ae37497b7a8b Author: Daoyuan Wang <daoyuan.w...@intel.com> Date: 2016-01-14T08:31:00Z add natural join support
[GitHub] spark pull request: [SPARK-12828][SQL]add natural join support
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/10762#discussion_r49813610 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala --- @@ -179,10 +180,15 @@ object SqlParser extends AbstractSparkSQLParser with DataTypeParser { ) protected lazy val joinedRelation: Parser[LogicalPlan] = -relationFactor ~ rep1(joinType.? ~ (JOIN ~> relationFactor) ~ joinConditions.?) ^^ { +relationFactor ~ --- End diff -- Yes, I will update the Hive parser today, too.
[GitHub] spark pull request: [SPARK-12828][SQL]add natural join support
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/10762#discussion_r49825914 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2067,4 +2067,24 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { ) } } + + test("natural join") { --- End diff -- Done
[GitHub] spark pull request: [SPARK-12828][SQL]add natural join support
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/10762#discussion_r49825919 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala --- @@ -179,10 +180,15 @@ object SqlParser extends AbstractSparkSQLParser with DataTypeParser { ) protected lazy val joinedRelation: Parser[LogicalPlan] = -relationFactor ~ rep1(joinType.? ~ (JOIN ~> relationFactor) ~ joinConditions.?) ^^ { +relationFactor ~ --- End diff -- Done.
[GitHub] spark pull request: [SPARK-12828][SQL]add natural join support
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/10762#discussion_r49828096 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala --- @@ -144,6 +147,35 @@ case class Join( } } + def outerProjectList: Seq[NamedExpression] = { --- End diff -- yes I mean "column"...
[GitHub] spark pull request: [SPARK-12828][SQL]add natural join support
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/10762#discussion_r49827571 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala --- @@ -144,6 +147,35 @@ case class Join( } } + def outerProjectList: Seq[NamedExpression] = { --- End diff -- For `select *` from a natural join, we need to use a `Project` to get rid of redundant columns. In MySQL, `select *` from a natural join would only contain one copy of a column when both sides have a column of the same name, but users can still use something like `t1.a` or `t2.a` to explicitly reference the column. Do you think we should implement it exactly the same? I'm afraid it would be a too complicated change, but I can do that if necessary.
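The `Project` the comment describes can be sketched with column names standing in for Spark's attribute references (a hypothetical simplification of `outerProjectList`, not the actual code): keep a single copy of each common column, then the remaining columns from each side.

```scala
// Sketch of a natural join's deduplicated output: common columns once,
// followed by the left-only and right-only columns, in that order.
def naturalJoinOutput(left: Seq[String], right: Seq[String]): Seq[String] = {
  val common = left.filter(right.contains)
  common ++ left.filterNot(common.contains) ++ right.filterNot(common.contains)
}
```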
[GitHub] spark pull request: [SPARK-12828][SQL]add natural join support
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/10762#issuecomment-171895087 @rxin, thanks for your time, I'll draft another version accordingly.
[GitHub] spark pull request: [SPARK-12789]Support order by index
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/10731#discussion_r49539497 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -22,8 +22,9 @@ import java.sql.Timestamp import org.apache.spark.AccumulatorSuite import org.apache.spark.sql.catalyst.DefaultParserDialect -import org.apache.spark.sql.catalyst.analysis.FunctionRegistry +import org.apache.spark.sql.catalyst.analysis.{UnresolvedException, FunctionRegistry} --- End diff -- import order
[GitHub] spark pull request: [SPARK-12789]Support order by index
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/10731#discussion_r49539533 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -441,7 +442,7 @@ class Analyzer( // When resolve `SortOrder`s in Sort based on child, don't report errors as // we still have chance to resolve it based on grandchild - case s @ Sort(ordering, global, child) if child.resolved && !s.resolved => --- End diff -- why remove it?
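The "order by index" feature this PR targets can be illustrated with a small sketch (not Spark's actual analyzer rule; names here are hypothetical): an integer literal in the sort keys is replaced by the corresponding SELECT-list expression, using 1-based positions.

```scala
// Hypothetical sketch of resolving "ORDER BY 1"-style ordinals.
// sortKeys may mix ordinals (Int) and ordinary expressions (modeled as String).
def resolveOrdinals(sortKeys: Seq[Any], projectList: Seq[String]): Seq[String] =
  sortKeys.map {
    case i: Int if i >= 1 && i <= projectList.size =>
      projectList(i - 1) // 1-based ordinal -> SELECT-list expression
    case i: Int =>
      throw new IllegalArgumentException(s"ORDER BY position $i is out of range")
    case e =>
      e.toString // non-ordinal keys pass through unchanged
  }
```

So for `SELECT a, b ... ORDER BY 2, c`, the sort keys resolve to `b, c`.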
[GitHub] spark pull request: [SPARK-12789]Support order by index
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/10731#discussion_r49539463 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -22,8 +22,9 @@ import java.sql.Timestamp import org.apache.spark.AccumulatorSuite import org.apache.spark.sql.catalyst.DefaultParserDialect -import org.apache.spark.sql.catalyst.analysis.FunctionRegistry +import org.apache.spark.sql.catalyst.analysis.{UnresolvedException, FunctionRegistry} import org.apache.spark.sql.catalyst.errors.DialectException +import org.apache.spark.sql.catalyst.expressions.SortOrder --- End diff -- unused?
[GitHub] spark pull request: [SPARK-12789]Support order by index
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/10731#discussion_r49540540 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -22,8 +22,9 @@ import java.sql.Timestamp import org.apache.spark.AccumulatorSuite import org.apache.spark.sql.catalyst.DefaultParserDialect -import org.apache.spark.sql.catalyst.analysis.FunctionRegistry +import org.apache.spark.sql.catalyst.analysis.{UnresolvedException, FunctionRegistry} import org.apache.spark.sql.catalyst.errors.DialectException +import org.apache.spark.sql.catalyst.expressions.SortOrder --- End diff -- ok
[GitHub] spark pull request: [SPARK-11624][SPARK-11972][SQL]fix commands th...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/9589#discussion_r48829056 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/ClientWrapper.scala --- @@ -151,29 +152,34 @@ private[hive] class ClientWrapper( // Switch to the initClassLoader. Thread.currentThread().setContextClassLoader(initClassLoader) val ret = try { - val initialConf = new HiveConf(classOf[SessionState]) - // HiveConf is a Hadoop Configuration, which has a field of classLoader and - // the initial value will be the current thread's context class loader - // (i.e. initClassLoader at here). - // We call initialConf.setClassLoader(initClassLoader) at here to make - // this action explicit. - initialConf.setClassLoader(initClassLoader) - config.foreach { case (k, v) => -if (k.toLowerCase.contains("password")) { - logDebug(s"Hive Config: $k=xxx") -} else { - logDebug(s"Hive Config: $k=$v") + val registeredState = SessionState.get + if (registeredState != null && registeredState.isInstanceOf[CliSessionState]) { --- End diff -- When we have a `CliSessionState`, we are using the Spark SQL CLI, and in that case we never need a second `SessionState`. Creating another `SessionState` would make some cases fail: since `CliSessionState` inherits from `SessionState`, a second instance could lead to a `ClassCastException`.
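The reuse pattern described in that comment can be sketched with simplified stand-in types (these are toy classes mirroring the Hive class names, not the real `org.apache.hadoop.hive.ql.session.SessionState`): if the already-registered state is a `CliSessionState`, reuse it instead of constructing a new one.

```scala
// Toy stand-ins for the Hive session classes, for illustration only.
class SessionState
class CliSessionState extends SessionState

// If a CliSessionState is already registered (Spark SQL CLI is running),
// reuse it; otherwise create a fresh SessionState.
def currentState(registered: Option[SessionState]): SessionState =
  registered match {
    case Some(cli: CliSessionState) => cli              // reuse the CLI state
    case _                          => new SessionState // start fresh
  }
```

The point of the pattern is that the CLI's state object (a subclass) keeps being used as-is, so later code that expects the CLI subtype never sees an unrelated second instance.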
[GitHub] spark pull request: [Spark-12260][wip][Streaming]Graceful Shutdown...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/10252#discussion_r48232238 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala --- @@ -330,6 +330,20 @@ abstract class DStream[T: ClassTag] ( } /** + * compute the last valid time till input time --- End diff -- Computes
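The doc comment under review ("compute the last valid time till input time") describes an alignment computation that can be sketched as follows. This is an assumed reading, not the actual DStream implementation: a time is taken to be valid when it is `zeroTime` plus a whole number of slide durations, with all times in milliseconds.

```scala
// Assumed semantics: snap `time` down to the most recent instant aligned
// with zeroTime on the slideDuration grid (all values in milliseconds).
def lastValidTime(zeroTime: Long, slideDuration: Long, time: Long): Long = {
  require(slideDuration > 0 && time >= zeroTime)
  zeroTime + ((time - zeroTime) / slideDuration) * slideDuration
}
```

For instance, with `zeroTime = 0` and a 10 ms slide, the last valid time at `t = 25` is 20.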