[GitHub] spark issue #23005: [SPARK-26005] [SQL] Upgrade ANTLR from 4.7 to 4.7.1

2018-11-11 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/23005
  
Thanks! Merged to master.


---




[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix barrier task run without Barr...

2018-11-11 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22962
  
cc @jiangxb1987 @MrBago 


---




[GitHub] spark pull request #23005: [SPARK-26005] [SQL] Upgrade ANTLR from 4.7 to 4.7...

2018-11-11 Thread gatorsmile
GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/23005

[SPARK-26005] [SQL] Upgrade ANTLR from 4.7 to 4.7.1

## What changes were proposed in this pull request?
Based on the release description of ANTLR 4.7.1 (https://github.com/antlr/antlr4/releases), let us upgrade our parser to 4.7.1.

## How was this patch tested?
N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark upgradeAntlr4.7

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23005.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23005


commit 71b45f0a5abfd37dfa9fb1ab20e79ce8e06017f6
Author: gatorsmile 
Date:   2018-11-11T08:48:29Z

fix




---




[GitHub] spark issue #21766: [SPARK-24803][SQL] add support for numeric

2018-11-10 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21766
  
@wangtao605 Do you mind documenting our behavior in our Spark SQL doc?


---




[GitHub] spark issue #17648: [SPARK-19851] Add support for EVERY and ANY (SOME) aggre...

2018-11-10 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17648
  
@ptkool Thanks for your contribution! This feature will be available in the next release, Spark 3.0.
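
For reference, a minimal usage sketch of the new aggregates (assuming a SparkSession `spark`; the table `t` and columns `k`, `v` are hypothetical):

```scala
// every/any/some are boolean aggregate functions added by SPARK-19851 (Spark 3.0+).
spark.sql("SELECT k, every(v), any(v), some(v) FROM t GROUP BY k").show()
```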


---




[GitHub] spark issue #21679: [SPARK-24695] [SQL]: To add support to return Calendar i...

2018-11-10 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21679
  
For your information, @cloud-fan and I are discussing how we expose the calendar interval data type. We will post the proposal later.


---




[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

2018-11-09 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22932
  
LGTM. Thanks! Merged to master.


---




[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...

2018-11-09 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22932#discussion_r232443802
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcOutputWriter.scala ---
@@ -36,11 +37,17 @@ private[orc] class OrcOutputWriter(
   private[this] val serializer = new OrcSerializer(dataSchema)
 
   private val recordWriter = {
-new OrcOutputFormat[OrcStruct]() {
+val orcOutputFormat = new OrcOutputFormat[OrcStruct]() {
   override def getDefaultWorkFile(context: TaskAttemptContext, extension: String): Path = {
 new Path(path)
   }
-}.getRecordWriter(context)
+}
+val filename = orcOutputFormat.getDefaultWorkFile(context, ".orc")
+val options = OrcMapRedOutputFormat.buildOptions(context.getConfiguration)
+val writer = OrcFile.createWriter(filename, options)
+val recordWriter = new OrcMapreduceRecordWriter[OrcStruct](writer)
--- End diff --

This is basically copied from `getRecordWriter`.


---




[GitHub] spark pull request #22759: [MINOR][SQL][DOC] Correct parquet nullability doc...

2018-11-09 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22759#discussion_r232441937
  
--- Diff: docs/sql-programming-guide.md ---
@@ -706,7 +706,7 @@ data across a fixed number of buckets and can be used when a number of unique va
 
 [Parquet](http://parquet.io) is a columnar format that is supported by many other data processing systems.
 Spark SQL provides support for both reading and writing Parquet files that automatically preserves the schema
-of the original data. When writing Parquet files, all columns are automatically converted to be nullable for
+of the original data. When reading Parquet files, all columns are automatically converted to be nullable for
--- End diff --

This file has been reorganized. Could you merge the latest master?


---




[GitHub] spark issue #22759: [MINOR][SQL][DOC] Correct parquet nullability documentat...

2018-11-09 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22759
  
LGTM

Could you do us a favor and add test cases ensuring that the generated Parquet files have the correct nullability value?
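
A hedged sketch of the kind of test being requested, assuming the usual Spark test harness helpers (`withTempPath` and `spark`) are in scope:

```scala
withTempPath { dir =>
  // Write a DataFrame whose column is non-nullable at write time ...
  spark.range(10).toDF("id").write.parquet(dir.getCanonicalPath)
  // ... then check the nullability observed when reading the files back.
  val schema = spark.read.parquet(dir.getCanonicalPath).schema
  assert(schema.fields.forall(_.nullable))
}
```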


---




[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...

2018-11-09 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22932#discussion_r232424657
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala ---
@@ -274,6 +278,15 @@ private[orc] class OrcOutputWriter(
 
   override def close(): Unit = {
 if (recordWriterInstantiated) {
+  // Hive 1.2.1 ORC initializes its private `writer` field at the first write.
+  try {
+val writerField = recordWriter.getClass.getDeclaredField("writer")
+writerField.setAccessible(true)
+val writer = writerField.get(recordWriter).asInstanceOf[Writer]
+writer.addUserMetadata(SPARK_VERSION_METADATA_KEY, UTF_8.encode(SPARK_VERSION_SHORT))
+  } catch {
+case NonFatal(e) => log.warn(e.toString, e)
+  }
--- End diff --

The same comment here.


---




[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...

2018-11-09 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22932#discussion_r232424626
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcOutputWriter.scala ---
@@ -36,11 +41,17 @@ private[orc] class OrcOutputWriter(
   private[this] val serializer = new OrcSerializer(dataSchema)
 
   private val recordWriter = {
-new OrcOutputFormat[OrcStruct]() {
+val orcOutputFormat = new OrcOutputFormat[OrcStruct]() {
   override def getDefaultWorkFile(context: TaskAttemptContext, extension: String): Path = {
 new Path(path)
   }
-}.getRecordWriter(context)
+}
+val filename = orcOutputFormat.getDefaultWorkFile(context, ".orc")
+val options = OrcMapRedOutputFormat.buildOptions(context.getConfiguration)
+val writer = OrcFile.createWriter(filename, options)
+val recordWriter = new OrcMapreduceRecordWriter[OrcStruct](writer)
+writer.addUserMetadata(SPARK_VERSION_METADATA_KEY, UTF_8.encode(SPARK_VERSION_SHORT))
--- End diff --

Could we create a separate function for adding this metadata?


---




[GitHub] spark issue #22987: [SPARK-25979][SQL] Window function: allow parentheses ar...

2018-11-09 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22987
  
LGTM

Thanks! Merged to master. 

I accidentally merged it to 2.4. Feel free to revert it if anybody has a 
concern about this fix.  



---




[GitHub] spark issue #22990: [SPARK-25988] [SQL] Keep names unchanged when deduplicat...

2018-11-09 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22990
  
Thanks! Merged to master/2.4


---




[GitHub] spark issue #22990: [SPARK-25988] [SQL] Keep names unchanged when deduplicat...

2018-11-08 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22990
  
cc @cloud-fan 


---




[GitHub] spark pull request #22990: [SPARK-25988] [SQL] Keep names unchanged when ded...

2018-11-08 Thread gatorsmile
GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/22990

[SPARK-25988] [SQL] Keep names unchanged when deduplicating the column 
names in Analyzer

## What changes were proposed in this pull request?
When queries do not reference column names with consistent casing, users might hit various errors. Below is a typical test failure they can hit.
```
Expected only partition pruning predicates: ArrayBuffer(isnotnull(tdate#237), (cast(tdate#237 as string) >= 2017-08-15));
org.apache.spark.sql.AnalysisException: Expected only partition pruning predicates: ArrayBuffer(isnotnull(tdate#237), (cast(tdate#237 as string) >= 2017-08-15));
at org.apache.spark.sql.catalyst.catalog.ExternalCatalogUtils$.prunePartitionsByFilter(ExternalCatalogUtils.scala:146)
at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.listPartitionsByFilter(InMemoryCatalog.scala:560)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.listPartitionsByFilter(SessionCatalog.scala:925)
```

## How was this patch tested?
Added two test cases.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark fix1283

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22990.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22990


commit 5e9f6f345b93d3370906c7b2d73ede15f4089c29
Author: gatorsmile 
Date:   2018-11-09T05:27:37Z

fix

commit 17b725c79ad602df20c44cacb92e7c6abd84cdda
Author: gatorsmile 
Date:   2018-11-09T05:33:58Z

fix




---




[GitHub] spark issue #22987: [SPARK-25979][SQL] Window function: allow parentheses ar...

2018-11-08 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22987
  
LGTM


---




[GitHub] spark pull request #22987: [SPARK-25979][SQL] Window function: allow parenth...

2018-11-08 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22987#discussion_r232136609
  
--- Diff: sql/core/src/test/resources/sql-tests/inputs/window.sql ---
@@ -109,3 +109,9 @@ last_value(false, false) OVER w AS last_value_contain_null
 FROM testData
 WINDOW w AS ()
 ORDER BY cate, val;
+
+-- parentheses around window reference
+SELECT cate, sum(val) OVER (w)
+FROM testData
+WHERE val is not null
+WINDOW w AS (PARTITION BY cate ORDER BY val);
--- End diff --

+1


---




[GitHub] spark pull request #22969: [SPARK-22827][SQL][FOLLOW-UP] Throw `SparkOutOfMe...

2018-11-08 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22969#discussion_r232112326
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
@@ -787,7 +789,7 @@ case class HashAggregateExec(
  |$unsafeRowKeys, ${hashEval.value});
  |  if ($unsafeRowBuffer == null) {
  |// failed to allocate the first page
- |throw new OutOfMemoryError("No enough memory for aggregation");
--- End diff --

Opened a JIRA for banning this via a new lint rule: https://issues.apache.org/jira/browse/SPARK-25986


---




[GitHub] spark pull request #21602: [SPARK-24613][SQL] Cache with UDF could not be ma...

2018-11-08 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21602#discussion_r232107563
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala ---
@@ -142,7 +142,7 @@ class CacheManager extends Logging {
 // Remove the cache entry before we create a new one, so that we can have a different
 // physical plan.
 it.remove()
-val plan = spark.sessionState.executePlan(cd.plan).executedPlan
+val plan = spark.sessionState.executePlan(AnalysisBarrier(cd.plan)).executedPlan
--- End diff --

We need a test case to cover this change. 


---




[GitHub] spark issue #22949: [minor] update known_translations

2018-11-06 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22949
  
Thanks! Merged to master.


---




[GitHub] spark pull request #22947: [SPARK-24913][SQL] Make AssertNotNull and AssertT...

2018-11-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22947#discussion_r231230211
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala ---
@@ -66,6 +66,8 @@ case class AssertTrue(child: Expression) extends UnaryExpression with ImplicitCa
 
   override def nullable: Boolean = true
 
+  override lazy val deterministic: Boolean = false
--- End diff --


Hive also marks it as non-deterministic: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFAssertTrue.java#L45

cc @maryannxue  too
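
For intuition, a rough sketch of why this matters (the table `t` and column `v` are hypothetical, and the optimizer reasoning is our gloss on the change, not something stated in the diff):

```scala
// If assert_true were deterministic, the optimizer would be free to reorder,
// deduplicate, or fold it away with surrounding expressions, silently dropping
// the side effect; marking it non-deterministic keeps the check in place.
spark.sql("SELECT assert_true(v >= 0), v FROM t").collect()
```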


---




[GitHub] spark issue #21596: [SPARK-24601] Update Jackson to 2.9.6

2018-11-05 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21596
  
It sounds like Jackson has an important behavior change that might impact us.
```Scala
import com.fasterxml.jackson.annotation.JsonInclude.Include
import com.fasterxml.jackson.databind.{DeserializationFeature, ObjectMapper}
import com.fasterxml.jackson.module.scala.DefaultScalaModule
import com.fasterxml.jackson.module.scala.experimental.ScalaObjectMapper

val mapper = new ObjectMapper with ScalaObjectMapper
mapper.setSerializationInclusion(Include.NON_NULL)
mapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
mapper.registerModule(DefaultScalaModule)

println(mapper.writeValueAsString(Map("abc" -> null)))
```

Previously, it output `{"abc":null}`, but now it is `{}` unless we explicitly set the inclusion to `Include.ALWAYS`. The upgrade looks risky to me.
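
Continuing the snippet above, a sketch of how the old output could be restored explicitly:

```scala
// Force serialization of null values, matching the pre-upgrade default.
mapper.setSerializationInclusion(Include.ALWAYS)
println(mapper.writeValueAsString(Map("abc" -> null)))  // prints {"abc":null}
```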




---




[GitHub] spark pull request #22949: [minor] update known_translations

2018-11-05 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22949#discussion_r230858019
  
--- Diff: dev/create-release/known_translations ---
@@ -203,3 +203,61 @@ shenh062326 - Shen Hong
 aokolnychyi - Anton Okolnychyi
 linbojin - Linbo Jin
 lw-lin - Liwei Lin
+10110346 - Xian Liu
+Achuth17 - Achuth Narayan Rajagopal
+Adamyuanyuan - Adam Wang
+DylanGuedes - Dylan Guedes
+JiahuiJiang - Jiahui Jiang
+KevinZwx - Kevin Zhang
+LantaoJin - Lantao Jin
+Lemonjing - Rann Tao
+LucaCanali - Luca Canali
+XD-DENG - Xiaodong Deng
+aai95 - Aleksei Izmalkin
+akonopko - Alexander Konopko
+ankuriitg - Ankur Gupta
+arucard21 - Riaas Mokiem
+attilapiros - Attila Zsolt Piros
+bravo-zhang - Bravo Zhang
+caneGuy - Kang Zhou
+chaoslawful - Xiaozhe Wang
+cluo512 - Chuan Luo
+codeatri - Neha Patil
+crafty-coder - Carlos Pena
+debugger87 - Chaozhong Yang
+e-dorigatti - Emilio Dorigatti
+eric-maynard - Eric Maynard
+felixalbani - Felix Albani
+fjh100456 - fjh100456
+guoxiaolongzte - Xiaolong Guo
+heary-cao - Xuewen Cao
+huangweizhe123 - Weizhe Huang
+ivoson - Tengfei Huang
+jinxing64 - Jin Xing
+liu-zhaokun - Zhaokun Liu
--- End diff --

Also add
> KaiXinXiaoLei - LeiLei Hu


---




[GitHub] spark pull request #22949: [minor] update known_translations

2018-11-05 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22949#discussion_r230856955
  
--- Diff: dev/create-release/known_translations ---
@@ -203,3 +203,61 @@ shenh062326 - Shen Hong
 aokolnychyi - Anton Okolnychyi
 linbojin - Linbo Jin
 lw-lin - Liwei Lin
+10110346 - Xian Liu
+Achuth17 - Achuth Narayan Rajagopal
+Adamyuanyuan - Adam Wang
+DylanGuedes - Dylan Guedes
+JiahuiJiang - Jiahui Jiang
+KevinZwx - Kevin Zhang
+LantaoJin - Lantao Jin
+Lemonjing - Rann Tao
+LucaCanali - Luca Canali
+XD-DENG - Xiaodong Deng
+aai95 - Aleksei Izmalkin
+akonopko - Alexander Konopko
+ankuriitg - Ankur Gupta
+arucard21 - Riaas Mokiem
+attilapiros - Attila Zsolt Piros
+bravo-zhang - Bravo Zhang
+caneGuy - Kang Zhou
+chaoslawful - Xiaozhe Wang
+cluo512 - Chuan Luo
+codeatri - Neha Patil
+crafty-coder - Carlos Pena
+debugger87 - Chaozhong Yang
+e-dorigatti - Emilio Dorigatti
+eric-maynard - Eric Maynard
+felixalbani - Felix Albani
+fjh100456 - fjh100456
--- End diff --

Jinhua Fu


---




[GitHub] spark issue #22928: [SPARK-25926][CORE] Move config entries in core module t...

2018-11-04 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22928
  
Keeping them in separate source files is also fine with me. I think we should put them in the same package.


---




[GitHub] spark pull request #22889: [SPARK-25882][SQL] Added a function to join two d...

2018-11-04 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22889#discussion_r230614726
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -883,6 +883,31 @@ class Dataset[T] private[sql](
 join(right, Seq(usingColumn))
   }
 
+  /**
+* Equi-join with another `DataFrame` using the given column.
+*
+* Different from other join functions, the join column will only appear once in the output,
+* i.e. similar to SQL's `JOIN USING` syntax.
+*
+* {{{
+*   // Left join of df1 and df2 using the column "user_id"
+*   df1.join(df2, "user_id", "left")
+* }}}
+*
+* @param right Right side of the join operation.
+* @param usingColumn Name of the column to join on. This column must exist on both sides.
+* @param joinType Type of join to perform. Default `inner`. Must be one of:
+* `inner`, `cross`, `outer`, `full`, `full_outer`, `left`, `left_outer`,
+* `right`, `right_outer`, `left_semi`, `left_anti`.
+* @note If you perform a self-join using this function without aliasing the input
+* `DataFrame`s, you will NOT be able to reference any columns after the join, since
+* there is no way to disambiguate which side of the join you would like to reference.
+* @group untypedrel
+*/
+  def join(right: Dataset[_], usingColumn: String, joinType: String): DataFrame = {
--- End diff --

Normally, we do not add such an API. 
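
For comparison, a sketch of how this is already expressible with the existing `Seq`-based overload that takes a join type:

```scala
// Single-column equi-join with an explicit join type, using the existing API.
df1.join(df2, Seq("user_id"), "left")
```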


---




[GitHub] spark issue #22913: [SPARK-25902][SQL] Add support for dates with millisecon...

2018-11-04 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22913
  
Also cc @ueshin 


---




[GitHub] spark pull request #22693: [SPARK-25701][SQL] Supports calculation of table ...

2018-11-04 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22693#discussion_r230609824
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala ---
@@ -115,26 +116,45 @@ class ResolveHiveSerdeTable(session: SparkSession) extends Rule[LogicalPlan] {
 
 class DetermineTableStats(session: SparkSession) extends Rule[LogicalPlan] {
   override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
+case filterPlan @ Filter(_, SubqueryAlias(_, relation: HiveTableRelation)) =>
+  val predicates = PhysicalOperation.unapply(filterPlan).map(_._2).getOrElse(Nil)
+  computeTableStats(relation, predicates)
 case relation: HiveTableRelation
 if DDLUtils.isHiveTable(relation.tableMeta) && relation.tableMeta.stats.isEmpty =>
-  val table = relation.tableMeta
-  val sizeInBytes = if (session.sessionState.conf.fallBackToHdfsForStatsEnabled) {
-try {
-  val hadoopConf = session.sessionState.newHadoopConf()
-  val tablePath = new Path(table.location)
-  val fs: FileSystem = tablePath.getFileSystem(hadoopConf)
-  fs.getContentSummary(tablePath).getLength
-} catch {
-  case e: IOException =>
-logWarning("Failed to get table size from hdfs.", e)
-session.sessionState.conf.defaultSizeInBytes
-}
-  } else {
-session.sessionState.conf.defaultSizeInBytes
+  computeTableStats(relation)
+  }
+
+  private def computeTableStats(
+  relation: HiveTableRelation,
+  predicates: Seq[Expression] = Nil): LogicalPlan = {
+val table = relation.tableMeta
+val sizeInBytes = if (session.sessionState.conf.fallBackToHdfsForStatsEnabled) {
+  try {
+val hadoopConf = session.sessionState.newHadoopConf()
+val tablePath = new Path(table.location)
+val fs: FileSystem = tablePath.getFileSystem(hadoopConf)
+BigInt(fs.getContentSummary(tablePath).getLength)
+  } catch {
+case e: IOException =>
+  logWarning("Failed to get table size from hdfs.", e)
+  getSizeInBytesFromTablePartitions(table.identifier, predicates)
   }
+} else {
+  getSizeInBytesFromTablePartitions(table.identifier, predicates)
+}
+val withStats = table.copy(stats = Some(CatalogStatistics(sizeInBytes = sizeInBytes)))
+relation.copy(tableMeta = withStats)
+  }
 
-  val withStats = table.copy(stats = Some(CatalogStatistics(sizeInBytes = BigInt(sizeInBytes
-  relation.copy(tableMeta = withStats)
+  private def getSizeInBytesFromTablePartitions(
+  tableIdentifier: TableIdentifier,
+  predicates: Seq[Expression] = Nil): BigInt = {
+session.sessionState.catalog.listPartitionsByFilter(tableIdentifier, predicates) match {
--- End diff --

The performance will be pretty bad when the number of partitions is huge. Thus, I think we can close this PR.


---




[GitHub] spark pull request #19796: [SPARK-22581][SQL] Catalog api does not allow to ...

2018-11-04 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19796#discussion_r230609716
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala ---
@@ -411,7 +410,29 @@ abstract class Catalog {
   tableName: String,
   source: String,
   schema: StructType,
-  options: Map[String, String]): DataFrame
+  options: Map[String, String]): DataFrame = {
+createTable(tableName, source, schema, options, Nil)
+  }
+
+  /**
+* :: Experimental ::
+* (Scala-specific)
+* Create a table based on the dataset in a data source, a schema, a set of options and a set of partition columns.
+* Then, returns the corresponding DataFrame.
+*
+* @param tableName is either a qualified or unqualified name that designates a table.
+*  If no database identifier is provided, it refers to a table in
+*  the current database.
+* @since ???
+*/
+  @Experimental
+  @InterfaceStability.Evolving
+  def createTable(
+tableName: String,
+source: String,
+schema: StructType,
+options: Map[String, String],
+partitionColumnNames : Seq[String]): DataFrame
--- End diff --

I think we will not introduce a new API for partitioning columns in the 
current stage. Let us use SQL DDL for creating the table.
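
A hedged sketch of the SQL DDL route (table and column names are hypothetical):

```scala
spark.sql("""
  CREATE TABLE events (id BIGINT, payload STRING, dt STRING)
  USING parquet
  PARTITIONED BY (dt)
""")
```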


---




[GitHub] spark issue #22089: [SPARK-25098][SQL]‘Cast’ will return NULL when input...

2018-11-04 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22089
  
@wangyum Could you please take it over?


---




[GitHub] spark pull request #22089: [SPARK-25098][SQL]‘Cast’ will return NULL whe...

2018-11-04 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22089#discussion_r230609486
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala ---
@@ -98,6 +98,7 @@ class CastSuite extends SparkFunSuite with ExpressionEvalHelper {
 c.set(Calendar.MILLISECOND, 0)
 checkEvaluation(Cast(Literal("2015-03-18"), DateType), new Date(c.getTimeInMillis))
 checkEvaluation(Cast(Literal("2015-03-18 "), DateType), new Date(c.getTimeInMillis))
+checkEvaluation(Cast(Literal(" 2015-03-18"), DateType), new Date(c.getTimeInMillis))
--- End diff --

> SELECT CAST(' 22-OCT-1997' AS TIMESTAMP) FROM dual;

Oracle also trims the leading space. 
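
A minimal sketch of the behavior under discussion, assuming the fix is in place (the leading space is trimmed instead of the cast returning NULL):

```scala
spark.sql("SELECT CAST(' 2015-03-18' AS DATE)").show()  // 2015-03-18, not NULL
```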


---




[GitHub] spark issue #22941: [SPARK-25936][SQL] Fix InsertIntoDataSourceCommand does ...

2018-11-04 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22941
  
I think this is not a bug, although the plan is confusing.


---




[GitHub] spark pull request #22930: [SPARK-24869][SQL] Fix SaveIntoDataSourceCommand'...

2018-11-04 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22930#discussion_r230609078
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/SaveIntoDataSourceCommand.scala ---
@@ -37,13 +37,12 @@ case class SaveIntoDataSourceCommand(
 query: LogicalPlan,
 dataSource: CreatableRelationProvider,
 options: Map[String, String],
-mode: SaveMode) extends RunnableCommand {
+mode: SaveMode,
+outputColumnNames: Seq[String]) extends DataWritingCommand {
 
-  override protected def innerChildren: Seq[QueryPlan[_]] = Seq(query)
-
-  override def run(sparkSession: SparkSession): Seq[Row] = {
-dataSource.createRelation(
-  sparkSession.sqlContext, mode, options, Dataset.ofRows(sparkSession, query))
--- End diff --

See what I commented in https://github.com/apache/spark/pull/22941


---




[GitHub] spark pull request #22941: [SPARK-25936][SQL] Fix InsertIntoDataSourceComman...

2018-11-04 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22941#discussion_r230609046
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala ---
@@ -589,4 +590,33 @@ class InsertSuite extends DataSourceTest with SharedSQLContext {
   sql("INSERT INTO TABLE test_table SELECT 2, null")
 }
   }
+
+  test("SPARK-25936 InsertIntoDataSourceCommand does not use Cached Data") {
--- End diff --

You can move this test to CachedTableSuite.scala and use the helper functions to verify whether the cache is used.

See the example. 
```
spark.range(2).createTempView("test_view")
spark.catalog.cacheTable("test_view")
val rddId = rddIdOf("test_view")
assert(!isMaterialized(rddId))
sql("INSERT INTO TABLE test_table SELECT * FROM test_view")
assert(isMaterialized(rddId))
```




---




[GitHub] spark pull request #22941: [SPARK-25936][SQL] Fix InsertIntoDataSourceComman...

2018-11-04 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22941#discussion_r230608937
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoDataSourceCommand.scala ---
@@ -30,14 +30,13 @@ import org.apache.spark.sql.sources.InsertableRelation
 case class InsertIntoDataSourceCommand(
 logicalRelation: LogicalRelation,
 query: LogicalPlan,
-overwrite: Boolean)
-  extends RunnableCommand {
+overwrite: Boolean,
+outputColumnNames: Seq[String])
+  extends DataWritingCommand {
 
-  override protected def innerChildren: Seq[QueryPlan[_]] = Seq(query)
-
-  override def run(sparkSession: SparkSession): Seq[Row] = {
+  override def run(sparkSession: SparkSession, child: SparkPlan): Seq[Row] = {
 val relation = logicalRelation.relation.asInstanceOf[InsertableRelation]
-val data = Dataset.ofRows(sparkSession, query)
--- End diff --

This will use the cached data, although the plan does not show that the cached data is used.


---




[GitHub] spark issue #22936: [SPARK-19799] Support WITH clause (CTE) in subqueries

2018-11-04 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22936
  
ok to test


---




[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...

2018-11-04 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22932
  
Will take a look this week. Thanks for your work!


---




[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...

2018-11-03 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22932#discussion_r230563752
  
--- Diff: sql/core/src/test/resources/sql-tests/results/describe-part-after-analyze.sql.out ---
@@ -93,7 +93,7 @@ Partition Values  [ds=2017-08-01, hr=10]
 Location [not included in comparison]sql/core/spark-warehouse/t/ds=2017-08-01/hr=10
 
 Created Time [not included in comparison]
 Last Access [not included in comparison]
-Partition Statistics   1121 bytes, 3 rows
+Partition Statistics   1229 bytes, 3 rows
--- End diff --

Is this caused by adding `org.apache.spark.sql.create.version = 3.0.0-SNAPSHOT`?


---




[GitHub] spark issue #22255: [SPARK-25102][Spark Core] Write Spark version informatio...

2018-11-03 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22255
  
Just to confirm: `created_by` is set to `parquet-mr version 1.10.0 (build 031a6654009e3b82020012a18434c582bd74c73a)`?


---




[GitHub] spark issue #22921: [SPARK-25908][CORE][SQL] Remove old deprecated items in ...

2018-11-02 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22921
  
LGTM except a minor comment about the test case. Also, we need to fix the PySpark test failure.


---




[GitHub] spark pull request #22921: [SPARK-25908][CORE][SQL] Remove old deprecated it...

2018-11-02 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22921#discussion_r230283948
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
@@ -279,27 +254,6 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
   structDf.select(hash($"a", $"record.*")))
   }
 
-  test("Star Expansion - explode should fail with a meaningful message if it takes a star") {
--- End diff --

Can we rewrite this test case using `select(explode())`, like what we did in the following test cases? (See the sketch below.)
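
A hedged sketch of the suggested rewrite, within the same test harness (`df` stands for the removed test's input DataFrame; the asserted message text is an assumption):

```scala
val e = intercept[AnalysisException] {
  df.select(explode($"*"))
}
// The exact wording checked here is an assumption.
assert(e.getMessage.contains("Invalid usage of '*'"))
```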


---




[GitHub] spark issue #22255: [SPARK-25102][Spark Core] Write Spark version informatio...

2018-11-01 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22255
  
Also cc @hvanhovell 


---




[GitHub] spark issue #22255: [SPARK-25102][Spark Core] Write Spark version informatio...

2018-11-01 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22255
  
@dongjoon-hyun Do you want to take this over?


---




[GitHub] spark issue #22255: [SPARK-25102][Spark Core] Write Spark version informatio...

2018-11-01 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22255
  

https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L902

@rdblue  Can we use created_by?

```
  /** String for application that wrote this file.  This should be in the format
   *  <Application> version <App Version> (build <App Build Hash>).
   * e.g. impala version 1.0 (build 6cf94d29b2b7115df4de2c06e2ab4326d721eb55)
   **/
  6: optional string created_by
```


---




[GitHub] spark issue #15599: [SPARK-18022][SQL] java.lang.NullPointerException instea...

2018-11-01 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15599
  
@shatestest Your problem is different from the issue this PR tries to resolve. If you can provide a test case to reproduce it, feel free to open a JIRA.


---




[GitHub] spark pull request #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should ...

2018-11-01 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22927#discussion_r230217147
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -393,7 +393,7 @@ object LoadDataCommand {
   throw new IllegalArgumentException(e)
   }
 } else {
-  path
+  new Path(pathUri)
--- End diff --

Here, the path is converted to a URI and then converted back to a Path. What is your goal, rather than directly building the path?
```
if (path.isAbsolute()) path else new Path(workingDir, path)
```


---




[GitHub] spark pull request #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should ...

2018-11-01 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22927#discussion_r230216125
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
@@ -1987,6 +1987,13 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
 }
   }
 
+  test("SPARK-25918: LOAD DATA LOCAL INPATH should handle a relative path") {
--- End diff --

This does not belong in this test suite. `HiveCommandSuite.scala` is the best place, although this is not for the hive module.


---




[GitHub] spark issue #22924: [SPARK-25891][PYTHON] Upgrade to Py4J 0.10.8.1

2018-11-01 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22924
  
I would be more conservative in these cases. The new release of Py4J might introduce new regressions, and not all the changes are clearly documented in their release notes. It is kind of aggressive to change the dependency in our maintenance releases.


---




[GitHub] spark issue #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type which r...

2018-11-01 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22905
  
@mallman Could you run EXPLAIN with these new changes and post the output in the PR description?


---




[GitHub] spark issue #21688: [SPARK-21809] : Change Stage Page to use datatables to s...

2018-11-01 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21688
  
cc @gengliangwang 


---




[GitHub] spark issue #22924: [SPARK-25891][PYTHON] Upgrade to Py4J 0.10.8.1

2018-11-01 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22924
  
I am against this change in the 2.4 branch unless the bugs are critical for our end users.


---




[GitHub] spark issue #22788: [SPARK-25769][SQL]make UnresolvedAttribute.sql escape ne...

2018-11-01 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22788
  
This sounds like a safe change. cc @liancheng


---




[GitHub] spark issue #22910: [SPARK-25899][TESTS]Fix flaky CoarseGrainedSchedulerBack...

2018-10-31 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22910
  
Thanks! Merged to master/2.4


---




[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

2018-10-31 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22847#discussion_r229879855
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -812,6 +812,17 @@ object SQLConf {
 .intConf
 .createWithDefault(65535)
 
+  val CODEGEN_METHOD_SPLIT_THRESHOLD = buildConf("spark.sql.codegen.methodSplitThreshold")
+.internal()
+.doc("The threshold of source code length without comment of a single Java function by " +
+  "codegen to be split. When the generated Java function source code exceeds this threshold" +
+  ", it will be split into multiple small functions. We can't know how many bytecode will " +
+  "be generated, so use the code length as metric. A function's bytecode should not go " +
+  "beyond 8KB, otherwise it will not be JITted; it also should not be too small, otherwise " +
+  "there will be many function calls.")
+.intConf
--- End diff --

Could you try some very long alias names or complex expressions? You will get a different number, right?


---




[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...

2018-10-31 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22778
  
Let us hold this PR and try to fix 
https://github.com/apache/spark/pull/17520 instead.


---




[GitHub] spark issue #22890: [SPARK-25883][SQL][Minor] Override method `prettyName` i...

2018-10-31 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22890
  
Thanks! Merged to master.


---




[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

2018-10-30 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22847#discussion_r229572426
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -812,6 +812,18 @@ object SQLConf {
 .intConf
 .createWithDefault(65535)
 
+  val CODEGEN_METHOD_SPLIT_THRESHOLD = buildConf("spark.sql.codegen.methodSplitThreshold")
+.internal()
+.doc("The threshold of source code length without comment of a single Java function by " +
+  "codegen to be split. When the generated Java function source code exceeds this threshold" +
+  ", it will be split into multiple small functions. We cannot know how many bytecode will " +
--- End diff --

`The threshold of source-code splitting in the codegen. When the number of 
characters in a single JAVA function (without comment) exceeds the threshold, 
the function will be automatically split to multiple smaller ones.`
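
A hedged usage sketch: since the entry is marked `.internal()`, it would be set via its string key (the key comes from the diff above; the value is illustrative):

```scala
// Split generated Java functions once their source exceeds ~1024 characters.
spark.conf.set("spark.sql.codegen.methodSplitThreshold", "1024")
```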


---




[GitHub] spark issue #22870: [SPARK-25862][SQL] Remove rangeBetween APIs introduced i...

2018-10-30 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22870
  
LGTM. Thanks! Merged to master.


---




[GitHub] spark issue #17520: [WIP][SPARK-19712][SQL] Move PullupCorrelatedPredicates ...

2018-10-30 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17520
  
@dilipbiswal Could you take this over?


---




[GitHub] spark pull request #22778: [SPARK-25784][SQL] Infer filters from constraints...

2018-10-30 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22778#discussion_r229361802
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -171,10 +171,13 @@ abstract class Optimizer(sessionCatalog: SessionCatalog)
 // "Extract PythonUDF From JoinCondition".
 Batch("Check Cartesian Products", Once,
   CheckCartesianProducts) :+
-Batch("RewriteSubquery", Once,
+Batch("Rewrite Subquery", Once,
--- End diff --

Sure. That also sounds good to me. @dilipbiswal Could you take over the PR https://github.com/apache/spark/pull/17520?


---




[GitHub] spark pull request #22778: [SPARK-25784][SQL] Infer filters from constraints...

2018-10-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22778#discussion_r229178084
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -171,10 +171,13 @@ abstract class Optimizer(sessionCatalog: SessionCatalog)
 // "Extract PythonUDF From JoinCondition".
 Batch("Check Cartesian Products", Once,
   CheckCartesianProducts) :+
-Batch("RewriteSubquery", Once,
+Batch("Rewrite Subquery", Once,
--- End diff --

I do not have a good answer for this PR. Ideally, we should run the whole `operatorOptimizationBatch` batch. However, running the rules could be very time-consuming. I would suggest adding a new parameter that introduces a time-bound limit for each batch.

cc @maryannxue  WDYT?


---




[GitHub] spark issue #22879: [SPARK-25872][SQL][TEST] Add an optimizer tracker for TP...

2018-10-29 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22879
  
When generating the golden files, I would not record the whole plan. Instead, we can keep just the node names. That should be good enough.
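
A hedged sketch of what keeping only the node names could look like when producing the golden files (`df` is hypothetical):

```scala
// Record only the operator names of the optimized plan, not the full plan text.
val nodeNames: Seq[String] =
  df.queryExecution.optimizedPlan.collect { case p => p.nodeName }
```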


---




[GitHub] spark issue #22857: [SPARK-25860][SQL] Replace Literal(null, _) with FalseLi...

2018-10-29 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22857
  
Please be really careful in null handling. It could easily introduce correctness bugs like the ones we recently fixed.


---




[GitHub] spark issue #22878: [SPARK-25789][SQL] Support for Dataset of Avro

2018-10-29 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22878
  
cc @gengliangwang 


---




[GitHub] spark pull request #22870: [SPARK-25862][SQL] Remove rangeBetween APIs intro...

2018-10-28 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22870#discussion_r228776577
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ---
@@ -206,7 +206,7 @@ case class SpecifiedWindowFrame(
 // Check combination (of expressions).
 (lower, upper) match {
   case (l: Expression, u: Expression) if !isValidFrameBoundary(l, u) =>
-TypeCheckFailure(s"Window frame upper bound '$upper' does not followes the lower bound " +
+TypeCheckFailure(s"Window frame upper bound '$upper' does not follow the lower bound " +
--- End diff --

We need to rerun `org.apache.spark.sql.SQLQueryTestSuite`.


---




[GitHub] spark issue #22514: [SPARK-25271][SQL] Hive ctas commands should use data so...

2018-10-28 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22514
  
This requires more careful review. I do not think we can make it in RC5.


---




[GitHub] spark issue #22817: [SPARK-25816][SQL] Fix attribute resolution in nested ex...

2018-10-28 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22817
  
LGTM

Thanks! Merged to master/2.4/2.3


---




[GitHub] spark issue #22817: [SPARK-25816][SQL] Fix attribute resolution in nested ex...

2018-10-27 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22817
  
The fix looks fine to me. cc @cloud-fan @hvanhovell 


---




[GitHub] spark pull request #22817: [SPARK-25816][SQL] Fix attribute resolution in ne...

2018-10-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22817#discussion_r228729920
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
@@ -2578,4 +2578,12 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
 Row ("abc", 1))
 }
   }
+
+  test("SPARK-25816 ResolveReferences works with nested extractors") {
+val df0 = Seq((1, Map(1 -> "a")), (2, Map(2 -> "b"))).toDF("1", "2")
+val df1 = df0.select($"1".as("2"), $"2".as("1"))
+val df2 = df1.filter($"1"(map_keys($"1")(0)) > "a")
--- End diff --

We are unable to resolve the expressions in `extraction` of 
`UnresolvedExtractValue`. We can simplify the expression in the `extraction`. 
For example, `df1.filter($"1"($"2") > "a")`.


---




[GitHub] spark issue #22817: [SPARK-25816][SQL] Fix attribute resolution in nested ex...

2018-10-26 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22817
  
Will take a look at this tonight. Thanks for reporting this!


---




[GitHub] spark pull request #22827: [SPARK-25832][SQL][BRANCH-2.4] Revert newly added...

2018-10-26 Thread gatorsmile
Github user gatorsmile closed the pull request at:

https://github.com/apache/spark/pull/22827


---




[GitHub] spark pull request #22827: [SPARK-25832][SQL] Revert newly added map related...

2018-10-25 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22827#discussion_r228267525
  
--- Diff: sql/core/src/test/resources/sql-tests/inputs/higher-order-functions.sql ---
@@ -60,8 +60,3 @@ select zip_with(array('a', 'b', 'c'), array('d', 'e', 'f'), (x, y) -> concat(x,
 
 -- Zip with array coalesce
 select zip_with(array('a'), array('d', null, 'f'), (x, y) -> coalesce(x, y)) as v;
-
-create or replace temporary view nested as values
-  (1, map(1, 1, 2, 2, 3, 3)),
-  (2, map(4, 4, 5, 5, 6, 6))
-  as t(x, ys);
--- End diff --

That has been removed already.


---




[GitHub] spark issue #22827: [SPARK-25832][SQL] Revert newly added map related functi...

2018-10-25 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22827
  
@dongjoon-hyun I think that can be kept. 


---




[GitHub] spark issue #22827: [SPARK-25832][SQL] Revert newly added map related functi...

2018-10-25 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22827
  
@felixcheung @mengxr The last commit https://github.com/apache/spark/pull/22827/commits/f1f3d0b09527d28fd8a1f1a2c8e2358c10da175b is for the R-related revert. Please take a look.


---




[GitHub] spark issue #22827: [SPARK-25832][SQL] Revert newly added map related functi...

2018-10-25 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22827
  
@cloud-fan @dongjoon-hyun @rxin @felixcheung 


---




[GitHub] spark pull request #22827: [SPARK-25832][SQL] Revert newly added map related...

2018-10-25 Thread gatorsmile
GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/22827

[SPARK-25832][SQL] Revert newly added map related functions

## What changes were proposed in this pull request?

- Revert [SPARK-23935][SQL] Adding map_entries function: https://github.com/apache/spark/pull/21236
- Revert [SPARK-23937][SQL] Add map_filter SQL function: https://github.com/apache/spark/pull/21986
- Revert [SPARK-23940][SQL] Add transform_values SQL function: https://github.com/apache/spark/pull/22045
- Revert [SPARK-23939][SQL] Add transform_keys function: https://github.com/apache/spark/pull/22013
- Revert [SPARK-23932][SQL] Higher order function zip_with: https://github.com/apache/spark/pull/22031
- Revert [SPARK-23938][SQL] Add map_zip_with function: https://github.com/apache/spark/pull/22017
- Revert the changes of map_entries in [SPARK-24331][SPARKR][SQL] Adding arrays_overlap, array_repeat, map_entries to SparkR: https://github.com/apache/spark/pull/21434/

## How was this patch tested?
The existing tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark revertMap2.4

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22827.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22827


commit b8cab59abc45d9178785829f6a455e8a7a6a61fa
Author: gatorsmile 
Date:   2018-10-25T16:05:20Z

Revert [SPARK-23935][SQL] Adding map_entries function

commit 25498322863ccad2a1481c939dcd0392862baf55
Author: gatorsmile 
Date:   2018-10-25T16:18:40Z

Revert [SPARK-23937][SQL] Add map_filter SQL function

commit 8623f864366ba2e7a38150ee4aa119e90b050155
Author: gatorsmile 
Date:   2018-10-25T16:23:17Z

Revert [SPARK-23940][SQL] Add transform_values SQL function

commit 666d6ab25f6d12f27510740badbf7501fd08168a
Author: gatorsmile 
Date:   2018-10-25T16:27:03Z

Revert [SPARK-23939][SQL] Add transform_keys function

commit 59393872427866d032eefe2e04e7ac6d94e52219
Author: gatorsmile 
Date:   2018-10-25T16:34:19Z

Revert [SPARK-23938][SQL] Add map_zip_with function

commit f1f3d0b09527d28fd8a1f1a2c8e2358c10da175b
Author: gatorsmile 
Date:   2018-10-25T16:40:55Z

Revert the changes of map_entries in [SPARK-24331][SPARKR][SQL] Adding 
arrays_overlap, array_repeat, map_entries to SparkR




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22821: [SPARK-25832][SQL] remove newly added map related functi...

2018-10-25 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22821
  
I will submit a PR to revert all these changes. Thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22773: [SPARK-25785][SQL] Add prettyNames for from_json, to_jso...

2018-10-24 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22773
  
Yes. The goal of the migration guide is to help end users upgrade their 
Spark.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22773: [SPARK-25785][SQL] Add prettyNames for from_json, to_jso...

2018-10-24 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22773
  
Anything that could break existing user applications should be 
documented in the migration guide. This will simplify the system upgrade for 
our end users. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22773: [SPARK-25785][SQL] Add prettyNames for from_json, to_jso...

2018-10-24 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22773
  
This is an externally visible change, so it needs a migration guide update. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22817: [SPARK-25816][SQL] ResolveReferences should work bottom-...

2018-10-24 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22817
  
This would not work, since we need both bottom-up and top-down resolution. 
You would hit test failures.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22794: [SPARK-24570][SQL] Implement Spark own GetTablesO...

2018-10-24 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22794#discussion_r227872504
  
--- Diff: 
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetTablesOperation.scala
 ---
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.thriftserver
+
+import java.util.{List => JList, Map => JMap, UUID}
+
+import scala.collection.JavaConverters.seqAsJavaListConverter
+
+import 
org.apache.hadoop.hive.ql.security.authorization.plugin.HiveOperationType
+import 
org.apache.hadoop.hive.ql.security.authorization.plugin.HivePrivilegeObjectUtils
+import org.apache.hive.service.cli._
+import org.apache.hive.service.cli.operation.GetTablesOperation
+import org.apache.hive.service.cli.session.HiveSession
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.SQLContext
+import org.apache.spark.sql.catalyst.catalog.CatalogTableType._
+import org.apache.spark.sql.catalyst.catalog.SessionCatalog
+
+private[hive] class SparkGetTablesOperation(
+parentSession: HiveSession,
+catalogName: String,
+schemaName: String,
+tableName: String,
+tableTypes: JList[String])
+(sqlContext: SQLContext, sessionToActivePool: JMap[SessionHandle, 
String])
--- End diff --

Just add `sqlContext` to the class parameters.
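
A minimal sketch of the suggested change (hypothetical signature, imports as 
in the diff above): fold `sqlContext` into the primary parameter list instead 
of a second parameter list.

```scala
// Sketch: sqlContext becomes a regular class parameter, so callers pass
// everything in one parameter list.
private[hive] class SparkGetTablesOperation(
    sqlContext: SQLContext,
    parentSession: HiveSession,
    catalogName: String,
    schemaName: String,
    tableName: String,
    tableTypes: JList[String])
  extends GetTablesOperation(parentSession, catalogName, schemaName, tableName, tableTypes)
  with Logging
```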


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22794: [SPARK-24570][SQL] Implement Spark own GetTablesO...

2018-10-24 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22794#discussion_r227872346
  
--- Diff: 
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetTablesOperation.scala
 ---
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.thriftserver
+
+import java.util.{List => JList, Map => JMap, UUID}
+
+import scala.collection.JavaConverters.seqAsJavaListConverter
+
+import 
org.apache.hadoop.hive.ql.security.authorization.plugin.HiveOperationType
+import 
org.apache.hadoop.hive.ql.security.authorization.plugin.HivePrivilegeObjectUtils
+import org.apache.hive.service.cli._
+import org.apache.hive.service.cli.operation.GetTablesOperation
+import org.apache.hive.service.cli.session.HiveSession
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.SQLContext
+import org.apache.spark.sql.catalyst.catalog.CatalogTableType._
+import org.apache.spark.sql.catalyst.catalog.SessionCatalog
+
+private[hive] class SparkGetTablesOperation(
+parentSession: HiveSession,
+catalogName: String,
+schemaName: String,
+tableName: String,
+tableTypes: JList[String])
+(sqlContext: SQLContext, sessionToActivePool: JMap[SessionHandle, 
String])
--- End diff --

Just wondering why we need `sessionToActivePool`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22794: [SPARK-24570][SQL] Implement Spark own GetTablesO...

2018-10-24 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22794#discussion_r227869218
  
--- Diff: 
sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala
 ---
@@ -644,6 +644,47 @@ class HiveThriftBinaryServerSuite extends 
HiveThriftJdbcTest {
   assert(resultSet.getString(1) === "4.56")
 }
   }
+
+  test("SPARK-24196: SQL client(DBeaver) can't show tables") {
--- End diff --

Create a `SparkMetadataOperationSuite`?
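
A minimal sketch of the suggestion (hypothetical placement; it reuses the 
`HiveThriftJdbcTest` base class from this file):

```scala
// Group the JDBC metadata-operation tests (getTables, getSchemas, ...)
// in a dedicated suite instead of growing HiveThriftBinaryServerSuite.
class SparkMetadataOperationSuite extends HiveThriftJdbcTest {
  override def mode: ServerMode.Value = ServerMode.binary

  test("SPARK-24196: SQL client(DBeaver) can't show tables") {
    // test body moved unchanged from HiveThriftBinaryServerSuite
  }
}
```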


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22794: [SPARK-24570][SQL] Implement Spark own GetTablesOperatio...

2018-10-24 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22794
  
cc @srinathshankar 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-10-23 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22144
  
I think this is different from the blocker tickets we opened before. We 
should try our best to avoid accidentally dropping existing support. Please 
encourage more people in the community to try our RCs and find all the new 
regressions and bugs. 

During the RC stage, for new regressions, we can either fix them, if the 
fix is very safe/tiny, or revert the PRs that introduced them. This is how we 
handle regressions during the RC stage. 

For this specific case, I do not think the root cause has been found. If we 
revert the previous PRs (https://issues.apache.org/jira/browse/SPARK-18186) that 
were merged in the 2.2 release, we could easily introduce new regressions. Thus, 
we normally do not revert PRs that were merged in previous releases. 
Please add it as a known issue in the release note. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22772: [SPARK-24499][SQL][DOC][Followup] Fix some broken links

2018-10-19 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22772
  
LGTM

Let us merge this PR first. If we find more, feel free to open a new 
follow-up PR.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22750: [SPARK-25747][SQL] remove ColumnarBatchScan.needsUnsafeR...

2018-10-19 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22750
  
LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22750: [SPARK-25747][SQL] remove ColumnarBatchScan.needs...

2018-10-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22750#discussion_r226818662
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala 
---
@@ -168,10 +168,11 @@ case class FileSourceScanExec(
 
   // Note that some vals referring the file-based relation are lazy 
intentionally
   // so that this plan can be canonicalized on executor side too. See 
SPARK-23731.
-  override lazy val supportsBatch: Boolean = 
relation.fileFormat.supportBatch(
-relation.sparkSession, StructType.fromAttributes(output))
+  override lazy val supportsBatch: Boolean = {
+relation.fileFormat.supportBatch(relation.sparkSession, schema)
+  }
 
-  override lazy val needsUnsafeRowConversion: Boolean = {
+  private lazy val needsUnsafeRowConversion: Boolean = {
 if (relation.fileFormat.isInstanceOf[ParquetSource]) {
--- End diff --

How about ORC? Why do we need to check only Parquet?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22750: [SPARK-25747][SQL] remove ColumnarBatchScan.needs...

2018-10-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22750#discussion_r226806646
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ColumnarBatchScan.scala 
---
@@ -164,12 +162,11 @@ private[sql] trait ColumnarBatchScan extends 
CodegenSupport {
 val outputVars = output.zipWithIndex.map { case (a, i) =>
--- End diff --

Set `outputVars` to null?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...

2018-10-19 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22756
  
Done


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...

2018-10-19 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22756
  
Let me revert it. Thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22756: [SPARK-25758][ML] Deprecate computeCost on BisectingKMea...

2018-10-19 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22756
  
cc @mengxr WDYT? It does not sound like a blocker to me. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...

2018-10-19 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22732
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...

2018-10-19 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22732
  
LGTM pending Jenkins


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22732: [SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor sig...

2018-10-18 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22732
  
For Scala 2.11, we do not introduce any behavior change, and we keep 
binary and source compatibility:

```
scala> import org.apache.spark.sql.types.DataTypes
import org.apache.spark.sql.types.DataTypes

scala> val f2 = udf({(x: Int) => x}, DataTypes.IntegerType)
f2: org.apache.spark.sql.expressions.UserDefinedFunction = 
UserDefinedFunction(<function1>,IntegerType,None)

scala> spark.range(3).select(f2('id + null)).show()
+----------------+
|UDF((id + null))|
+----------------+
|            null|
|            null|
|            null|
+----------------+
```

For Scala 2.12, since we are unable to know the type nullability in a few 
APIs, we issue a warning message in those cases. Below is an example that 
generates a different answer:

```
scala> import org.apache.spark.sql.types.DataTypes
import org.apache.spark.sql.types.DataTypes

scala> val f2 = udf({(x: Int) => x}, DataTypes.IntegerType)
f2: org.apache.spark.sql.expressions.UserDefinedFunction = 
UserDefinedFunction($Lambda$2801/26868055@5eb35a26,IntegerType,None)

scala> spark.range(3).select(f2('id + null)).show()
18/10/18 23:07:02 WARN ScalaReflection: Scala version 2.12.7 cannot get 
type nullability correctly via reflection, thus Spark cannot add proper input 
null check for UDF. To avoid this problem, use the typed UDF interfaces instead.
+----------------+
|UDF((id + null))|
+----------------+
|               0|
|               0|
|               0|
+----------------+
```
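
For reference, a minimal sketch of the workaround the warning recommends, 
assuming a spark-shell session like the ones above: the typed `udf` overload 
lets Spark infer via reflection that `Int` is non-nullable, so the input null 
check is added even on Scala 2.12.

```scala
import org.apache.spark.sql.functions.udf

// Typed interface: the return type and input nullability are inferred from
// the function's type signature, so no explicit DataType is passed.
val f1 = udf((x: Int) => x)

spark.range(3).select(f1('id + null)).show()
// yields null rows, matching the Scala 2.11 behavior above
```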




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22769: [SPARK-24499][DOC][FOLLOW-UP] Split the page of s...

2018-10-18 Thread gatorsmile
Github user gatorsmile closed the pull request at:

https://github.com/apache/spark/pull/22769


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22769: [SPARK-24499][DOC][FOLLOW-UP] Split the page of s...

2018-10-18 Thread gatorsmile
GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/22769

[SPARK-24499][DOC][FOLLOW-UP] Split the page of sql-programming-guide.html 
to multiple separate pages 

## What changes were proposed in this pull request?
Forgot to remove the link for `Upgrading From Spark SQL 2.4 to 3.0` 
when merging to 2.4.

## How was this patch tested?
N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark test2.4

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22769.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22769


commit 4027ba1097caa053a7d30926b0e407912bfd78f6
Author: gatorsmile 
Date:   2018-10-18T20:14:17Z

fix




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22769: [SPARK-24499][DOC][FOLLOW-UP] Split the page of sql-prog...

2018-10-18 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22769
  
cc @dongjoon-hyun @xuanyuanking 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


