[GitHub] spark issue #15389: [SPARK-17817][PySpark] PySpark RDD Repartitioning Result...

2016-10-10 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15389 retest this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark issue #15389: [SPARK-17817][PySpark] PySpark RDD Repartitioning Result...

2016-10-10 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15389 Seems Jenkins is not working?

[GitHub] spark issue #15389: [SPARK-17817][PySpark] PySpark RDD Repartitioning Result...

2016-10-10 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15389 retest this please.

[GitHub] spark issue #15388: [SPARK-17821][SQL] Support And and Or in Expression Cano...

2016-10-10 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15388 ping @rxin @hvanhovell @cloud-fan @gatorsmile Is there anything else I need to address?

[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...

2016-10-10 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14452 @rxin Yeah, I tried adding an explicit cache call and it doesn't improve it, so I removed it.

[GitHub] spark pull request #15423: [SPARK-17860][SQL] SHOW COLUMN's database conflic...

2016-10-10 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15423#discussion_r82716067 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala --- @@ -207,6 +208,7 @@ class SQLQueryTestSuite extends QueryTest with

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-10 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r82716769 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala --- @@ -0,0 +1,343 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15423: [SPARK-17860][SQL] SHOW COLUMN's database conflic...

2016-10-10 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15423#discussion_r82718263 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala --- @@ -207,6 +208,7 @@ class SQLQueryTestSuite extends QueryTest with

[GitHub] spark pull request #15388: [SPARK-17821][SQL] Support And and Or in Expressi...

2016-10-10 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15388#discussion_r82718343 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSetSuite.scala --- @@ -80,6 +80,65 @@ class ExpressionSetSuite

[GitHub] spark pull request #15423: [SPARK-17860][SQL] SHOW COLUMN's database conflic...

2016-10-10 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15423#discussion_r82720911 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala --- @@ -207,6 +208,7 @@ class SQLQueryTestSuite extends QueryTest with

[GitHub] spark pull request #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIK...

2016-10-10 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15398#discussion_r82722525 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala --- @@ -25,26 +25,25 @@ object StringUtils

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-10 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r82725489 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala --- @@ -0,0 +1,343 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15427: [SPARK-17866][SPARK-17867][SQL] Fix Dataset.dropd...

2016-10-10 Thread viirya
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/15427 [SPARK-17866][SPARK-17867][SQL] Fix Dataset.dropduplicates ## What changes were proposed in this pull request? Two issues regarding Dataset.dropduplicates: 1

[GitHub] spark issue #14847: [SPARK-17254][SQL] Filter can stop when the condition is...

2016-10-11 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14847 Re-open it and see if we can have some consensus about this direction.

[GitHub] spark pull request #14847: [SPARK-17254][SQL] Filter can stop when the condi...

2016-10-11 Thread viirya
GitHub user viirya reopened a pull request: https://github.com/apache/spark/pull/14847 [SPARK-17254][SQL] Filter can stop when the condition is false if the child output is sorted ## What changes were proposed in this pull request? From https://issues.apache.org/jira

[GitHub] spark pull request #15423: [SPARK-17860][SQL] SHOW COLUMN's database conflic...

2016-10-11 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15423#discussion_r82745403 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -168,17 +168,7 @@ class SparkSqlAstBuilder(conf: SQLConf

[GitHub] spark issue #15427: [SPARK-17866][SPARK-17867][SQL] Fix Dataset.dropduplicat...

2016-10-11 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15427 cc @cloud-fan @hvanhovell

[GitHub] spark issue #15388: [SPARK-17821][SQL] Support And and Or in Expression Cano...

2016-10-11 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15388 Thanks! @rxin @cloud-fan @hvanhovell @gatorsmile

[GitHub] spark pull request #15423: [SPARK-17860][SQL] SHOW COLUMN's database conflic...

2016-10-11 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15423#discussion_r82750409 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -168,17 +168,7 @@ class SparkSqlAstBuilder(conf: SQLConf

[GitHub] spark pull request #15423: [SPARK-17860][SQL] SHOW COLUMN's database conflic...

2016-10-11 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15423#discussion_r82750631 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -168,17 +168,7 @@ class SparkSqlAstBuilder(conf: SQLConf

[GitHub] spark issue #14847: [SPARK-17254][SQL] Filter can stop when the condition is...

2016-10-11 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14847 @rxin Thanks for the recommendation! Let me close it now and work on it.

[GitHub] spark pull request #14847: [SPARK-17254][SQL] Filter can stop when the condi...

2016-10-11 Thread viirya
Github user viirya closed the pull request at: https://github.com/apache/spark/pull/14847

[GitHub] spark pull request #15389: [SPARK-17817][PySpark] PySpark RDD Repartitioning...

2016-10-11 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15389#discussion_r82929167 --- Diff: python/pyspark/rdd.py --- @@ -2029,7 +2028,15 @@ def coalesce(self, numPartitions, shuffle=False): >>> sc.parallelize([1, 2

[GitHub] spark pull request #15445: [SPARK-17817][PySpark][FOLLOWUP] PySpark RDD Repa...

2016-10-11 Thread viirya
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/15445 [SPARK-17817][PySpark][FOLLOWUP] PySpark RDD Repartitioning Results in Highly Skewed Partition Sizes ## What changes were proposed in this pull request? This change is a followup for

[GitHub] spark pull request #15389: [SPARK-17817][PySpark] PySpark RDD Repartitioning...

2016-10-11 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15389#discussion_r82930615 --- Diff: python/pyspark/rdd.py --- @@ -2029,7 +2028,15 @@ def coalesce(self, numPartitions, shuffle=False): >>> sc.parallelize([1, 2

[GitHub] spark pull request #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIK...

2016-10-11 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15398#discussion_r82931395 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala --- @@ -25,26 +25,25 @@ object StringUtils

[GitHub] spark pull request #15423: [SPARK-17860][SQL] SHOW COLUMN's database conflic...

2016-10-11 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15423#discussion_r82937473 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala --- @@ -207,6 +208,7 @@ class SQLQueryTestSuite extends QueryTest with

[GitHub] spark issue #15445: [SPARK-17817][PySpark][FOLLOWUP] PySpark RDD Repartition...

2016-10-12 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15445 @felixcheung I posted the benchmark in #15389. I'm now posting it here too.

[GitHub] spark pull request #15427: [SPARK-17866][SPARK-17867][SQL] Fix Dataset.dropd...

2016-10-12 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15427#discussion_r83140093 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1878,17 +1878,25 @@ class Dataset[T] private[sql]( def dropDuplicates

[GitHub] spark issue #15427: [SPARK-17866][SPARK-17867][SQL] Fix Dataset.dropduplicat...

2016-10-12 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15427 Thanks for the review! @rxin @cloud-fan

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-12 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r83146607 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala --- @@ -0,0 +1,343 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #15457: [SPARK-17830][SQL] Annotate remaining SQL APIs wi...

2016-10-13 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15457#discussion_r83162239 --- Diff: sql/core/src/main/java/org/apache/spark/sql/api/java/UDF1.java --- @@ -19,14 +19,12 @@ import java.io.Serializable

[GitHub] spark issue #15445: [SPARK-17817][PySpark][FOLLOWUP] PySpark RDD Repartition...

2016-10-13 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15445 @davies @felixcheung I ran another benchmark as follows: import time import random num_partitions = 2 a = sc.parallelize(map(lambda x: [random.randint

[GitHub] spark pull request #12335: [SPARK-11321] [SQL] Python non null udfs

2016-10-13 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/12335#discussion_r83357331 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/PythonUDF.scala --- @@ -28,10 +28,11 @@ case class PythonUDF( name: String

[GitHub] spark pull request #12335: [SPARK-11321] [SQL] Python non null udfs

2016-10-13 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/12335#discussion_r83357461 --- Diff: python/pyspark/sql/functions.py --- @@ -1741,15 +1742,15 @@ def __call__(self, *cols): @since(1.3) -def udf(f, returnType

[GitHub] spark pull request #14847: [SPARK-17254][SQL] Filter can stop when the condi...

2016-10-13 Thread viirya
GitHub user viirya reopened a pull request: https://github.com/apache/spark/pull/14847 [SPARK-17254][SQL] Filter can stop when the condition is false if the child output is sorted ## What changes were proposed in this pull request? From https://issues.apache.org/jira

[GitHub] spark issue #15445: [SPARK-17817][PySpark][FOLLOWUP] PySpark RDD Repartition...

2016-10-14 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15445 ping @davies @felixcheung Could you take a look to see if we want to apply this? Thanks!

[GitHub] spark pull request #15495: [SPARK-17620][SQL] Determine Serde by hive.defaul...

2016-10-14 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15495#discussion_r83526181 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala --- @@ -587,6 +594,30 @@ class SQLQuerySuite extends QueryTest

[GitHub] spark pull request #14847: [SPARK-17254][SQL] Add StopAfter physical plan fo...

2016-10-15 Thread viirya
Github user viirya closed the pull request at: https://github.com/apache/spark/pull/14847

[GitHub] spark pull request #15495: [SPARK-17620][SQL] Determine Serde by hive.defaul...

2016-10-15 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15495#discussion_r83528964 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala --- @@ -587,6 +594,30 @@ class SQLQuerySuite extends QueryTest

[GitHub] spark pull request #15500: [SPARK-17956][SQL] Fix projection output ordering

2016-10-15 Thread viirya
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/15500 [SPARK-17956][SQL] Fix projection output ordering ## What changes were proposed in this pull request? Currently `ProjectExec` simply takes child plan's `outputOrdering` a

[GitHub] spark pull request #15398: [SPARK-17647][SQL][WIP] Fix backslash escaping in...

2016-10-16 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15398#discussion_r83552457 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala --- @@ -25,26 +25,25 @@ object StringUtils

[GitHub] spark issue #15398: [SPARK-17647][SQL][WIP] Fix backslash escaping in 'LIKE'...

2016-10-16 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15398 For an escape before a non-special character, I don't know whether DB2 is a special case, because when I tried it, MySQL behaved like PostgreSQL.

[GitHub] spark pull request #15495: [SPARK-17620][SQL] Determine Serde by hive.defaul...

2016-10-16 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15495#discussion_r83552790 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala --- @@ -587,6 +594,30 @@ class SQLQuerySuite extends QueryTest

[GitHub] spark issue #15398: [SPARK-17647][SQL][WIP] Fix backslash escaping in 'LIKE'...

2016-10-16 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15398 > If the character after an escape character is not a wildcard character, the escape character is discarded and the character following the escape is treated as a regular character in the patt
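The rule quoted in the comment above can be sketched as a small LIKE-to-regex translator. This is only an illustration in plain Python, not the code under review in the pull request: the helper names `like_to_regex` and `like` are hypothetical, the escape character is assumed to be a backslash, and treating a trailing escape as a literal character is an assumption made here, since the quoted rule does not cover that case.

```python
import re

def like_to_regex(pattern: str, escape: str = "\\") -> str:
    """Translate a SQL LIKE pattern into a regular expression (hypothetical helper)."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            # The escape character is discarded and the next character is
            # taken literally, whether or not it is a wildcard.
            out.append(re.escape(pattern[i + 1]))
            i += 2
        elif ch == "%":
            out.append(".*")  # % matches any sequence of characters
            i += 1
        elif ch == "_":
            out.append(".")   # _ matches exactly one character
            i += 1
        else:
            # Regular character; a trailing escape also lands here (assumption).
            out.append(re.escape(ch))
            i += 1
    return "".join(out)

def like(value: str, pattern: str) -> bool:
    """Return True if value matches the whole LIKE pattern."""
    return re.fullmatch(like_to_regex(pattern), value, flags=re.DOTALL) is not None
```

Under this rule, `like("a", r"\a")` is true and `like(r"\a", r"\a")` is false, because the backslash before the non-wildcard `a` is dropped.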

[GitHub] spark issue #15398: [SPARK-17647][SQL][WIP] Fix backslash escaping in 'LIKE'...

2016-10-16 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15398 @gatorsmile That is for ending a pattern with the escape sequence. I mean escaping before a non-special character.

[GitHub] spark issue #15398: [SPARK-17647][SQL][WIP] Fix backslash escaping in 'LIKE'...

2016-10-16 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15398 Maybe more important is how Hive performs `like`. For escaping before a non-special character, it looks like it differs from the above examples. If you give a pattern like `\a`, it matches exactly `\a

[GitHub] spark pull request #15500: [SPARK-17956][SQL] Fix projection output ordering

2016-10-16 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15500#discussion_r83582706 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala --- @@ -77,9 +77,40 @@ case class ProjectExec(projectList

[GitHub] spark pull request #14847: [SPARK-17254][SQL] Add StopAfter physical plan fo...

2016-10-17 Thread viirya
GitHub user viirya reopened a pull request: https://github.com/apache/spark/pull/14847 [SPARK-17254][SQL] Add StopAfter physical plan for the filtering that can be stopped early ## What changes were proposed in this pull request? This is motivated by: From https

[GitHub] spark issue #14847: [SPARK-17254][SQL] Add StopAfter physical plan for the f...

2016-10-17 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14847 retest this please.

[GitHub] spark issue #15445: [SPARK-17817][PySpark][FOLLOWUP] PySpark RDD Repartition...

2016-10-17 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15445 ping @davies @felixcheung Could you review this again? Thanks.

[GitHub] spark issue #13775: [SPARK-16060][SQL] Vectorized Orc reader

2016-10-17 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/13775 I agree that from a maintenance standpoint forking the classes is bad. But if we really want to have the one in Spark, I would like to help too. :)

[GitHub] spark issue #13775: [SPARK-16060][SQL] Vectorized Orc reader

2016-10-17 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/13775 @tejasapatil Thanks for the review comment! I will update this later.

[GitHub] spark pull request #15423: [SPARK-17860][SQL] SHOW COLUMN's database conflic...

2016-10-17 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15423#discussion_r83771158 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -168,17 +168,7 @@ class SparkSqlAstBuilder(conf: SQLConf

[GitHub] spark issue #15423: [SPARK-17860][SQL] SHOW COLUMN's database conflict check...

2016-10-17 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15423 LGTM. Let's see whether @cloud-fan has more comments on this.

[GitHub] spark issue #15500: [SPARK-17956][SQL] Fix projection output ordering

2016-10-17 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15500 also cc @cloud-fan @yhuai

[GitHub] spark pull request #15500: [SPARK-17956][SQL] Fix projection output ordering

2016-10-17 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15500#discussion_r83781352 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala --- @@ -77,9 +77,40 @@ case class ProjectExec(projectList

[GitHub] spark pull request #15500: [SPARK-17956][SQL] Fix projection output ordering

2016-10-17 Thread viirya
Github user viirya closed the pull request at: https://github.com/apache/spark/pull/15500

[GitHub] spark pull request #13775: [SPARK-16060][SQL] Vectorized Orc reader

2016-10-18 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13775#discussion_r83796409 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala --- @@ -118,6 +120,11 @@ class OrcFileFormat extends FileFormat with

[GitHub] spark pull request #13775: [SPARK-16060][SQL] Vectorized Orc reader

2016-10-18 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13775#discussion_r83796484 --- Diff: sql/hive/src/main/java/org/apache/hadoop/hive/ql/io/orc/VectorizedSparkOrcNewRecordReader.java --- @@ -0,0 +1,318 @@ +/* + * Licensed to

[GitHub] spark issue #15445: [SPARK-17817][PySpark][FOLLOWUP] PySpark RDD Repartition...

2016-10-18 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15445 @davies @felixcheung Thanks!

[GitHub] spark pull request #15481: [SPARK-17929] [CORE] Fix deadlock when CoarseGrai...

2016-10-18 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15481#discussion_r83990921 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala --- @@ -393,7 +393,7 @@ class

[GitHub] spark pull request #15547: [SPARK-18002][SQL] Pruning unnecessary IsNotNull ...

2016-10-18 Thread viirya
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/15547 [SPARK-18002][SQL] Pruning unnecessary IsNotNull predicates from Filter ## What changes were proposed in this pull request? In `PruneFilters` rule, we can prune unnecessary `IsNotNull

[GitHub] spark pull request #15541: [SPARK-17637][Scheduler]Packed scheduling for Spa...

2016-10-18 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15541#discussion_r83998540 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala --- @@ -0,0 +1,233 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15541: [SPARK-17637][Scheduler]Packed scheduling for Spa...

2016-10-18 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15541#discussion_r83998771 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala --- @@ -0,0 +1,233 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15541: [SPARK-17637][Scheduler]Packed scheduling for Spa...

2016-10-18 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15541#discussion_r83999452 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala --- @@ -0,0 +1,233 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15541: [SPARK-17637][Scheduler]Packed scheduling for Spa...

2016-10-18 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15541#discussion_r83999827 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala --- @@ -0,0 +1,233 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15541: [SPARK-17637][Scheduler]Packed scheduling for Spa...

2016-10-18 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15541#discussion_r8416 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala --- @@ -0,0 +1,233 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark issue #15481: [SPARK-17929] [CORE] Fix deadlock when CoarseGrainedSche...

2016-10-19 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15481 @mridulm I checked #9963 and it looks like we don't test against `CoarseGrainedSchedulerBackend.reset`.

[GitHub] spark pull request #15523: [SPARK-17981] [SPARK-17957] [SQL] Fix Incorrect N...

2016-10-19 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15523#discussion_r84010085 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala --- @@ -87,7 +87,14 @@ case class FilterExec(condition

[GitHub] spark pull request #15523: [SPARK-17981] [SPARK-17957] [SQL] Fix Incorrect N...

2016-10-19 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15523#discussion_r84011498 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala --- @@ -87,7 +87,14 @@ case class FilterExec(condition

[GitHub] spark issue #15547: [SPARK-18002][SQL] Pruning unnecessary IsNotNull predica...

2016-10-19 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15547 cc @cloud-fan

[GitHub] spark issue #15547: [SPARK-18002][SQL] Pruning unnecessary IsNotNull predica...

2016-10-19 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15547 @cloud-fan Yeah, I hadn't noticed that `NullPropagation` already has a rule for this. Closing this now.

[GitHub] spark pull request #15547: [SPARK-18002][SQL] Pruning unnecessary IsNotNull ...

2016-10-19 Thread viirya
Github user viirya closed the pull request at: https://github.com/apache/spark/pull/15547

[GitHub] spark pull request #15481: [SPARK-17929] [CORE] Fix deadlock when CoarseGrai...

2016-10-19 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15481#discussion_r84024475 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala --- @@ -145,6 +145,9 @@ class

[GitHub] spark issue #15523: [SPARK-17981] [SPARK-17957] [SQL] Fix Incorrect Nullabil...

2016-10-19 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15523 @gatorsmile A predicate like `IsNotNull(a + b + Rand())` will cause this change to wrongly set the nullability of `a` and `b` to true, won't it?

[GitHub] spark issue #14847: [SPARK-17254][SQL] Add StopAfter physical plan for the f...

2016-10-19 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14847 @ioana-delaney Thanks for the review! I'll reply to a few points first. I will add the tests you mentioned later. 4. This feature is motivated by the bucketed (and sorted, of course) table

[GitHub] spark pull request #15558: [SPARK-17357][SPARK-6624][SQL] Convert filter pre...

2016-10-19 Thread viirya
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/15558 [SPARK-17357][SPARK-6624][SQL] Convert filter predicate to CNF in Optimizer for pushdown ## What changes were proposed in this pull request? This PR is proposed to solve the problem #14912

[GitHub] spark pull request #15541: [SPARK-17637][Scheduler]Packed scheduling for Spa...

2016-10-20 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15541#discussion_r84222924 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala --- @@ -0,0 +1,226 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #12904: [SPARK-15125][SQL] Changing CSV data source mappi...

2016-10-20 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/12904#discussion_r84235107 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala --- @@ -90,6 +90,7 @@ private[csv] class CSVOptions

[GitHub] spark issue #15423: [SPARK-17860][SQL] SHOW COLUMN's database conflict check...

2016-10-20 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15423 The tests passed but the results failed to post back to GitHub...

[GitHub] spark issue #15423: [SPARK-17860][SQL] SHOW COLUMN's database conflict check...

2016-10-20 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15423 @cloud-fan Need to run tests again?

[GitHub] spark issue #17242: [SPARK-19902][SQL] Support more expression canonicalizat...

2017-03-15 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17242 ping @cloud-fan Except for the optimization integration, do you have more comments on this change? Thanks.

[GitHub] spark issue #17242: [SPARK-19902][SQL] Support more expression canonicalizat...

2017-03-15 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17242 hmm, so you don't think canonicalizer should use this?

[GitHub] spark issue #17242: [SPARK-19902][SQL] Support more expression canonicalizat...

2017-03-15 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17242 anyway, I will move it to optimizer in next update.

[GitHub] spark issue #17242: [SPARK-19902][SQL] Add optimization rule to simplify exp...

2017-03-16 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17242 retest this please.

[GitHub] spark issue #17242: [SPARK-19902][SQL] Add optimization rule to simplify exp...

2017-03-17 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17242 @cloud-fan I've moved this to the optimizer, please take a look. Thanks.

[GitHub] spark issue #17242: [SPARK-19902][SQL] Add optimization rule to simplify exp...

2017-03-19 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17242 ping @cloud-fan @hvanhovell Can you help review this? Thanks.

[GitHub] spark issue #17186: [SPARK-19846][SQL] Add a flag to disable constraint prop...

2017-03-19 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17186 ping @sameeragarwal This is updated according to your previous comment. Can you help review this? Thanks.

[GitHub] spark pull request #17302: [SPARK-19959][SQL] Fix to throw NullPointerExcept...

2017-03-21 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17302#discussion_r107089550 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala --- @@ -70,7 +70,20 @@ object RDDConversions { object ExternalRDD

[GitHub] spark pull request #17371: [SPARK-19903][PYSPARK][SS] window operator miss t...

2017-03-21 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17371#discussion_r107094080 --- Diff: python/pyspark/sql/functions.py --- @@ -1163,7 +1163,10 @@ def check_string_field(field, fieldName): raise TypeError("%s shou

[GitHub] spark pull request #17371: [SPARK-19903][PYSPARK][SS] window operator miss t...

2017-03-21 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17371#discussion_r107096074 --- Diff: python/pyspark/sql/functions.py --- @@ -1163,7 +1163,10 @@ def check_string_field(field, fieldName): raise TypeError("%s shou

[GitHub] spark issue #17371: [SPARK-19903][PYSPARK][SS] window operator miss the `wat...

2017-03-21 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17371 For now, after `withWatermark`, we only update the metadata for the column of event time. The expression id is the same. So once we use the column before adding watermark `words.timestamp` as

[GitHub] spark pull request #17302: [SPARK-19959][SQL] Fix to throw NullPointerExcept...

2017-03-21 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17302#discussion_r107117098 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala --- @@ -17,6 +17,8 @@ package org.apache.spark.sql.execution

[GitHub] spark issue #17371: [SPARK-19903][PYSPARK][SS] window operator miss the `wat...

2017-03-21 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17371 IMHO, the output after `withWatermark` should be a new attribute with a new expression id. Maybe @zsxwing @marmbrus have more insights on this? Btw, does this issue also happen in Scala code

[GitHub] spark pull request #17302: [SPARK-19959][SQL] Fix to throw NullPointerExcept...

2017-03-21 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17302#discussion_r107169764 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala --- @@ -41,7 +41,20 @@ object CatalystSerde

[GitHub] spark pull request #17302: [SPARK-19959][SQL] Fix to throw NullPointerExcept...

2017-03-21 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17302#discussion_r107192456 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala --- @@ -41,7 +41,20 @@ object CatalystSerde

[GitHub] spark pull request #17302: [SPARK-19959][SQL] Fix to throw NullPointerExcept...

2017-03-21 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17302#discussion_r107298496 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala --- @@ -41,7 +41,20 @@ object CatalystSerde

[GitHub] spark pull request #17302: [SPARK-19959][SQL] Fix to throw NullPointerExcept...

2017-03-21 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17302#discussion_r107298835 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala --- @@ -41,7 +41,20 @@ object CatalystSerde

[GitHub] spark issue #17371: [SPARK-19903][PYSPARK][SS] window operator miss the `wat...

2017-03-21 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17371 Unfortunately, yes, allowing resolved attributes in user API will have this kind of trouble. > However, I don't think that piecemeal switching to unresolved attributes is a g
