[GitHub] spark pull request: [SPARK-8658] [SQL] AttributeReference's equals...

2015-10-21 Thread gatorsmile
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/9216 [SPARK-8658] [SQL] AttributeReference's equals method compares all the members This fix changes the equals method of AttributeReference so that it checks all of the specified fields for equality.
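
To illustrate what "compares all the members" means, here is a minimal Python sketch. `Attr` is a hypothetical stand-in for Catalyst's Scala `AttributeReference`, and its field list (name, data type, nullability, expression ID, qualifiers) is an assumption chosen for illustration, not the exact member list. The hash is derived from the same fields, so objects that compare equal always hash equally.

```python
from dataclasses import dataclass
from typing import Tuple


# Hypothetical, simplified stand-in for Catalyst's AttributeReference.
# frozen=True (with the default eq=True) makes dataclass generate both
# __eq__ and __hash__ from every field below, keeping them consistent.
@dataclass(frozen=True)
class Attr:
    name: str
    data_type: str
    nullable: bool
    expr_id: int
    qualifiers: Tuple[str, ...] = ()


a1 = Attr("key", "int", True, 1, ("t1",))
a2 = Attr("key", "int", True, 1, ("t1",))
a3 = Attr("key", "int", False, 1, ("t1",))  # differs only in nullability

print(a1 == a2, hash(a1) == hash(a2))
print(a1 == a3)
```

Deriving equality and hashing from one field list avoids the classic bug where two "equal" objects land in different hash buckets.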

[GitHub] spark pull request: [SPARK-8658] [SQL] AttributeReference's equals...

2015-10-22 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9216#issuecomment-150395202 My code change exposes a new defect: both rollup and cube do not work correctly, whether or not the build includes my changes. Without my

[GitHub] spark pull request: [SPARK-11360] [Doc] Loss of nullability when w...

2015-10-27 Thread gatorsmile
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/9314 [SPARK-11360] [Doc] Loss of nullability when writing parquet files This fix adds one line explaining the current behavior of Spark SQL when writing Parquet files. All columns are forced

[GitHub] spark pull request: [SPARK-8658] [SQL] AttributeReference's equals...

2015-10-22 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9216#issuecomment-150475832 Hi, @cloud-fan Sure. Will do. I am trying to see if I can easily fix it. Anyway, I will open a JIRA tonight. Thanks, Xiao Li

[GitHub] spark pull request: [SPARK-8658] [SQL] AttributeReference's equals...

2015-10-23 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9216#issuecomment-150486350 The JIRA is opened: https://issues.apache.org/jira/browse/SPARK-11275 I will continue the investigation on this JIRA issue. --- If your project is set up

[GitHub] spark pull request: [SPARK-10838][SPARK-11576][SQL][WIP] Incorrect...

2015-11-11 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9548#issuecomment-155835390 @cloud-fan Before discussing the solution details, let us first talk about the design issues. IMO, the `DataFrame` is a query language, kind of a dialect

[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

2015-11-10 Thread gatorsmile
Github user gatorsmile closed the pull request at: https://github.com/apache/spark/pull/9385

[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

2015-11-10 Thread gatorsmile
GitHub user gatorsmile reopened a pull request: https://github.com/apache/spark/pull/9385 [SPARK-11433] [SQL] Cleanup the subquery name after eliminating subquery This fix removes the subquery name from qualifiers after the subquery is eliminated. You can merge this pull request

[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

2015-11-10 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9385#issuecomment-155605985 @marmbrus After rechecking the root cause of the Expand failure, I still think we should clean up the subquery name after subquery elimination. My current fix

[GitHub] spark pull request: [SPARK-10838][SPARK-11576][SQL][WIP] Incorrect...

2015-11-08 Thread gatorsmile
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/9548 [SPARK-10838][SPARK-11576][SQL][WIP] Incorrect results or exceptions when using self-joins When resolving the ambiguity of AttributeReferences caused by self joins, the current solution only

[GitHub] spark pull request: [SPARK-10838][SPARK-11576][SQL][WIP] Incorrect...

2015-11-08 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9548#issuecomment-154881441 Since this solution requires adding a qualifier comparison to the equality check of AttributeReferences, it will fail a couple of test cases in Expand. We have

[GitHub] spark pull request: [SPARK-10838][SPARK-11576][SQL][WIP] Incorrect...

2015-11-09 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9548#issuecomment-155183305 I can't fix the problem without a major code change. The current design of DataFrame has a fundamental problem. When using column references, we might hit various

[GitHub] spark pull request: [SPARK-10838][SPARK-11576][SQL][WIP] Incorrect...

2015-11-08 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9548#issuecomment-154911463 To fix these failed cases, I will move the DataFrame's hashCode to the Column class instead of putting the values directly into qualifiers.

[GitHub] spark pull request: [SPARK-11360] [Doc] Loss of nullability when w...

2015-11-09 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9314#issuecomment-155334571 Got it, thank you!

[GitHub] spark pull request: [SPARK-11360] [Doc] Loss of nullability when w...

2015-11-09 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9314#issuecomment-155309645 @marmbrus Should I reopen it? Thanks.

[GitHub] spark pull request: [SPARK-11275][SQL] Rollup and Cube Generates t...

2015-11-11 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9419#issuecomment-155973699 Thank you, Hao! Will do it in the next few days.

[GitHub] spark pull request: [Spark-11637][SQL] Regression in UDF: exceptio...

2015-11-12 Thread gatorsmile
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/9683 [Spark-11637][SQL] Regression in UDF: exceptions when using Stars and Alias When using a UDF in Spark SQL, the query fails if a star and an alias are used at the same time. This works in 1.4.x

[GitHub] spark pull request: [Spark-11637][SQL] Regression in UDF: exceptio...

2015-11-12 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9683#issuecomment-156319117 The issue has been fixed in https://github.com/apache/spark/pull/9343. I will close this PR.

[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

2015-11-13 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9385#issuecomment-156612181 Hi, @marmbrus Originally, I thought qualifiers were part of identifiers, like schema names in a traditional RDBMS. Based on your explanation, this is not true

[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

2015-11-16 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9385#issuecomment-157239704 Sure. Close it. Thank you for your time!

[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

2015-11-16 Thread gatorsmile
Github user gatorsmile closed the pull request at: https://github.com/apache/spark/pull/9385

[GitHub] spark pull request: [SPARK-8658] [SQL] AttributeReference's equals...

2015-11-16 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/9216#discussion_r45011838 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala --- @@ -194,7 +194,9 @@ case class

[GitHub] spark pull request: [SPARK-9928][SQL] Removal of LogicalLocalTable...

2015-11-15 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9717#issuecomment-156943032 Thanks!

[GitHub] spark pull request: [SPARK-11275][SQL] Rollup and Cube Generates t...

2015-11-11 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9419#issuecomment-155951403 Please let me know if I need to resolve these conflicts. @cloud-fan @chenghao-intel @marmbrus @rxin

[GitHub] spark pull request: [SPARK-10838][SPARK-11576][SQL][WIP] Incorrect...

2015-11-11 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9548#issuecomment-155912100 @cloud-fan So far, we do not have an easy fix, but I believe we should never return a wrong result for self joins. Let me post the test case I added

[GitHub] spark pull request: [SPARK-8658] [SQL] [FOLLOW-UP] AttributeRefere...

2015-11-17 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9761#issuecomment-157414057 @nongli I saw you have a related discussion with @chenghao-intel . The failed test case was introduced in your PR https://github.com/apache/spark/pull/9480. I am

[GitHub] spark pull request: [SPARK-8658] [SQL] [FOLLOW-UP] AttributeRefere...

2015-11-17 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9761#issuecomment-157434770 Ok. I will also add three more lines to cover the new `hashCode` and `equals` functions.

[GitHub] spark pull request: [SPARK-11072][SQL] simplify self join handling

2015-11-17 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9081#issuecomment-157440817 @cloud-fan I am wondering whether this will be merged soon. I am not sure whether I should fix a couple of self-join issues before your merge, or whether I should not waste

[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

2015-11-10 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9385#issuecomment-15565 Hi, @marmbrus After digging into the root cause of the Expand failures, I found that we still need a deeper cleanup of the subquery after elimination. Let me

[GitHub] spark pull request: [SPARK-10838][SPARK-11576][SQL][WIP] Incorrect...

2015-11-09 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9548#issuecomment-155226523 @marmbrus Thank you for your suggestions! That is also similar to my initial idea. I tried it last night. Unfortunately, I hit a problem when adding such a field

[GitHub] spark pull request: [SPARK-11275][SQL][WIP] Rollup and Cube Genera...

2015-11-02 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9419#issuecomment-153148409 @hvanhovell Your understanding is right. If we merge both grouping and aggregation together, it will introduce extra complexity to generate the logical plan

[GitHub] spark pull request: [SPARK-11275][SQL][WIP] Rollup and Cube Genera...

2015-11-02 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9419#issuecomment-153146798 @holdenk This is the PR I mentioned in the email. Could you review it too?

[GitHub] spark pull request: [SPARK-11275][SQL][WIP] Rollup and Cube Genera...

2015-11-02 Thread gatorsmile
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/9419 [SPARK-11275][SQL][WIP] Rollup and Cube Generates the Incorrect Results when Aggregation Functions Use Group By Columns In the current implementation, Rollup and Cube are unable to generate
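
The defect concerns aggregate functions that reference a group-by column. As a point of reference, here is a plain-Python sketch (not Spark code) of the expected ROLLUP semantics, assuming the intended behavior is that aggregates are computed over the original column values, while only the output grouping columns become NULL (`None` here) in subtotal rows. ROLLUP(a, b) expands to the grouping sets (a, b), (a), and (). The function name `rollup_sum_a` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Key = Tuple[Optional[int], Optional[int]]


def rollup_sum_a(rows: List[Tuple[int, int]]) -> Dict[Key, int]:
    """Reference model of SELECT a, b, SUM(a) ... GROUP BY a, b WITH ROLLUP.

    Each input row contributes to three grouping sets: (a, b), (a), ().
    Rolled-up columns appear as None in the output key, but SUM(a)
    always adds the row's real value of a, never the nulled-out one.
    """
    result: Dict[Key, int] = defaultdict(int)
    grouping_sets = [(True, True), (True, False), (False, False)]
    for a, b in rows:
        for keep_a, keep_b in grouping_sets:
            key = (a if keep_a else None, b if keep_b else None)
            result[key] += a
    return dict(result)


rows = [(1, 10), (1, 20), (2, 10)]
result = rollup_sum_a(rows)
print(result[(None, None)])  # grand-total row: 1 + 1 + 2
```

An implementation that feeds the nulled grouping column into SUM(a) would instead produce NULL for the (a = 1) and grand-total rows, which is the kind of incorrect result the title describes.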

[GitHub] spark pull request: [SPARK-11275][SQL][WIP] Rollup and Cube Genera...

2015-11-02 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9419#issuecomment-153145967 Hi, Rick, 1) This is a defect identified by me. It blocks my PR. It was introduced in the initial implementation. Thus, it is not a regression. 2

[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

2015-10-30 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/9385#discussion_r43559826 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1019,7 +1019,16 @@ class Analyzer( * scoping

[GitHub] spark pull request: [SPARK-11360] [Doc] Loss of nullability when w...

2015-10-30 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9314#issuecomment-152656067 @marmbrus As you suggested, I submitted the pull request. Could you review it? Thanks, Xiao Li

[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

2015-10-30 Thread gatorsmile
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/9385 [SPARK-11433] [SQL] Cleanup the subquery name after eliminating subquery This fix removes the subquery name from qualifiers after the subquery is eliminated.

[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

2015-10-31 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9385#issuecomment-152708351 So far, I have only observed these strange ghost values when reading the optimized logical tree; my queries did not trigger any issue. Based on my

[GitHub] spark pull request: [SPARK-4226][SQL]Add subquery (not) in/exists ...

2015-11-04 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9055#issuecomment-153857289 @jameszhouyi We hit the same issue. For now, we work around it by using joins.

[GitHub] spark pull request: [SPARK-4226][SQL]Add subquery (not) in/exists ...

2015-11-04 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9055#issuecomment-153920042 @jameszhouyi Agree. This is an important feature for any SQL engine. We are also waiting for this feature. So far, using joins is an alternative to bypass

[GitHub] spark pull request: [SPARK-11275][SQL][WIP] Rollup and Cube Genera...

2015-11-03 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9419#issuecomment-153451771 @chenghao-intel @hvanhovell Unit test cases are added. Will finish the code for resolving the comments by @holdenk @rick-ibm @rxin @marmbrus @liancheng

[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

2015-11-03 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/9385#discussion_r43817697 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1019,7 +1019,16 @@ class Analyzer( * scoping

[GitHub] spark pull request: [SPARK-6231][SQL/DF] Automatically resolve joi...

2015-11-06 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/5919#issuecomment-154612297 @rxin @marmbrus This fix is unable to resolve the condition ambiguity for nested self join. I also found the self joins could generate incorrect results

[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

2015-11-06 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9385#issuecomment-154650945 @marmbrus Thanks! I will try to change equals to semanticEquals in the pull request https://github.com/apache/spark/pull/9216. Then, you can decide
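
The tension between full-member `equals` and `semanticEquals` can be sketched in plain Python. `Attr` and `semantic_equals` are hypothetical names, and the assumption that semantic equality of an attribute reference depends only on its expression ID is a simplification for illustration, not a statement of Catalyst's exact rule. The example shows why a leftover subquery qualifier breaks full equality but not semantic equality.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Attr:
    """Hypothetical stand-in for an attribute reference; not Catalyst's class."""
    name: str
    expr_id: int
    qualifiers: Tuple[str, ...] = ()

    def semantic_equals(self, other: "Attr") -> bool:
        # Assumption: an attribute is identified by its expression ID alone,
        # so cosmetic details such as qualifiers are ignored here.
        return self.expr_id == other.expr_id


before = Attr("key", 7, ("sub",))  # still carries a subquery name
after = Attr("key", 7)             # same attribute after qualifier cleanup

print(before == after)                # full equality also compares qualifiers
print(before.semantic_equals(after))  # semantic equality compares the ID only
```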

[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

2015-11-06 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9385#issuecomment-154609690 @marmbrus I already hit this issue when resolving https://issues.apache.org/jira/browse/SPARK-8658. That means, when comparing two AttributeReferences, we should

[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

2015-11-03 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9385#issuecomment-153529973 @cloud-fan @dbtsai, Jenkins did not start testing. Could you ask Jenkins to test it? Thank you!

[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

2015-11-03 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/9385#discussion_r43826123 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1019,7 +1019,16 @@ class Analyzer( * scoping

[GitHub] spark pull request: [SPARK-8658] [SQL] AttributeReference's equals...

2015-11-03 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9216#issuecomment-153511090 @JoshRosen @cloud-fan I submitted a pull request for JIRA SPARK-11275: https://github.com/apache/spark/pull/9419 Hopefully, after finishing the problem

[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

2015-11-03 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9385#issuecomment-153620226 @dbtsai Thank you! Please let me know if you need any extra code change.

[GitHub] spark pull request: [SPARK-11275][SQL] Rollup and Cube Generates t...

2015-11-03 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/9419#discussion_r43850164 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -232,7 +232,7 @@ class Analyzer

[GitHub] spark pull request: [SPARK-11275][SQL][WIP] Rollup and Cube Genera...

2015-11-02 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9419#issuecomment-153192515 @rick-ibm Will add more comments to explain it. In particular, I will emphasize that this design expects the optimizer to collapse these two projections into a single one

[GitHub] spark pull request: [SPARK-11275][SQL] Rollup and Cube Generates t...

2015-11-05 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/9419#discussion_r44107333 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -232,7 +232,7 @@ class Analyzer

[GitHub] spark pull request: [SPARK-11633] [SQL] HiveContext's Case Insensi...

2015-11-18 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9762#issuecomment-157786406 @cloud-fan @marmbrus Will follow your suggestions to update the fix. Thanks!

[GitHub] spark pull request: [SPARK-11803][SQL] fix Dataset self-join

2015-11-18 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9806#issuecomment-157749632 Your code looks pretty clean to me. Let me share my test cases that this PR fails. ``` test("joinWith tuple - self join 1") { val ds = S

[GitHub] spark pull request: [SPARK-11803][SQL] fix Dataset self-join

2015-11-18 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9806#issuecomment-157807682 Sure. Will do. Thanks! 2015-11-18 10:16 GMT-08:00 Michael Armbrust <notificati...@github.com>: > LGTM, merging to maste

[GitHub] spark pull request: [SPARK-11633] [SQL] HiveContext's Case Insensi...

2015-11-18 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9762#issuecomment-157857086 @marmbrus @cloud-fan Based on your comments, I made the change. Please review the new change. I also tried the fix after excluding the change

[GitHub] spark pull request: [SPARK-9928][SQL] Removal of LogicalLocalTable...

2015-11-14 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9717#issuecomment-156738825 retest this please.

[GitHub] spark pull request: [SPARK-11433] [SQL] Cleanup the subquery name ...

2015-11-14 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9385#issuecomment-156730810 @marmbrus CachedTableSuite failed due to the same reason. We did not clean up the subquery names. Thus, it is unable to give a correct result when deciding

[GitHub] spark pull request: [SPARK-9928][SQL] Removal of LogicalLocalTable...

2015-11-14 Thread gatorsmile
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/9717 [SPARK-9928][SQL] Removal of LogicalLocalTable Has LogicalLocalTable in ExistingRDD.scala been replaced by LocalRelation in LocalRelation.scala? Do you know any reason why we still keep

[GitHub] spark pull request: [SPARK-9928][SQL] Removal of LogicalLocalTable...

2015-11-14 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9717#issuecomment-156738584 The failure of this test case is not related to the code changes.

[GitHub] spark pull request: [SPARK-9928][SQL] Removal of LogicalLocalTable...

2015-11-14 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9717#issuecomment-156739572 @srowen Could you review the changes? Thanks!

[GitHub] spark pull request: [SPARK-9928][SQL] Removal of LogicalLocalTable...

2015-11-14 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/9717#issuecomment-156745623 Another case failed for the same reason. ``` [error] Test org.apache.spark.ml.util.JavaDefaultReadWriteSuite.testDefaultReadWrite failed

[GitHub] spark pull request: [SPARK-11633] [SQL] HiveContext's Case Insensi...

2015-11-17 Thread gatorsmile
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/9762 [SPARK-11633] [SQL] HiveContext's Case Insensitivity in Self-Join Handling When handling self joins, the implementation did not consider the case insensitivity of HiveContext. It could cause

[GitHub] spark pull request: [SPARK-8658] [SQL] [FOLLOW-UP] AttributeRefere...

2015-11-16 Thread gatorsmile
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/9761 [SPARK-8658] [SQL] [FOLLOW-UP] AttributeReference's equals method compares all the members Based on the comment from @cloud-fan, update the AttributeReference's hashCode function by including

[GitHub] spark pull request: [SPARK-12028] [SQL] get_json_object returns an...

2015-11-27 Thread gatorsmile
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/10018 [SPARK-12028] [SQL] get_json_object returns an incorrect result when the value is null literals When calling `get_json_object` for the following two cases, both results are `"

[GitHub] spark pull request: [SPARK-12195] [SQL] Adding BigDecimal, Date an...

2015-12-07 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/10188#discussion_r46917559 --- Diff: sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java --- @@ -386,6 +389,20 @@ public void testNestedTupleEncoder

[GitHub] spark pull request: [SPARK-12091] [PYSPARK] [Minor] Default storag...

2015-12-02 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10092#issuecomment-161372323 @mateiz Thank you for your answer! Will try to do it soon.

[GitHub] spark pull request: [SPARK-12113] [SQL] Add some timing metrics fo...

2015-12-02 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/10116#discussion_r46501447 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala --- @@ -149,6 +149,32 @@ private[sql] object SQLMetrics

[GitHub] spark pull request: [SPARK-12091] [PYSPARK] Removal of the JAVA-sp...

2015-12-02 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/10092#discussion_r46520645 --- Diff: python/pyspark/storagelevel.py --- @@ -49,12 +51,8 @@ def __str__(self): StorageLevel.DISK_ONLY = StorageLevel(True, False, False

[GitHub] spark pull request: [SPARK-12091] [PYSPARK] Removal of the JAVA-sp...

2015-12-02 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10092#issuecomment-161515703 Just saw the comments and will change the names soon. Thanks!

[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

2015-12-04 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/9358#discussion_r46657105 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/Encoder.scala --- @@ -37,3 +37,120 @@ trait Encoder[T] extends Serializable

[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

2015-12-03 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/9358#discussion_r46650956 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/Encoder.scala --- @@ -37,3 +37,120 @@ trait Encoder[T] extends Serializable

[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

2015-12-04 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/9358#discussion_r46657310 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/Encoder.scala --- @@ -37,3 +37,120 @@ trait Encoder[T] extends Serializable

[GitHub] spark pull request: [SPARK-12091] [PYSPARK] Removal of the JAVA-sp...

2015-12-02 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10092#issuecomment-161514977 - Removed all the constants whose `deserialized` values are true. - Update the comments of StorageLevel - Change the default storage levels of Kinesis level

[GitHub] spark pull request: [SPARK-12091] [PYSPARK] Removal of the JAVA-sp...

2015-12-02 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10092#issuecomment-161522366 Based on the comments of @mateiz , the extra changes are made: - Renaming MEMORY_ONLY_SER to MEMORY_ONLY - Renaming MEMORY_ONLY_SER_2 to MEMORY_ONLY_2
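
The renames can be modeled with a small standalone sketch. `Level` is a hypothetical mirror of the flags PySpark's `StorageLevel` carries (disk, memory, off-heap, deserialized, replication); it is not the PySpark class. The sketch assumes the rationale is that Python objects are always pickled before reaching the JVM, so a Python-side level never keeps data deserialized and the `_SER` suffix stops distinguishing anything.

```python
from typing import NamedTuple


class Level(NamedTuple):
    """Hypothetical mirror of the flags carried by PySpark's StorageLevel."""
    use_disk: bool
    use_memory: bool
    use_off_heap: bool
    deserialized: bool
    replication: int = 1


# Assumed definitions after the rename: deserialized stays False because
# Python data is stored in serialized (pickled) form.
MEMORY_ONLY = Level(False, True, False, False)       # formerly MEMORY_ONLY_SER
MEMORY_ONLY_2 = Level(False, True, False, False, 2)  # formerly MEMORY_ONLY_SER_2

for name, level in [("MEMORY_ONLY", MEMORY_ONLY), ("MEMORY_ONLY_2", MEMORY_ONLY_2)]:
    print(name, level)
```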

[GitHub] spark pull request: [SPARK-12091] [PYSPARK] Deprecate the JAVA-spe...

2015-12-03 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/10092#discussion_r46522595 --- Diff: python/pyspark/storagelevel.py --- @@ -49,12 +51,8 @@ def __str__(self): StorageLevel.DISK_ONLY = StorageLevel(True, False, False

[GitHub] spark pull request: [SPARK-12158] [R] [SQL] Fix 'sample' functions...

2015-12-06 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10160#issuecomment-162286394 @felixcheung @sun-rui Thank you! Based on your comments, I made the changes. Please review them. : )

[GitHub] spark pull request: [SPARK-12158] [R] [SQL] Fix 'sample' functions...

2015-12-06 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10160#issuecomment-162328075 @felixcheung I am not sure if we need to add a test case for `sample`. Normally, using a specific seed is the common way to verify the result of `sample
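
The reasoning here is that a fixed seed makes sampling deterministic, so a test can assert on an exact result. A plain-Python sketch of seeded Bernoulli sampling (sampling without replacement by fraction) illustrates the idea; it is not SparkR's implementation, and `bernoulli_sample` is a hypothetical helper.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def bernoulli_sample(items: Sequence[T], fraction: float, seed: int) -> List[T]:
    """Keep each item independently with probability `fraction`.

    A private Random instance seeded with `seed` makes the result
    reproducible without touching the global random state.
    """
    rng = random.Random(seed)
    return [x for x in items if rng.random() < fraction]


data = list(range(100))
first = bernoulli_sample(data, 0.5, seed=42)
second = bernoulli_sample(data, 0.5, seed=42)
print(first == second)  # same seed, same sample
```

Because the exact sample depends on the generator, such a test pins the result for one implementation; switching the underlying random generator would change the expected values.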

[GitHub] spark pull request: [SPARK-12164] [SQL] Display the binary/encoded...

2015-12-06 Thread gatorsmile
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/10165 [SPARK-12164] [SQL] Display the binary/encoded values When the dataset is encoded, the existing display looks strange. Decimal format is not common when the type is binary
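
The point is that raw bytes read more naturally in hexadecimal than as decimal numbers. A minimal Python sketch contrasts the two renderings; both helper names and the bracketed, space-separated hex format are assumptions for illustration, not necessarily what `show()` prints.

```python
def as_decimal(data: bytes) -> str:
    # Decimal rendering: one base-10 number per byte
    return "[" + ", ".join(str(b) for b in data) + "]"


def as_hex(data: bytes) -> str:
    # Hex rendering: two upper-case hex digits per byte, space-separated
    return "[" + " ".join(f"{b:02X}" for b in data) + "]"


payload = bytes([0, 15, 171, 255])
print(as_decimal(payload))  # [0, 15, 171, 255]
print(as_hex(payload))      # [00 0F AB FF]
```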

[GitHub] spark pull request: [SPARK-12158] [R] [SQL] Fix 'sample' functions...

2015-12-06 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10160#issuecomment-162286420 ok to test

[GitHub] spark pull request: [SPARK-12150] [SQL] [Minor] Add range API with...

2015-12-04 Thread gatorsmile
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/10149 [SPARK-12150] [SQL] [Minor] Add range API without specifying the slice number For usability, add another sqlContext.range() method. Users can specify start, end, and step without specifying
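
The new overload lets callers pass start, end, and step while leaving the slice count to a default. The plain-Python model below works under stated assumptions: the range is half-open `[start, end)` like Python's `range`, and `DEFAULT_SLICES` is a hypothetical stand-in for whatever default Spark would pick. `split_range` is a hypothetical helper, not the Spark API.

```python
from typing import List, Optional

DEFAULT_SLICES = 4  # hypothetical stand-in for the default parallelism


def split_range(start: int, end: int, step: int = 1,
                num_slices: Optional[int] = None) -> List[List[int]]:
    """Split the half-open range [start, end) with `step` into contiguous slices.

    When num_slices is omitted, fall back to DEFAULT_SLICES, mirroring an
    overload that does not force the caller to choose a slice count.
    """
    if step == 0:
        raise ValueError("step must not be 0")
    slices = DEFAULT_SLICES if num_slices is None else num_slices
    if slices < 1:
        raise ValueError("num_slices must be at least 1")
    values = list(range(start, end, step))
    n = len(values)
    # Distribute n values as evenly as possible over contiguous chunks
    return [values[i * n // slices:(i + 1) * n // slices] for i in range(slices)]


print(split_range(0, 10, 2))                # default slice count
print(split_range(0, 10, 2, num_slices=2))  # explicit slice count
```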

[GitHub] spark pull request: [SPARK-12164] [SQL] Display the binary/encoded...

2015-12-06 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10165#issuecomment-162400140 I have the exact same question when calling the show function. From the users' perspective, they might not care about the encoded values at all when calling

[GitHub] spark pull request: [SPARK-12158] [SparkR] [SQL] Fix 'sample' func...

2015-12-06 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10160#issuecomment-162415274 @felixcheung @shivaram Sure, just added that test case. Please review it. Thank you! : )

[GitHub] spark pull request: [SPARK-12158] [SparkR] [SQL] Fix 'sample' func...

2015-12-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/10160#discussion_r46788421 --- Diff: R/pkg/R/DataFrame.R --- @@ -677,13 +677,15 @@ setMethod("unique", #' collect(sample(df, TRUE, 0.5)) #'} setMeth

[GitHub] spark pull request: [SPARK-12158] [SparkR] [SQL] Fix 'sample' func...

2015-12-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/10160#discussion_r46789803 --- Diff: R/pkg/R/DataFrame.R --- @@ -692,8 +696,8 @@ setMethod("sample", setMethod("sample_frac", sign

[GitHub] spark pull request: [SPARK-12158] [SparkR] [SQL] Fix 'sample' func...

2015-12-06 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10160#issuecomment-162428939 @shivaram @felixcheung @sun-rui Thank you everyone! Hopefully, my code changes resolve all your concerns. I learned a lot from you! : )

[GitHub] spark pull request: [SPARK-12158] [SparkR] [SQL] Fix 'sample' func...

2015-12-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/10160#discussion_r46789542 --- Diff: R/pkg/R/DataFrame.R --- @@ -692,8 +696,8 @@ setMethod("sample", setMethod("sample_frac", sign

[GitHub] spark pull request: [SPARK-12150] [SQL] [Minor] Add range API with...

2015-12-08 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10149#issuecomment-163093870 @marmbrus @cloud-fan This PR changes the external API. I am not sure whether it will be merged now or whether we should revisit it after the 1.6 release. Thank you!

[GitHub] spark pull request: [SPARK-12164] [SQL] Decode the encoded values ...

2015-12-08 Thread gatorsmile
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/10215 [SPARK-12164] [SQL] Decode the encoded values and then display Based on the suggestions from @marmbrus @cloud-fan in https://github.com/apache/spark/pull/10165, this PR prints the decoded
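
A rough Python model of "decode, then display": each row is stored in an encoded byte form, and the display path turns it back into an object before formatting instead of printing the encoded bytes. `Encoder`, `Person`, and `show_rows` are hypothetical names for this sketch; Spark's encoders and `Dataset.show` are implemented differently.

```python
from dataclasses import dataclass
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Person:
    name: str
    age: int


@dataclass(frozen=True)
class Encoder(Generic[T]):
    """Hypothetical encoder: converts objects to and from a stored byte form."""
    encode: Callable[[T], bytes]
    decode: Callable[[bytes], T]


def _encode_person(p: Person) -> bytes:
    return f"{p.name},{p.age}".encode("utf-8")


def _decode_person(raw: bytes) -> Person:
    # rsplit so that a comma inside the name does not break decoding.
    name, age = raw.decode("utf-8").rsplit(",", 1)
    return Person(name, int(age))


person_encoder = Encoder(encode=_encode_person, decode=_decode_person)


def show_rows(stored: List[bytes], encoder: Encoder, decoded: bool = True) -> List[str]:
    """Format stored rows, decoding them into objects first when decoded=True."""
    if decoded:
        return [str(encoder.decode(raw)) for raw in stored]
    # Fallback: show the raw encoded bytes as hex.
    return [" ".join(f"{b:02X}" for b in raw) for raw in stored]


stored = [person_encoder.encode(Person("Ann", 30))]
print(show_rows(stored, person_encoder))                 # ["Person(name='Ann', age=30)"]
print(show_rows(stored, person_encoder, decoded=False))  # ['41 6E 6E 2C 33 30']
```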

[GitHub] spark pull request: [SPARK-12188] [SQL] Code refactoring and comme...

2015-12-08 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/10184#discussion_r47046420 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -429,18 +432,18 @@ class Dataset[T] private[sql

[GitHub] spark pull request: [SPARK-12188] [SQL] Code refactoring and comme...

2015-12-08 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/10184#discussion_r47046431 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -67,15 +67,21 @@ class Dataset[T] private[sql]( tEncoder: Encoder[T

[GitHub] spark pull request: [SPARK-12188] [SQL] Code refactoring and comme...

2015-12-08 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/10184#discussion_r47046739 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -67,15 +67,21 @@ class Dataset[T] private[sql]( tEncoder: Encoder[T

[GitHub] spark pull request: [SPARK-12188] [SQL] [FOLLOW-UP] Code refactori...

2015-12-08 Thread gatorsmile
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/10214 [SPARK-12188] [SQL] [FOLLOW-UP] Code refactoring and comment correction in Dataset APIs @marmbrus This PR addresses your comment. Thanks for your review! You can merge this pull request

[GitHub] spark pull request: [SPARK-12164] [SQL] Display the binary/encoded...

2015-12-08 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10165#issuecomment-163093753 Thank you! @cloud-fan Will this PR be merged into 1.6, or will it wait for another PR that shows the decoded values? @marmbrus Thank you!

[GitHub] spark pull request: [SPARK-12158] [R] [SQL] Fix 'sample' functions...

2015-12-05 Thread gatorsmile
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/10160 [SPARK-12158] [R] [SQL] Fix 'sample' functions that break R unit test cases The existing sample functions are missing the parameter 'seed'; however, the corresponding function interface in `generics
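
A minimal Python sketch of why a `seed` argument matters for sampling: with a fixed seed the same rows come back on every call, which is what a deterministic unit test needs. It assumes Bernoulli sampling without replacement and Poisson sampling with replacement (a common pairing, not a claim about SparkR's internals); `sample` here is a plain Python function, not the SparkR method.

```python
import math
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def _poisson(rng: random.Random, lam: float) -> int:
    """Draw from Poisson(lam) with Knuth's method (adequate for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def sample(rows: Sequence[T], with_replacement: bool, fraction: float,
           seed: Optional[int] = None) -> List[T]:
    """Return a random sample of rows; the same seed gives the same sample."""
    if fraction < 0 or (not with_replacement and fraction > 1):
        raise ValueError("invalid fraction")
    # seed=None seeds the generator from the OS, so results vary per call.
    rng = random.Random(seed)
    out: List[T] = []
    for row in rows:
        if with_replacement:
            # Each row appears a Poisson(fraction) number of times.
            out.extend([row] * _poisson(rng, fraction))
        elif rng.random() < fraction:
            # Bernoulli trial: keep each row with probability `fraction`.
            out.append(row)
    return out


rows = list(range(20))
print(sample(rows, False, 0.5, seed=42) == sample(rows, False, 0.5, seed=42))  # True
```

Without a seed, two calls usually return different samples, so any assertion on the sampled count can pass or fail from run to run.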

[GitHub] spark pull request: [SPARK-12138] [SQL] Escape \u in the generated...

2015-12-04 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10155#issuecomment-162140125 Weird... My code changes are not related to the failed test case in SparkR.
```
count(sampled3) < 3 isn't true
```
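
Some background on escaping `\u` in generated Java code, offered as an assumption about the motivation: the Java compiler translates `\uXXXX` unicode escapes before tokenizing, even inside comments, and a backslash can start such an escape only when it is preceded by an even number of contiguous backslashes. Copying arbitrary text containing `\u` into generated source can therefore fail to compile. The Python sketch below shows one way to make text safe for a `/* ... */` comment; `comment_safe` and `java_comment` are hypothetical helpers, not the code in this PR.

```python
def comment_safe(text: str) -> str:
    """Make text safe to embed inside a Java /* ... */ block comment."""
    # Step 1: double every backslash. Each "u" is then preceded by an
    # even-length run whose last backslash has an odd number of backslashes
    # before it, so javac cannot treat it as the start of a unicode escape.
    escaped = text.replace("\\", "\\\\")
    # Step 2: break up "*/" so the text cannot close the comment early.
    return escaped.replace("*/", "*\\/")


def java_comment(text: str) -> str:
    """Wrap text in a block comment for generated Java source."""
    return "/* " + comment_safe(text) + " */"


print(java_comment(r"input contains \u0041 and */ here"))
# /* input contains \\u0041 and *\/ here */
```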

[GitHub] spark pull request: [SPARK-12138] [SQL] Escape \u in the generated...

2015-12-04 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10155#issuecomment-162140292 retest this please

[GitHub] spark pull request: [SPARK-12138] [SQL] Escape \u in the generated...

2015-12-04 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10155#issuecomment-162148385 Found a bug in the function `sample` of R. Will submit a PR later. Thanks!

[GitHub] spark pull request: [SPARK-12158] [R] [SQL] Fix 'sample' functions...

2015-12-05 Thread gatorsmile
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10160#issuecomment-162241856 @davies Could you take a look at this PR? Thank you!

[GitHub] spark pull request: [WIP][SPARK-12069][SQL] Update documentation w...

2015-12-03 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/10060#discussion_r46624363 --- Diff: docs/sql-programming-guide.md --- @@ -9,18 +9,51 @@ title: Spark SQL and DataFrames # Overview -Spark SQL is a Spark module
