[GitHub] spark pull request: [SPARK-4152] [SQL] Avoid data change in CTAS w...

chenghao-intel Sun, 02 Nov 2014 06:39:08 -0800

Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3013#discussion_r19712328
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveTableScanSuite.scala ---
    @@ -69,4 +69,10 @@ class HiveTableScanSuite extends HiveComparisonTest {
         TestHive.sql("DROP TABLE timestamp_query_null")
       }
       
    +  // In unit test, kv1.txt is a small file and will be loaded as table src by default
    +  // And since it's a small file, then it will be consider as a single input split.
    +  createQueryTest("file_split_for_small_table",
    +    """
    +      |SELECT key, value FROM src SORT BY key, value;
    +    """.stripMargin)
    --- End diff --
    
    Actually, this test will fail without the following change:
    https://github.com/chenghao-intel/spark/blob/ctas_unittest/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala#L63
    I described the root cause in #2589.
    Sorry, I should have submitted this as an independent PR; however, the two added test cases (`ctas.q`, `ctas_hadoop20.q`) will fail without it.



