GitHub user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4158#discussion_r23514311
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
    @@ -633,14 +633,28 @@ https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C
                        Token(script, Nil) ::
                        Token("TOK_SERDE", serdeClause) ::
                        Token("TOK_RECORDREADER", readerClause) ::
    -                   outputClause :: Nil) :: Nil) =>
    +                   outputClause) :: Nil) =>
     
    +            // TODO the output should be bind with the output clause or RecordReader
                 val output = outputClause match {
    -              case Token("TOK_ALIASLIST", aliases) =>
    +              case Token("TOK_ALIASLIST", aliases) :: Nil =>
                     aliases.map { case Token(name, Nil) => AttributeReference(name, StringType)() }
    -              case Token("TOK_TABCOLLIST", attributes) =>
    +              case Token("TOK_TABCOLLIST", attributes) :: Nil =>
                     attributes.map { case Token("TOK_TABCOL", Token(name, Nil) :: dataType :: Nil) =>
                       AttributeReference(name, nodeToDataType(dataType))() }
    +              case Nil => // Not specified the output field names, let it be the same as input
    +                (0 to inputExprs.length - 1).map { idx =>
    +                  // Keep the same as Hive does, the first field names is "key", and second is
    +                  // "value", however, Hive seems gives null string for the rest of the
    --- End diff --
    
    I think these are the expected results. The Hive manual describes this
    behavior under 'Schema-less Map-reduce Scripts' in
    [transform](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Transform):
 
    > If there is no AS clause after USING my_script, Hive assumes that the output of the script contains 2 parts: key which is before the first tab, and value which is the rest after the first tab.
    
    So in your results, the `value` column gets everything in the query output
    after the first tab. The results for table `test2` only look different
    because of how the tabs align the printed output; they should follow the
    same rule too.
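
To illustrate the rule quoted above, here is a minimal sketch of splitting one line of script output at the first tab only. `SchemaLessTransform` and `splitKeyValue` are hypothetical names for this example, not part of Hive or Spark, and returning `None` when the line contains no tab is an assumption of the sketch, not something the manual excerpt states.

```scala
object SchemaLessTransform {
  // Split a line of script output at the FIRST tab only.
  // Everything before it is the key; everything after it, including
  // any further tabs, is the value.
  // Assumption: a line without a tab yields no value (None).
  def splitKeyValue(line: String): (String, Option[String]) = {
    val idx = line.indexOf('\t')
    if (idx < 0) (line, None)
    else (line.substring(0, idx), Some(line.substring(idx + 1)))
  }
}
```

For a line such as `"a\tb\tc"`, the key is `a` and the value is `b\tc`, so the value still contains a tab. When printed, that embedded tab makes the row look as if it had three columns, which would explain the alignment seen for `test2`.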


