[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-04-24 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-95879508
  
@chenghao-intel ok.





[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-04-24 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-95857854
  
I just filed a JIRA issue: https://issues.apache.org/jira/browse/SPARK-7119. 
@viirya, can you help investigate this?





[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-02-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/4014





[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-02-02 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-72546189
  
Thanks!  Merged to master.





[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-02-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-72464777
  
  [Test build #26516 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26516/consoleFull)
 for   PR 4014 at commit 
[`ac2d1fe`](https://github.com/apache/spark/commit/ac2d1fe1093f2ad442ff1bd475cec0a38c096e38).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class HiveScriptIOSchema (`
  * `  val trimed_class = serdeClassName.split("'")(1)`






[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-02-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-72464789
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26516/
Test PASSed.





[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-02-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-72455639
  
  [Test build #26516 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26516/consoleFull)
 for   PR 4014 at commit 
[`ac2d1fe`](https://github.com/apache/spark/commit/ac2d1fe1093f2ad442ff1bd475cec0a38c096e38).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-02-02 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-72465789
  
@marmbrus I did some refactoring to address the comments. It should be better now.





[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-02-02 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/4014#discussion_r23932224
  
--- Diff: 
sql/hive/v0.12.0/src/main/scala/org/apache/spark/sql/hive/Shim12.scala ---
@@ -241,8 +241,14 @@ private[hive] object HiveShim {
   Decimal(hdoi.getPrimitiveJavaObject(data).bigDecimalValue())
 }
   }
+
+  implicit def prepareWritable(shimW: ShimWritable): Writable = {
+shimW.writable
+  }
 }
 
+case class ShimWritable(writable: Writable)
--- End diff --

Yes, I think implicits in this case just make the code harder to read.
On Feb 1, 2015 11:32 PM, Liang-Chi Hsieh notificati...@github.com wrote:

 In sql/hive/v0.12.0/src/main/scala/org/apache/spark/sql/hive/Shim12.scala
 https://github.com/apache/spark/pull/4014#discussion_r23910154:

   }
 
  +case class ShimWritable(writable: Writable)

 If we skip ShimWritable, we then need to remove implicit from
 prepareWritable and call it explicitly to do the fix. Would that be better?
 If so, I can do it that way.

 It does not break Hive 12 because we just pass the underlying writable
 object through without touching it. We only apply the fix on Hive 13.

 —
 Reply to this email directly or view it on GitHub
 https://github.com/apache/spark/pull/4014/files#r23910154.






[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-02-02 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/4014#discussion_r23932391
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/ScriptTransformation.scala
 ---
@@ -25,9 +25,18 @@ import 
org.apache.spark.sql.catalyst.expressions.{Attribute, Expression}
  * @param input the set of expression that should be passed to the script.
  * @param script the command that should be executed.
  * @param output the attributes that are produced by the script.
+ * @param ioschema the input and output schema applied in the execution of 
the script.
  */
 case class ScriptTransformation(
 input: Seq[Expression],
 script: String,
 output: Seq[Attribute],
-child: LogicalPlan) extends UnaryNode
+child: LogicalPlan,
+ioschema: Option[ScriptInputOutputSchema]) extends UnaryNode
--- End diff --

What other cases?  There is no need to add complexity to the API unless it
will be used.

As you have designed it now, a `None` will result in a confusing error
later in execution.  Just disallow it statically by not making this an
`Option`.
On Feb 1, 2015 11:38 PM, Liang-Chi Hsieh notificati...@github.com wrote:

 In
 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/ScriptTransformation.scala
 https://github.com/apache/spark/pull/4014#discussion_r23910289:

*/
   case class ScriptTransformation(
   input: Seq[Expression],
   script: String,
   output: Seq[Attribute],
  -child: LogicalPlan) extends UnaryNode
  +child: LogicalPlan,
  +ioschema: Option[ScriptInputOutputSchema]) extends UnaryNode

 In the Hive case, it is never `None`. But I think it may be for other cases?

 —
 Reply to this email directly or view it on GitHub
 https://github.com/apache/spark/pull/4014/files#r23910289.
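
A minimal sketch of the non-`Option` signature being suggested here, assuming the 1.3-era catalyst types; `ScriptInputOutputSchema` below is a simplified stand-in for the placeholder class added in this PR, not the exact merged definition:

```scala
// Sketch only: making the schema a required constructor argument rejects a missing
// schema at compile time instead of failing confusingly during execution.
import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression}
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, UnaryNode}

// Simplified stand-in for the placeholder class defined later in the same file.
case class ScriptInputOutputSchema()

case class ScriptTransformation(
    input: Seq[Expression],
    script: String,
    output: Seq[Attribute],
    child: LogicalPlan,
    ioschema: ScriptInputOutputSchema)  // required, not Option[...]
  extends UnaryNode
```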






[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-02-02 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/4014#discussion_r23932681
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/ScriptTransformation.scala
 ---
@@ -25,9 +25,18 @@ import 
org.apache.spark.sql.catalyst.expressions.{Attribute, Expression}
  * @param input the set of expression that should be passed to the script.
  * @param script the command that should be executed.
  * @param output the attributes that are produced by the script.
+ * @param ioschema the input and output schema applied in the execution of 
the script.
  */
 case class ScriptTransformation(
 input: Seq[Expression],
 script: String,
 output: Seq[Attribute],
-child: LogicalPlan) extends UnaryNode
+child: LogicalPlan,
+ioschema: Option[ScriptInputOutputSchema]) extends UnaryNode
--- End diff --

OK. Modified in latest commit.





[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-02-02 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/4014#discussion_r23932592
  
--- Diff: 
sql/hive/v0.12.0/src/main/scala/org/apache/spark/sql/hive/Shim12.scala ---
@@ -241,8 +241,14 @@ private[hive] object HiveShim {
   Decimal(hdoi.getPrimitiveJavaObject(data).bigDecimalValue())
 }
   }
+
+  implicit def prepareWritable(shimW: ShimWritable): Writable = {
+shimW.writable
+  }
 }
 
+case class ShimWritable(writable: Writable)
--- End diff --

OK. I have already removed `ShimWritable` in the latest commit.





[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-02-01 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/4014#discussion_r23904350
  
--- Diff: sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim13.scala ---
@@ -395,9 +397,26 @@ private[hive] object HiveShim {
       Decimal(hdoi.getPrimitiveJavaObject(data).bigDecimalValue(), hdoi.precision(), hdoi.scale())
     }
   }
+
+  implicit def prepareWritable(shimW: ShimWritable): Writable = {
+    shimW.writable match {
+      case w: AvroGenericRecordWritable =>
+        w.setRecordReaderID(new UID())
+      case _ =>
+    }
+    shimW.writable
+  }
 }
 
 /*
+ * Bug introdiced in hive-0.13. AvroGenericRecordWritable has a member recordReaderID that
--- End diff --

introduced





[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-02-01 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/4014#discussion_r23904492
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala ---
@@ -167,8 +167,10 @@ private[hive] trait HiveStrategies {
 
   object Scripts extends Strategy {
     def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
-      case logical.ScriptTransformation(input, script, output, child) =>
-        ScriptTransformation(input, script, output, planLater(child))(hiveContext) :: Nil
+      case logical.ScriptTransformation(input, script, output, child, schema) =>
+        ScriptTransformation(input, script, output,
+          planLater(child), schema.map{ case s: HiveScriptIOSchema => s }.get
--- End diff --

Given that it is safe to just call `get` here, it seems like `Option` might 
not be the correct choice for the data model.  I'd also do the casting by placing 
a type annotation in the match.

```scala
case logical.ScriptTransformation(input, script, output, child, schema: HiveScriptIOSchema) =>
```





[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-02-01 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-72402698
  
Thanks for working on this!  It would be great if this could be updated 
soon so we can include it in 1.3.





[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-02-01 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/4014#discussion_r23904457
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala ---
@@ -167,8 +167,10 @@ private[hive] trait HiveStrategies {
 
   object Scripts extends Strategy {
     def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
-      case logical.ScriptTransformation(input, script, output, child) =>
-        ScriptTransformation(input, script, output, planLater(child))(hiveContext) :: Nil
+      case logical.ScriptTransformation(input, script, output, child, schema) =>
+        ScriptTransformation(input, script, output,
--- End diff --

Typically, once you need to wrap, we put each argument on its own line.
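
A tiny self-contained illustration of that wrapping convention; the function below is hypothetical and exists only to show the layout:

```scala
object WrappingStyle {
  // Hypothetical helper, only to demonstrate argument wrapping.
  def buildLabel(name: String, version: String, build: Int, snapshot: Boolean): String =
    s"$name-$version+$build${if (snapshot) "-SNAPSHOT" else ""}"

  // Fits on one line: leave it on one line.
  val short = buildLabel("spark", "1.3.0", 1, false)

  // Needs to wrap: every argument moves onto its own line.
  val long = buildLabel(
    "spark-sql-script-transformation-example",
    "1.3.0-with-schema-less-transform-support",
    26516,
    true)
}
```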





[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-02-01 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/4014#discussion_r23904448
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -627,29 +628,71 @@ https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C
       case Token("TOK_SELEXPR",
              Token("TOK_TRANSFORM",
                Token("TOK_EXPLIST", inputExprs) ::
-              Token("TOK_SERDE", Nil) ::
+              Token("TOK_SERDE", inputSerdeClause) ::
               Token("TOK_RECORDWRITER", writerClause) ::
               // TODO: Need to support other types of (in/out)put
               Token(script, Nil) ::
-              Token("TOK_SERDE", serdeClause) ::
+              Token("TOK_SERDE", outputSerdeClause) ::
               Token("TOK_RECORDREADER", readerClause) ::
-              outputClause :: Nil) :: Nil) =>
-
-        val output = outputClause match {
-          case Token("TOK_ALIASLIST", aliases) =>
-            aliases.map { case Token(name, Nil) => AttributeReference(name, StringType)() }
-          case Token("TOK_TABCOLLIST", attributes) =>
-            attributes.map { case Token("TOK_TABCOL", Token(name, Nil) :: dataType :: Nil) =>
-              AttributeReference(name, nodeToDataType(dataType))() }
+              outputClause) :: Nil) =>
+
+        val (output, schemaLess) = outputClause match {
+          case Token("TOK_ALIASLIST", aliases) :: Nil =>
+            (aliases.map { case Token(name, Nil) => AttributeReference(name, StringType)() },
+              false)
+          case Token("TOK_TABCOLLIST", attributes) :: Nil =>
+            (attributes.map { case Token("TOK_TABCOL", Token(name, Nil) :: dataType :: Nil) =>
+              AttributeReference(name, nodeToDataType(dataType))() }, false)
+          case Nil =>
+            (List(AttributeReference("key", StringType)(),
+              AttributeReference("value", StringType)()), true)
         }
+
+        val (inputRowFormat, inputSerdeClass, inputSerdeProps) = inputSerdeClause match {
+          case Token("TOK_SERDEPROPS", props) :: Nil =>
+            (props.map { case Token(name, Token(value, Nil) :: Nil) => (name, value) },
+              "", Nil)
+          case Token("TOK_SERDENAME", Token(serde, Nil) :: Nil) :: Nil => (Nil, serde, Nil)
+          case Token("TOK_SERDENAME", Token(serde, Nil) ::
+                 Token("TOK_TABLEPROPERTIES",
+                 Token("TOK_TABLEPROPLIST", props) :: Nil) :: Nil) :: Nil =>
+            val tableprops = props.map {
+              case Token("TOK_TABLEPROPERTY", Token(name, Nil) :: Token(value, Nil) :: Nil) =>
+                (name, value)
+            }
+            (Nil, serde, tableprops)
+          case Nil => (Nil, "", Nil)
+        }
+
+        val (outputRowFormat, outputSerdeClass, outputSerdeProps) = outputSerdeClause match {
+          case Token("TOK_SERDEPROPS", props) :: Nil =>
+            (props.map { case Token(name, Token(value, Nil) :: Nil) => (name, value) },
+              "", Nil)
+          case Token("TOK_SERDENAME", Token(serde, Nil) :: Nil) :: Nil => (Nil, serde, Nil)
+          case Token("TOK_SERDENAME", Token(serde, Nil) ::
+                 Token("TOK_TABLEPROPERTIES",
+                 Token("TOK_TABLEPROPLIST", props) :: Nil) :: Nil) :: Nil =>
--- End diff --

Indent 2 spaces?





[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-02-01 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/4014#discussion_r23904432
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -627,29 +628,71 @@ https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C
       case Token("TOK_SELEXPR",
              Token("TOK_TRANSFORM",
               Token("TOK_EXPLIST", inputExprs) ::
-              Token("TOK_SERDE", Nil) ::
+              Token("TOK_SERDE", inputSerdeClause) ::
               Token("TOK_RECORDWRITER", writerClause) ::
               // TODO: Need to support other types of (in/out)put
               Token(script, Nil) ::
-              Token("TOK_SERDE", serdeClause) ::
+              Token("TOK_SERDE", outputSerdeClause) ::
               Token("TOK_RECORDREADER", readerClause) ::
-              outputClause :: Nil) :: Nil) =>
-
-        val output = outputClause match {
-          case Token("TOK_ALIASLIST", aliases) =>
-            aliases.map { case Token(name, Nil) => AttributeReference(name, StringType)() }
-          case Token("TOK_TABCOLLIST", attributes) =>
-            attributes.map { case Token("TOK_TABCOL", Token(name, Nil) :: dataType :: Nil) =>
-              AttributeReference(name, nodeToDataType(dataType))() }
+              outputClause) :: Nil) =>
+
+        val (output, schemaLess) = outputClause match {
+          case Token("TOK_ALIASLIST", aliases) :: Nil =>
+            (aliases.map { case Token(name, Nil) => AttributeReference(name, StringType)() },
+              false)
+          case Token("TOK_TABCOLLIST", attributes) :: Nil =>
+            (attributes.map { case Token("TOK_TABCOL", Token(name, Nil) :: dataType :: Nil) =>
+              AttributeReference(name, nodeToDataType(dataType))() }, false)
+          case Nil =>
+            (List(AttributeReference("key", StringType)(),
+              AttributeReference("value", StringType)()), true)
         }
+
+        val (inputRowFormat, inputSerdeClass, inputSerdeProps) = inputSerdeClause match {
+          case Token("TOK_SERDEPROPS", props) :: Nil =>
+            (props.map { case Token(name, Token(value, Nil) :: Nil) => (name, value) },
--- End diff --

These tuples with maps are a little hard to read.  I'd consider using 
intermediate variables.
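
A sketch of the intermediate-variable style being suggested, using a self-contained stand-in for the SerDe-clause match above (the `Token` type and clause shape below are simplified placeholders, not Spark's actual AST node):

```scala
// Sketch: name the mapped properties before building the result tuple, instead of
// nesting the .map inside the tuple literal as in the diff above.
object SerdeClauseSketch {
  // Simplified stand-in for the HiveQl AST nodes used in the diff.
  case class Token(name: String, children: List[Token] = Nil)

  def parseSerdeClause(clause: List[Token]): (Seq[(String, String)], String, Seq[(String, String)]) =
    clause match {
      case Token("TOK_SERDEPROPS", props) :: Nil =>
        // Intermediate variable: extract and name the row-format properties first.
        val rowFormatProps = props.collect {
          case Token(name, Token(value, Nil) :: Nil) => (name, value)
        }
        (rowFormatProps, "", Nil)
      case _ =>
        (Nil, "", Nil)
    }
}
```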





[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-02-01 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/4014#discussion_r23904329
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -33,7 +33,8 @@ import org.apache.spark.sql.catalyst.plans._
 import org.apache.spark.sql.catalyst.plans.logical
 import org.apache.spark.sql.catalyst.plans.logical._
 import org.apache.spark.sql.execution.ExplainCommand
-import org.apache.spark.sql.hive.execution.{HiveNativeCommand, DropTable, AnalyzeTable}
+import org.apache.spark.sql.hive.execution.{HiveNativeCommand, DropTable, AnalyzeTable,
+  HiveScriptIOSchema}
--- End diff --

Don't wrap imports.
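
The single-line form of that import, as the review suggests (taken directly from the diff above):

```scala
import org.apache.spark.sql.hive.execution.{HiveNativeCommand, DropTable, AnalyzeTable, HiveScriptIOSchema}
```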





[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-02-01 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/4014#discussion_r23904318
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala ---
@@ -53,28 +69,205 @@ case class ScriptTransformation(
       val inputStream = proc.getInputStream
       val outputStream = proc.getOutputStream
       val reader = new BufferedReader(new InputStreamReader(inputStream))
+
+      val (outputSerde, outputSoi) = ioschema.initOutputSerDe(output)
+
+      val iterator: Iterator[Row] = new Iterator[Row] with HiveInspectors {
+        var cacheRow: Row = null
+        var curLine: String = null
+        var eof: Boolean = false
+
+        override def hasNext: Boolean = {
+          if (outputSerde == null) {
+            if (curLine == null) {
+              curLine = reader.readLine()
+              curLine != null
+            } else {
+              true
+            }
+          } else {
+            !eof
+          }
+        }
 
-      // TODO: This should be exposed as an iterator instead of reading in all the data at once.
-      val outputLines = collection.mutable.ArrayBuffer[Row]()
-      val readerThread = new Thread("Transform OutputReader") {
-        override def run() {
-          var curLine = reader.readLine()
-          while (curLine != null) {
-            // TODO: Use SerDe
-            outputLines += new GenericRow(curLine.split("\t").asInstanceOf[Array[Any]])
+        def deserialize(): Row = {
+          if (cacheRow != null) return cacheRow
+
+          val mutableRow = new SpecificMutableRow(output.map(_.dataType))
+          try {
+            val dataInputStream = new DataInputStream(inputStream)
+            val writable = outputSerde.getSerializedClass().newInstance
+            writable.readFields(dataInputStream)
+
+            val raw = outputSerde.deserialize(writable)
+            val dataList = outputSoi.getStructFieldsDataAsList(raw)
+            val fieldList = outputSoi.getAllStructFieldRefs()
+
+            var i = 0
+            dataList.foreach( element => {
+              if (element == null) {
+                mutableRow.setNullAt(i)
+              } else {
+                mutableRow(i) = unwrap(element, fieldList(i).getFieldObjectInspector)
+              }
+              i += 1
+            })
+            return mutableRow
+          } catch {
+            case e: EOFException =>
+              eof = true
+              return null
+          }
+        }
+
+        override def next(): Row = {
+          if (!hasNext) {
+            throw new NoSuchElementException
+          }
+
+          if (outputSerde == null) {
+            val prevLine = curLine
             curLine = reader.readLine()
+
+            if (!ioschema.schemaLess) {
+              new GenericRow(
+                prevLine.split(ioschema.outputRowFormatMap("TOK_TABLEROWFORMATFIELD"))
+                .asInstanceOf[Array[Any]])
+            } else {
+              new GenericRow(
+                prevLine.split(ioschema.outputRowFormatMap("TOK_TABLEROWFORMATFIELD"), 2)
+                .asInstanceOf[Array[Any]])
+            }
+          } else {
+            val ret = deserialize()
+            if (!eof) {
+              cacheRow = null
+              cacheRow = deserialize()
+            }
+            ret
           }
         }
       }
-      readerThread.start()
+
+      val (inputSerde, inputSoi) = ioschema.initInputSerDe(input)
+      val dataOutputStream = new DataOutputStream(outputStream)
       val outputProjection = new InterpretedProjection(input, child.output)
+
       iter
         .map(outputProjection)
-        // TODO: Use SerDe
-        .map(_.mkString("", "\t", "\n").getBytes("utf-8")).foreach(outputStream.write)
+        .foreach { row =>
+          if (inputSerde == null) {
+            val data = row.mkString("", ioschema.inputRowFormatMap("TOK_TABLEROWFORMATFIELD"),
+              ioschema.inputRowFormatMap("TOK_TABLEROWFORMATLINES")).getBytes("utf-8")
+
+            outputStream.write(data)
+          } else {
+            val writable = new ShimWritable(
+              inputSerde.serialize(row.asInstanceOf[GenericRow].values, inputSoi))
+            writable.write(dataOutputStream)
+          }
+        }
       outputStream.close()
-      readerThread.join()
-      outputLines.toIterator
+      iterator
+    }
+  }
+}
+
+/**
+ * The wrapper class of Hive input and output schema properties
+ *

[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-02-01 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/4014#discussion_r23904323
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/ScriptTransformation.scala
 ---
@@ -25,9 +25,18 @@ import 
org.apache.spark.sql.catalyst.expressions.{Attribute, Expression}
  * @param input the set of expression that should be passed to the script.
  * @param script the command that should be executed.
  * @param output the attributes that are produced by the script.
+ * @param ioschema the input and output schema applied in the execution of 
the script.
  */
 case class ScriptTransformation(
 input: Seq[Expression],
 script: String,
 output: Seq[Attribute],
-child: LogicalPlan) extends UnaryNode
+child: LogicalPlan,
+ioschema: Option[ScriptInputOutputSchema]) extends UnaryNode
+
+/**
+ * The wrapper class of input and output schema properties for 
transforming with script.
+ *
--- End diff --

remove this extra line.





[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-02-01 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/4014#discussion_r23904386
  
--- Diff: 
sql/hive/v0.12.0/src/main/scala/org/apache/spark/sql/hive/Shim12.scala ---
@@ -241,8 +241,14 @@ private[hive] object HiveShim {
   Decimal(hdoi.getPrimitiveJavaObject(data).bigDecimalValue())
 }
   }
+
+  implicit def prepareWritable(shimW: ShimWritable): Writable = {
+shimW.writable
+  }
 }
 
+case class ShimWritable(writable: Writable)
--- End diff --

Why do we need `ShimWritable`?  Couldn't the `prepareWritable` function 
just take a `Writable`?  Furthermore, does it break Hive 12 if we just always 
fix the Avro writable?  It would be good to minimize the size of the shim when 
possible.





[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-02-01 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/4014#discussion_r23904416
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/ScriptTransformation.scala
 ---
@@ -25,9 +25,18 @@ import 
org.apache.spark.sql.catalyst.expressions.{Attribute, Expression}
  * @param input the set of expression that should be passed to the script.
  * @param script the command that should be executed.
  * @param output the attributes that are produced by the script.
+ * @param ioschema the input and output schema applied in the execution of 
the script.
  */
 case class ScriptTransformation(
 input: Seq[Expression],
 script: String,
 output: Seq[Attribute],
-child: LogicalPlan) extends UnaryNode
+child: LogicalPlan,
+ioschema: Option[ScriptInputOutputSchema]) extends UnaryNode
--- End diff --

Can this ever be `None`?





[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-02-01 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/4014#discussion_r23904403
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/ScriptTransformation.scala
 ---
@@ -25,9 +25,18 @@ import 
org.apache.spark.sql.catalyst.expressions.{Attribute, Expression}
  * @param input the set of expression that should be passed to the script.
  * @param script the command that should be executed.
  * @param output the attributes that are produced by the script.
+ * @param ioschema the input and output schema applied in the execution of 
the script.
  */
 case class ScriptTransformation(
 input: Seq[Expression],
 script: String,
 output: Seq[Attribute],
-child: LogicalPlan) extends UnaryNode
+child: LogicalPlan,
+ioschema: Option[ScriptInputOutputSchema]) extends UnaryNode
+
+/**
+ * The wrapper class of input and output schema properties for 
transforming with script.
--- End diff --

I'd phrase this as `A placeholder for implementation specific input and 
output properties when passing data to a script.  For example, in Hive this 
would specify which SerDes to use`.





[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-02-01 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/4014#discussion_r23910289
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/ScriptTransformation.scala
 ---
@@ -25,9 +25,18 @@ import 
org.apache.spark.sql.catalyst.expressions.{Attribute, Expression}
  * @param input the set of expression that should be passed to the script.
  * @param script the command that should be executed.
  * @param output the attributes that are produced by the script.
+ * @param ioschema the input and output schema applied in the execution of 
the script.
  */
 case class ScriptTransformation(
 input: Seq[Expression],
 script: String,
 output: Seq[Attribute],
-child: LogicalPlan) extends UnaryNode
+child: LogicalPlan,
+ioschema: Option[ScriptInputOutputSchema]) extends UnaryNode
--- End diff --

In the Hive case, it is never `None`. But I think it may be for other cases?





[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-02-01 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/4014#discussion_r23910154
  
--- Diff: 
sql/hive/v0.12.0/src/main/scala/org/apache/spark/sql/hive/Shim12.scala ---
@@ -241,8 +241,14 @@ private[hive] object HiveShim {
   Decimal(hdoi.getPrimitiveJavaObject(data).bigDecimalValue())
 }
   }
+
+  implicit def prepareWritable(shimW: ShimWritable): Writable = {
+shimW.writable
+  }
 }
 
+case class ShimWritable(writable: Writable)
--- End diff --

If we skip `ShimWritable`, we then need to remove `implicit` from 
`prepareWritable` and call it explicitly to do the fix. Would that be better? If 
so, I can do it that way.

It does not break Hive 12 because we just pass the underlying writable 
object through without touching it. We only apply the fix on Hive 13.
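
A minimal sketch of the behaviour described here, assuming a `prepareWritable` helper in each shim as in the diffs quoted in this thread (illustrative, not the exact merged code):

```scala
import java.rmi.server.UID
import org.apache.hadoop.io.Writable
import org.apache.hadoop.hive.serde2.avro.AvroGenericRecordWritable

object Shim12Sketch {
  // Hive 0.12: nothing to fix, hand the underlying Writable straight back.
  def prepareWritable(writable: Writable): Writable = writable
}

object Shim13Sketch {
  // Hive 0.13: AvroGenericRecordWritable gained a recordReaderID field that must be set
  // before write(), so patch Avro writables and pass everything else through untouched.
  def prepareWritable(writable: Writable): Writable = {
    writable match {
      case avro: AvroGenericRecordWritable => avro.setRecordReaderID(new UID())
      case _ => // no fix-up needed for other Writables
    }
    writable
  }
}
```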





[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-31 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-72315535
  
@rxin I have added the explanation for this feature. Would you have time to 
review this PR and see whether it is OK to merge? Thanks!





[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-28 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-71835068
  
@chenghao-intel Thanks for the comments. I have refactored the code to address these issues.

Regarding reusing Hive's existing code, I checked the links you mentioned. However, the 
code that parses the input/output formats and SerDe is contained in private classes, so I 
don't think we can reuse it.

Making `logical.ScriptTransformation` parameterized would cause problems in the Analyzer, 
so I skipped that and moved the Hive implementation details into a separate class, 
`HiveScriptIOSchema` (see the sketch below).

For the special case regarding `AvroGenericRecordWritable`, I moved the handling into 
`HiveShim` so that it works on both Hive 0.12.0 and 0.13.1.
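
A rough sketch of the `HiveScriptIOSchema` separation mentioned above, with a generic placeholder on the catalyst side and the Hive-specific properties kept in the sql/hive module (field names follow the diffs in this thread; the exact merged shape may differ):

```scala
// catalyst: knows only that some implementation-specific IO schema may be attached.
trait ScriptInputOutputSchema

// sql/hive: carries the Hive details (row formats, SerDe classes, SerDe properties,
// and whether the TRANSFORM query was written without an AS clause).
case class HiveScriptIOSchema(
    inputRowFormat: Seq[(String, String)],
    outputRowFormat: Seq[(String, String)],
    inputSerdeClass: String,
    outputSerdeClass: String,
    inputSerdeProps: Seq[(String, String)],
    outputSerdeProps: Seq[(String, String)],
    schemaLess: Boolean) extends ScriptInputOutputSchema
```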





[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-71811339
  
  [Test build #26222 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26222/consoleFull)
 for   PR 4014 at commit 
[`ccb71e3`](https://github.com/apache/spark/commit/ccb71e30b8a65fcea5d0d57865bdf5928ef9a534).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-71819468
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26222/
Test PASSed.





[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-71819461
  
  [Test build #26222 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26222/consoleFull)
 for   PR 4014 at commit 
[`ccb71e3`](https://github.com/apache/spark/commit/ccb71e30b8a65fcea5d0d57865bdf5928ef9a534).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class HiveScriptIOSchema (`
  * `  val trimed_class = serdeClassName.split("'")(1)`
  * `case class ShimWritable(writable: Writable)`
  * `case class ShimWritable(writable: Writable)`






[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-71878029
  
  [Test build #26232 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26232/consoleFull)
 for   PR 4014 at commit 
[`a422562`](https://github.com/apache/spark/commit/a422562088c88a1bb3d3005b7424ac1bddb3e801).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-71947781
  
  [Test build #26268 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26268/consoleFull)
 for   PR 4014 at commit 
[`575f695`](https://github.com/apache/spark/commit/575f69545796900489103c671839aa86c8bd4bc0).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class HiveScriptIOSchema (`
  * `  val trimed_class = serdeClassName.split("'")(1)`
  * `case class ShimWritable(writable: Writable)`
  * `case class ShimWritable(writable: Writable)`






[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-71947786
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26268/
Test FAILed.





[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-71943344
  
  [Test build #26268 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26268/consoleFull)
 for   PR 4014 at commit 
[`575f695`](https://github.com/apache/spark/commit/575f69545796900489103c671839aa86c8bd4bc0).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-28 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-71942858
  
Uploaded the Hive golden answer files.





[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-71949295
  
  [Test build #26273 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26273/consoleFull)
 for   PR 4014 at commit 
[`aa10fbd`](https://github.com/apache/spark/commit/aa10fbd0462e46782688b69664920dc11f4b1990).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-28 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-71956530
  
Can you explain in the PR what the schema-less and custom delimiter support is? 





[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-28 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-71957621
  
[Schema-less Map-reduce Scripts](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Transform#LanguageManualTransform-Schema-lessMap-reduceScripts) is a feature of Hive's transform syntax: there is no `AS` clause after 
`USING my_script`, and Hive assumes that the script output contains two columns, 
`key` and `value`. An example query looks like:

`SELECT TRANSFORM (key, value) USING 'cat' FROM src`

A custom delimiter is defined by the `ROW FORMAT` clause, such as:

`SELECT TRANSFORM (key) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\002' USING 'cat' AS (tKey) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\002' FROM src`

So you can use field delimiters other than the default `\t`.
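
A small sketch of how the script's output lines are interpreted in the two modes, consistent with the `split` calls in the diffs in this thread (delimiters and sample data are illustrative):

```scala
object TransformOutputSketch {
  // With an AS clause, every delimited field maps to one declared output column.
  def withSchema(line: String, fieldDelim: String): Array[String] =
    line.split(fieldDelim)

  // Schema-less: Hive assumes exactly two columns, key and value, so only the first
  // delimiter splits the line and the rest stays inside the value column (limit = 2).
  def schemaLess(line: String, fieldDelim: String): Array[String] =
    line.split(fieldDelim, 2)

  def main(args: Array[String]): Unit = {
    println(withSchema("k1\tv1\tv2", "\t").length)  // 3 columns
    println(schemaLess("k1\tv1\tv2", "\t").length)  // 2 columns: "k1" and "v1\tv2"
  }
}
```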






[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-71956179
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26273/
Test PASSed.





[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-71956176
  
  [Test build #26273 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26273/consoleFull)
 for   PR 4014 at commit 
[`aa10fbd`](https://github.com/apache/spark/commit/aa10fbd0462e46782688b69664920dc11f4b1990).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class HiveScriptIOSchema (`
  * `  val trimed_class = serdeClassName.split("'")(1)`
  * `case class ShimWritable(writable: Writable)`
  * `case class ShimWritable(writable: Writable)`






[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-28 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-71956296
  
@rxin Would you like to take a look at this too and see if it is ready to 
merge? Thanks.





[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-71885779
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26232/
Test FAILed.





[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-71885771
  
  [Test build #26232 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26232/consoleFull)
 for   PR 4014 at commit 
[`a422562`](https://github.com/apache/spark/commit/a422562088c88a1bb3d3005b7424ac1bddb3e801).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class HiveScriptIOSchema (`
  * `  val trimed_class = serdeClassName.split("'")(1)`
  * `case class ShimWritable(writable: Writable)`
  * `case class ShimWritable(writable: Writable)`






[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-26 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/4014#discussion_r23516387
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala ---
@@ -53,28 +78,176 @@ case class ScriptTransformation(
       val inputStream = proc.getInputStream
       val outputStream = proc.getOutputStream
       val reader = new BufferedReader(new InputStreamReader(inputStream))
+
+      val outputSerde: AbstractSerDe = if (ioschema.outputSerdeClass != "") {
--- End diff --

I will refactor the code for the above comments in later commits.





[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-26 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/4014#discussion_r23516234
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala
 ---
@@ -53,28 +78,176 @@ case class ScriptTransformation(
   val inputStream = proc.getInputStream
   val outputStream = proc.getOutputStream
   val reader = new BufferedReader(new InputStreamReader(inputStream))
+ 
+  val outputSerde: AbstractSerDe = if (ioschema.outputSerdeClass != "") {
+val trimed_class = ioschema.outputSerdeClass.split("'")(1)
+Utils.classForName(trimed_class)
+  .newInstance.asInstanceOf[AbstractSerDe]
+  } else {
+null
+  }
+ 
+  if (outputSerde != null) {
+val columns = output.map { case aref: AttributeReference => aref.name }
+  .mkString(",")
+val columnTypes = output.map { case aref: AttributeReference =>
+  aref.dataType.toTypeInfo.getTypeName()
+}.mkString(",")
+
+var propsMap = ioschema.outputSerdeProps.map(kv => {
+  (kv._1.split("'")(1), kv._2.split("'")(1))
+}).toMap + (serdeConstants.LIST_COLUMNS -> columns)
+propsMap = propsMap + (serdeConstants.LIST_COLUMN_TYPES -> columnTypes)
+
+val properties = new Properties()
+properties.putAll(propsMap)
+ 
+outputSerde.initialize(null, properties)
+  }
+
+  val outputSoi = if (outputSerde != null) {
+
outputSerde.getObjectInspector().asInstanceOf[StructObjectInspector]
+  } else {
+null
+  }
+
+  val iterator: Iterator[Row] = new Iterator[Row] with HiveInspectors {
+var cacheRow: Row = null
+var curLine: String = null
+var eof: Boolean = false
+
+override def hasNext: Boolean = {
+  if (outputSerde == null) {
+if (curLine == null) {
+  curLine = reader.readLine()
+  curLine != null
+} else {
+  true
+}
+  } else {
+!eof
+  }
+}
 
-  // TODO: This should be exposed as an iterator instead of reading in 
all the data at once.
-  val outputLines = collection.mutable.ArrayBuffer[Row]()
-  val readerThread = new Thread("Transform OutputReader") {
-override def run() {
-  var curLine = reader.readLine()
-  while (curLine != null) {
-// TODO: Use SerDe
-outputLines += new GenericRow(curLine.split("\t").asInstanceOf[Array[Any]])
+def deserialize(): Row = {
+  if (cacheRow != null) return cacheRow
+
+  val mutableRow = new SpecificMutableRow(output.map(_.dataType))
+  try {
+val dataInputStream = new DataInputStream(inputStream)
+val writable = outputSerde.getSerializedClass().newInstance
+writable.readFields(dataInputStream)
+
+val raw = outputSerde.deserialize(writable)
+val dataList = outputSoi.getStructFieldsDataAsList(raw)
+val fieldList = outputSoi.getAllStructFieldRefs()
+
+var i = 0
+dataList.foreach( element => {
+  if (element == null) {
+mutableRow.setNullAt(i)
+  } else {
+mutableRow(i) = unwrap(element, 
fieldList(i).getFieldObjectInspector)
+  }
+  i += 1
+})
+return mutableRow
+  } catch {
+case e: EOFException =>
+  eof = true
+  return null
+  }
+}
+
+override def next(): Row = {
+  if (!hasNext) {
+throw new NoSuchElementException
+  }
+ 
+  if (outputSerde == null) {
+val prevLine = curLine
 curLine = reader.readLine()
+ 
+if (!ioschema.schemaLess) {
+  new GenericRow(
+prevLine.split(outputRowFormatMap("TOK_TABLEROWFORMATFIELD"))
+  .asInstanceOf[Array[Any]])
+} else {
+  new GenericRow(
+prevLine.split(outputRowFormatMap("TOK_TABLEROWFORMATFIELD"), 2)
+  .asInstanceOf[Array[Any]])
+}
+  } else {
+val ret = deserialize()
+if (!eof) {
+  cacheRow = null
+  cacheRow = deserialize()
+}
+ret
   }
 }
   }
-  readerThread.start()
+
+  val 

[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-25 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/4014#discussion_r23510266
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala
 ---
@@ -53,28 +78,176 @@ case class ScriptTransformation(
   val inputStream = proc.getInputStream
   val outputStream = proc.getOutputStream
   val reader = new BufferedReader(new InputStreamReader(inputStream))
+ 
+  val outputSerde: AbstractSerDe = if (ioschema.outputSerdeClass != "") {
--- End diff --

Could you move the input / output SerDe setup into separate function(s)? I am 
not sure how much code can be shared, but there seems to be some duplication.
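
A minimal sketch of the shared helper suggested here (the name `initSerDe` and 
the pre-computed `columns`/`columnTypes` strings are assumptions, not part of 
the patch; it only factors out the instantiation and property setup that is 
currently duplicated for the input and output sides):

```scala
import java.util.Properties

import org.apache.hadoop.hive.serde.serdeConstants
import org.apache.hadoop.hive.serde2.AbstractSerDe

import org.apache.spark.util.Utils

object SerDeHelper {
  /** Instantiates and initializes a SerDe from the still-quoted class name the parser produces. */
  def initSerDe(
      serdeClassName: String,        // e.g. "'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'"
      columns: String,               // comma-separated column names
      columnTypes: String,           // comma-separated Hive type names
      serdeProps: Seq[(String, String)]): AbstractSerDe = {
    // The parser keeps the surrounding single quotes, so strip them first.
    val trimmedClass = serdeClassName.split("'")(1)
    val serde = Utils.classForName(trimmedClass).newInstance.asInstanceOf[AbstractSerDe]

    val props = new Properties()
    (serdeProps.map { case (k, v) => (k.split("'")(1), v.split("'")(1)) } ++
      Seq(serdeConstants.LIST_COLUMNS -> columns,
          serdeConstants.LIST_COLUMN_TYPES -> columnTypes)
    ).foreach { case (k, v) => props.setProperty(k, v) }

    serde.initialize(null, props)
    serde
  }
}
```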


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-25 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/4014#discussion_r23510387
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala
 ---
@@ -327,7 +327,127 @@ class HiveQuerySuite extends HiveComparisonTest with 
BeforeAndAfter {
 
   createQueryTest("transform",
 "SELECT TRANSFORM (key) USING 'cat' AS (tKey) FROM src")
+ 
+  test("schema-less transform") {
--- End diff --

Since we are targeting compatibility with Hive, let's write the test cases via 
`createQueryTest`, which is probably clearer.
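
For example, a sketch of the `createQueryTest` form (test names are 
illustrative; `createQueryTest` compares the result against Hive's golden 
answer files, so no manual assertions are needed):

```scala
createQueryTest("schema-less transform",
  "SELECT TRANSFORM (key) USING 'cat' FROM src")

createQueryTest("transform with custom field delimiter",
  """SELECT TRANSFORM (key) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\002'
    |USING 'cat' AS (tKey) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\002'
    |FROM src""".stripMargin)
```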


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-25 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/4014#discussion_r23509873
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/ScriptTransformation.scala
 ---
@@ -25,9 +25,24 @@ import 
org.apache.spark.sql.catalyst.expressions.{Attribute, Expression}
  * @param input the set of expression that should be passed to the script.
  * @param script the command that should be executed.
  * @param output the attributes that are produced by the script.
+ * @param ioschema the input and output schema applied in the execution of 
the script.
  */
 case class ScriptTransformation(
 input: Seq[Expression],
 script: String,
 output: Seq[Attribute],
-child: LogicalPlan) extends UnaryNode
+child: LogicalPlan,
+ioschema: ScriptInputOutputSchema) extends UnaryNode
+
+/**
+ * The wrapper class of input and output schema properties for 
transforming with script.
+ *
+ */
+case class ScriptInputOutputSchema(
--- End diff --

Probably move `ScriptInputOutputSchema` into the `hive` package? `SerDe` and 
`RowFormat` are Hive concepts. And `ScriptTransformation` could probably be 
defined like:

```
case class ScriptTransformation[T](
 input: Seq[Expression],
 script: String,
 output: Seq[Attribute],
 child: LogicalPlan,
 ioschema: Option[T]) extends UnaryNode
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-25 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/4014#discussion_r23510200
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -627,29 +627,71 @@ 
https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C
   case Token("TOK_SELEXPR",
  Token("TOK_TRANSFORM",
Token("TOK_EXPLIST", inputExprs) ::
-   Token("TOK_SERDE", Nil) ::
+   Token("TOK_SERDE", inputSerdeClause) ::
Token("TOK_RECORDWRITER", writerClause) ::
// TODO: Need to support other types of (in/out)put
Token(script, Nil) ::
-   Token("TOK_SERDE", serdeClause) ::
+   Token("TOK_SERDE", outputSerdeClause) ::
Token("TOK_RECORDREADER", readerClause) ::
-   outputClause :: Nil) :: Nil) =>
-
-val output = outputClause match {
-  case Token("TOK_ALIASLIST", aliases) =>
-aliases.map { case Token(name, Nil) => AttributeReference(name, StringType)() }
-  case Token("TOK_TABCOLLIST", attributes) =>
-attributes.map { case Token("TOK_TABCOL", Token(name, Nil) :: dataType :: Nil) =>
-  AttributeReference(name, nodeToDataType(dataType))() }
+   outputClause) :: Nil) =>
+
+val (output, schemaLess) = outputClause match {
+  case Token("TOK_ALIASLIST", aliases) :: Nil =>
+(aliases.map { case Token(name, Nil) => AttributeReference(name, StringType)() },
+  false)
+  case Token("TOK_TABCOLLIST", attributes) :: Nil =>
+(attributes.map { case Token("TOK_TABCOL", Token(name, Nil) :: dataType :: Nil) =>
+  AttributeReference(name, nodeToDataType(dataType))() },
+false)
+  case Nil =>
+(List(AttributeReference("key", StringType)(),
+  AttributeReference("value", StringType)()), true)
 }
+
+val (inputRowFormat, inputSerdeClass, inputSerdeProps) = 
inputSerdeClause match {
--- End diff --

I still have some concerns about the SerDe / RowFormat parsing. Instead of 
rewriting it, reusing the existing code is probably preferable. Do you think 
the following links would be helpful?

https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L1711

https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L10785



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-25 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/4014#discussion_r23510355
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala
 ---
@@ -53,28 +78,176 @@ case class ScriptTransformation(
   val inputStream = proc.getInputStream
   val outputStream = proc.getOutputStream
   val reader = new BufferedReader(new InputStreamReader(inputStream))
+ 
+  val outputSerde: AbstractSerDe = if (ioschema.outputSerdeClass != "") {
+val trimed_class = ioschema.outputSerdeClass.split("'")(1)
+Utils.classForName(trimed_class)
+  .newInstance.asInstanceOf[AbstractSerDe]
+  } else {
+null
+  }
+ 
+  if (outputSerde != null) {
+val columns = output.map { case aref: AttributeReference = 
aref.name }
+  .mkString(,)
+val columnTypes = output.map { case aref: AttributeReference =
+  aref.dataType.toTypeInfo.getTypeName()
+}.mkString(,)
+
+var propsMap = ioschema.outputSerdeProps.map(kv = {
+  (kv._1.split(')(1), kv._2.split(')(1))
+}).toMap + (serdeConstants.LIST_COLUMNS - columns)
+propsMap = propsMap + (serdeConstants.LIST_COLUMN_TYPES - 
columnTypes)
+
+val properties = new Properties()
+properties.putAll(propsMap)
+ 
+outputSerde.initialize(null, properties)
+  }
+
+  val outputSoi = if (outputSerde != null) {
+
outputSerde.getObjectInspector().asInstanceOf[StructObjectInspector]
+  } else {
+null
+  }
+
+  val iterator: Iterator[Row] = new Iterator[Row] with HiveInspectors {
+var cacheRow: Row = null
+var curLine: String = null
+var eof: Boolean = false
+
+override def hasNext: Boolean = {
+  if (outputSerde == null) {
+if (curLine == null) {
+  curLine = reader.readLine()
+  curLine != null
+} else {
+  true
+}
+  } else {
+!eof
+  }
+}
 
-  // TODO: This should be exposed as an iterator instead of reading in 
all the data at once.
-  val outputLines = collection.mutable.ArrayBuffer[Row]()
-  val readerThread = new Thread("Transform OutputReader") {
-override def run() {
-  var curLine = reader.readLine()
-  while (curLine != null) {
-// TODO: Use SerDe
-outputLines += new 
GenericRow(curLine.split(\t).asInstanceOf[Array[Any]])
+def deserialize(): Row = {
+  if (cacheRow != null) return cacheRow
+
+  val mutableRow = new SpecificMutableRow(output.map(_.dataType))
+  try {
+val dataInputStream = new DataInputStream(inputStream)
+val writable = outputSerde.getSerializedClass().newInstance
+writable.readFields(dataInputStream)
+
+val raw = outputSerde.deserialize(writable)
+val dataList = outputSoi.getStructFieldsDataAsList(raw)
+val fieldList = outputSoi.getAllStructFieldRefs()
+
+var i = 0
+dataList.foreach( element = {
+  if (element == null) {
+mutableRow.setNullAt(i)
+  } else {
+mutableRow(i) = unwrap(element, 
fieldList(i).getFieldObjectInspector)
+  }
+  i += 1
+})
+return mutableRow
+  } catch {
+case e: EOFException =
+  eof = true
+  return null
+  }
+}
+
+override def next(): Row = {
+  if (!hasNext) {
+throw new NoSuchElementException
+  }
+ 
+  if (outputSerde == null) {
+val prevLine = curLine
 curLine = reader.readLine()
+ 
+if (!ioschema.schemaLess) {
+  new GenericRow(
+
prevLine.split(outputRowFormatMap(TOK_TABLEROWFORMATFIELD))
+.asInstanceOf[Array[Any]])
+} else {
+  new GenericRow(
+
prevLine.split(outputRowFormatMap(TOK_TABLEROWFORMATFIELD), 2)
+.asInstanceOf[Array[Any]])
+}
+  } else {
+val ret = deserialize()
+if (!eof) {
+  cacheRow = null
+  cacheRow = deserialize()
+}
+ret
   }
 }
   }
-  readerThread.start()
+
+  val 

[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-25 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-71406585
  
@viirya the `SerDe` and `RowFormat` handling is quite a headache; we'd probably 
better reuse the Hive code as much as possible, and also keep it independent and 
generic. I will also investigate how we can reuse the Hive code.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-71186284
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26021/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-71186278
  
  [Test build #26021 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26021/consoleFull)
 for   PR 4014 at commit 
[`6000889`](https://github.com/apache/spark/commit/6000889aab6e8efc67cdad227500e0b198e28928).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class ScriptInputOutputSchema(`
  * `val trimed_class = ioschema.outputSerdeClass.split("'")(1) `
  * `val trimed_class = ioschema.inputSerdeClass.split("'")(1)`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-23 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/4014#discussion_r23441532
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala ---
@@ -165,8 +165,20 @@ private[hive] trait HiveStrategies {
 
   object Scripts extends Strategy {
 def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
-  case logical.ScriptTransformation(input, script, output, child) =>
-ScriptTransformation(input, script, output, planLater(child))(hiveContext) :: Nil
+  case logical.ScriptTransformation(input, script, output, child,
--- End diff --

Good point. I didn't notice that. New commit will fix it. Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-71179395
  
  [Test build #26021 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26021/consoleFull)
 for   PR 4014 at commit 
[`6000889`](https://github.com/apache/spark/commit/6000889aab6e8efc67cdad227500e0b198e28928).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-23 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/4014#discussion_r23441619
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala
 ---
@@ -327,7 +327,49 @@ class HiveQuerySuite extends HiveComparisonTest with 
BeforeAndAfter {
 
   createQueryTest("transform",
 "SELECT TRANSFORM (key) USING 'cat' AS (tKey) FROM src")
+ 
+  test("schema-less transform") {
+val expected = sql("SELECT TRANSFORM (key) USING 'cat' AS (tKey) FROM src").collect().head
+val res = sql("SELECT TRANSFORM (key) USING 'cat' FROM src").collect().head
+
+assert(expected(0) === res(0))
+
+val expected2 = sql("SELECT TRANSFORM (*) USING 'cat' AS (tKey, tValue) FROM src").collect().head
+val res2 = sql("SELECT TRANSFORM (*) USING 'cat' FROM src").collect().head
+
+assert(expected2(0) === res2(0) && expected2(1) === res2(1))
+  }
+
+  test("transform with custom field delimiter") {
+val expected = sql("SELECT TRANSFORM (key) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\002' USING 'cat' AS (tKey) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\002' FROM src").collect().head
--- End diff --

New commit will fix that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-23 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-71179511
  
@chenghao-intel thanks for the review. New commits have been added to address that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-22 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/4014#discussion_r23428628
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -627,21 +627,56 @@ 
https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C
   case Token(TOK_SELEXPR,
  Token(TOK_TRANSFORM,
Token(TOK_EXPLIST, inputExprs) ::
-   Token(TOK_SERDE, Nil) ::
+   Token(TOK_SERDE, inputSerdeClause) ::
Token(TOK_RECORDWRITER, writerClause) ::
// TODO: Need to support other types of (in/out)put
Token(script, Nil) ::
-   Token(TOK_SERDE, serdeClause) ::
+   Token(TOK_SERDE, outputSerdeClause) ::
Token(TOK_RECORDREADER, readerClause) ::
-   outputClause :: Nil) :: Nil) =
+   outputClause) :: Nil) =
 
 val output = outputClause match {
-  case Token(TOK_ALIASLIST, aliases) =
+  case Token(TOK_ALIASLIST, aliases) :: Nil =
 aliases.map { case Token(name, Nil) = 
AttributeReference(name, StringType)() }
-  case Token(TOK_TABCOLLIST, attributes) =
+  case Token(TOK_TABCOLLIST, attributes) :: Nil =
 attributes.map { case Token(TOK_TABCOL, Token(name, Nil) 
:: dataType :: Nil) =
   AttributeReference(name, nodeToDataType(dataType))() }
+  case Nil =
+Nil
 }
+
+val (inputFormat, inputSerdeClass, inputSerdeProps) = 
inputSerdeClause match {
+  case Token("TOK_SERDEPROPS", props) :: Nil =>
+(props.map { case Token(name, Token(value, Nil) :: Nil) => (name, value) },
+  "", Nil)
+  case Token("TOK_SERDENAME", Token(serde, Nil) :: Nil) :: Nil => (Nil, serde, Nil)
--- End diff --

According to the manual, they won't appear in the same query.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-22 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/4014#discussion_r23428921
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala
 ---
@@ -53,28 +82,176 @@ case class ScriptTransformation(
   val inputStream = proc.getInputStream
   val outputStream = proc.getOutputStream
   val reader = new BufferedReader(new InputStreamReader(inputStream))
+ 
+  val outputSerde: AbstractSerDe = if (outputSerdeClass != "") {
+val trimed_class = outputSerdeClass.split("'")(1)
+Utils.classForName(trimed_class)
+  .newInstance.asInstanceOf[AbstractSerDe]
+  } else {
+null
+  }
+ 
+  if (outputSerde != null) {
+val columns = output.map { case aref: AttributeReference = 
aref.name }
+  .mkString(,)
+val columnTypes = output.map { case aref: AttributeReference =
+  aref.dataType.toTypeInfo.getTypeName()
+}.mkString(,)
+
+var propsMap = outputSerdeProps.map(kv = {
+  (kv._1.split(')(1), kv._2.split(')(1))
+}).toMap + (serdeConstants.LIST_COLUMNS - columns)
+propsMap = propsMap + (serdeConstants.LIST_COLUMN_TYPES - 
columnTypes)
+
+val properties = new Properties()
+properties.putAll(propsMap)
+ 
+outputSerde.initialize(null, properties)
+  }
+
+  val outputSoi = if (outputSerde != null) {
+
outputSerde.getObjectInspector().asInstanceOf[StructObjectInspector]
+  } else {
+null
+  }
+
+  val iterator: Iterator[Row] = new Iterator[Row] with HiveInspectors {
+var cacheRow: Row = null
+var curLine: String = null
+var eof: Boolean = false
+
+override def hasNext: Boolean = {
+  if (outputSerde == null) {
+if (curLine == null) {
+  curLine = reader.readLine()
+  curLine != null
+} else {
+  true
+}
+  } else {
+!eof
+  }
+}
 
-  // TODO: This should be exposed as an iterator instead of reading in 
all the data at once.
-  val outputLines = collection.mutable.ArrayBuffer[Row]()
-  val readerThread = new Thread("Transform OutputReader") {
--- End diff --

Please let me know if I am wrong, but since I use an iterator here, shouldn't 
this already be streaming-style output, and do what the `TODO` asks for?
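
For reference, a minimal, self-contained sketch of the streaming pattern 
described here (plain JDK I/O, no SerDe path; the helper name `lineIterator` is 
illustrative): rows are pulled from the child process one line at a time 
instead of being buffered in an `ArrayBuffer`.

```scala
import java.io.{BufferedReader, InputStreamReader}

object StreamingTransformOutput {
  /** Wraps the process output in an Iterator so rows are read lazily, one line at a time. */
  def lineIterator(proc: Process): Iterator[Array[String]] = {
    val reader = new BufferedReader(new InputStreamReader(proc.getInputStream))
    new Iterator[Array[String]] {
      private var nextLine: String = reader.readLine()
      override def hasNext: Boolean = nextLine != null
      override def next(): Array[String] = {
        if (!hasNext) throw new NoSuchElementException
        val current = nextLine
        nextLine = reader.readLine()   // advance; null signals end of stream
        current.split("\t")
      }
    }
  }
}
```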


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-22 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/4014#discussion_r23429683
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala
 ---
@@ -53,28 +82,176 @@ case class ScriptTransformation(
   val inputStream = proc.getInputStream
   val outputStream = proc.getOutputStream
   val reader = new BufferedReader(new InputStreamReader(inputStream))
+ 
+  val outputSerde: AbstractSerDe = if (outputSerdeClass != "") {
+val trimed_class = outputSerdeClass.split("'")(1)
+Utils.classForName(trimed_class)
+  .newInstance.asInstanceOf[AbstractSerDe]
+  } else {
+null
+  }
+ 
+  if (outputSerde != null) {
+val columns = output.map { case aref: AttributeReference = 
aref.name }
+  .mkString(,)
+val columnTypes = output.map { case aref: AttributeReference =
+  aref.dataType.toTypeInfo.getTypeName()
+}.mkString(,)
+
+var propsMap = outputSerdeProps.map(kv = {
+  (kv._1.split(')(1), kv._2.split(')(1))
+}).toMap + (serdeConstants.LIST_COLUMNS - columns)
+propsMap = propsMap + (serdeConstants.LIST_COLUMN_TYPES - 
columnTypes)
+
+val properties = new Properties()
+properties.putAll(propsMap)
+ 
+outputSerde.initialize(null, properties)
+  }
+
+  val outputSoi = if (outputSerde != null) {
+
outputSerde.getObjectInspector().asInstanceOf[StructObjectInspector]
+  } else {
+null
+  }
+
+  val iterator: Iterator[Row] = new Iterator[Row] with HiveInspectors {
+var cacheRow: Row = null
+var curLine: String = null
+var eof: Boolean = false
+
+override def hasNext: Boolean = {
+  if (outputSerde == null) {
+if (curLine == null) {
+  curLine = reader.readLine()
+  curLine != null
+} else {
+  true
+}
+  } else {
+!eof
+  }
+}
 
-  // TODO: This should be exposed as an iterator instead of reading in 
all the data at once.
-  val outputLines = collection.mutable.ArrayBuffer[Row]()
-  val readerThread = new Thread("Transform OutputReader") {
--- End diff --

Oh, got it, that makes sense.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-22 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/4014#discussion_r23424723
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -627,21 +627,56 @@ 
https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C
   case Token(TOK_SELEXPR,
  Token(TOK_TRANSFORM,
Token(TOK_EXPLIST, inputExprs) ::
-   Token(TOK_SERDE, Nil) ::
+   Token(TOK_SERDE, inputSerdeClause) ::
Token(TOK_RECORDWRITER, writerClause) ::
// TODO: Need to support other types of (in/out)put
Token(script, Nil) ::
-   Token(TOK_SERDE, serdeClause) ::
+   Token(TOK_SERDE, outputSerdeClause) ::
Token(TOK_RECORDREADER, readerClause) ::
-   outputClause :: Nil) :: Nil) =
+   outputClause) :: Nil) =
 
 val output = outputClause match {
-  case Token(TOK_ALIASLIST, aliases) =
+  case Token(TOK_ALIASLIST, aliases) :: Nil =
 aliases.map { case Token(name, Nil) = 
AttributeReference(name, StringType)() }
-  case Token(TOK_TABCOLLIST, attributes) =
+  case Token(TOK_TABCOLLIST, attributes) :: Nil =
 attributes.map { case Token(TOK_TABCOL, Token(name, Nil) 
:: dataType :: Nil) =
   AttributeReference(name, nodeToDataType(dataType))() }
+  case Nil =
+Nil
 }
+
+val (inputFormat, inputSerdeClass, inputSerdeProps) = 
inputSerdeClause match {
--- End diff --

Is that more like (`SerDe Properties`, `InputSerDeClass`, `Table 
Properties`)?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-22 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/4014#discussion_r23424933
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -627,21 +627,56 @@ 
https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C
   case Token(TOK_SELEXPR,
  Token(TOK_TRANSFORM,
Token(TOK_EXPLIST, inputExprs) ::
-   Token(TOK_SERDE, Nil) ::
+   Token(TOK_SERDE, inputSerdeClause) ::
Token(TOK_RECORDWRITER, writerClause) ::
// TODO: Need to support other types of (in/out)put
Token(script, Nil) ::
-   Token(TOK_SERDE, serdeClause) ::
+   Token(TOK_SERDE, outputSerdeClause) ::
Token(TOK_RECORDREADER, readerClause) ::
-   outputClause :: Nil) :: Nil) =
+   outputClause) :: Nil) =
 
 val output = outputClause match {
-  case Token(TOK_ALIASLIST, aliases) =
+  case Token(TOK_ALIASLIST, aliases) :: Nil =
 aliases.map { case Token(name, Nil) = 
AttributeReference(name, StringType)() }
-  case Token(TOK_TABCOLLIST, attributes) =
+  case Token(TOK_TABCOLLIST, attributes) :: Nil =
 attributes.map { case Token(TOK_TABCOL, Token(name, Nil) 
:: dataType :: Nil) =
   AttributeReference(name, nodeToDataType(dataType))() }
+  case Nil =
+Nil
 }
+
+val (inputFormat, inputSerdeClass, inputSerdeProps) = 
inputSerdeClause match {
--- End diff --

Oh, sorry, please ignore my previous comment. But would renaming `inputFormat` 
to `inputRowFormat` be a better idea?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-22 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/4014#discussion_r23425082
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala
 ---
@@ -327,7 +327,49 @@ class HiveQuerySuite extends HiveComparisonTest with 
BeforeAndAfter {
 
   createQueryTest(transform,
 SELECT TRANSFORM (key) USING 'cat' AS (tKey) FROM src)
+ 
+  test(schema-less transform) {
+val expected = sql(SELECT TRANSFORM (key) USING 'cat' AS (tKey) FROM 
src).collect().head
+val res = sql(SELECT TRANSFORM (key) USING 'cat' FROM 
src).collect().head
+
+assert(expected(0) === res(0))
+
+val expected2 = sql(SELECT TRANSFORM (*) USING 'cat' AS (tKey, 
tValue) FROM src).collect().head
+val res2 = sql(SELECT TRANSFORM (*) USING 'cat' FROM 
src).collect().head
+
+assert(expected2(0) === res2(0)  expected2(1) === res2(1))
+  }
+
+  test("transform with custom field delimiter") {
+val expected = sql("SELECT TRANSFORM (key) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\002' USING 'cat' AS (tKey) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\002' FROM src").collect().head
--- End diff --

Multiple rows instead of one?
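
For example, a hedged sketch of an order-insensitive, whole-result comparison 
(it assumes the suite's `sql` helper and the same queries as above; 
`getString(0)` and the sorting are only there to make the check independent of 
row order):

```scala
val expected = sql("SELECT TRANSFORM (key) USING 'cat' AS (tKey) FROM src")
  .collect().map(_.getString(0)).sorted.toSeq
val res = sql(
  """SELECT TRANSFORM (key) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\002'
    |USING 'cat' AS (tKey) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\002'
    |FROM src""".stripMargin)
  .collect().map(_.getString(0)).sorted.toSeq

// Compare every row, not just the head.
assert(expected === res)
```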


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-22 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/4014#discussion_r23425650
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -627,21 +627,56 @@ 
https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C
   case Token(TOK_SELEXPR,
  Token(TOK_TRANSFORM,
Token(TOK_EXPLIST, inputExprs) ::
-   Token(TOK_SERDE, Nil) ::
+   Token(TOK_SERDE, inputSerdeClause) ::
Token(TOK_RECORDWRITER, writerClause) ::
// TODO: Need to support other types of (in/out)put
Token(script, Nil) ::
-   Token(TOK_SERDE, serdeClause) ::
+   Token(TOK_SERDE, outputSerdeClause) ::
Token(TOK_RECORDREADER, readerClause) ::
-   outputClause :: Nil) :: Nil) =
+   outputClause) :: Nil) =
 
 val output = outputClause match {
-  case Token(TOK_ALIASLIST, aliases) =
+  case Token(TOK_ALIASLIST, aliases) :: Nil =
 aliases.map { case Token(name, Nil) = 
AttributeReference(name, StringType)() }
-  case Token(TOK_TABCOLLIST, attributes) =
+  case Token(TOK_TABCOLLIST, attributes) :: Nil =
 attributes.map { case Token(TOK_TABCOL, Token(name, Nil) 
:: dataType :: Nil) =
   AttributeReference(name, nodeToDataType(dataType))() }
+  case Nil =
+Nil
 }
+
+val (inputFormat, inputSerdeClass, inputSerdeProps) = 
inputSerdeClause match {
+  case Token("TOK_SERDEPROPS", props) :: Nil =>
+(props.map { case Token(name, Token(value, Nil) :: Nil) => (name, value) },
+  "", Nil)
+  case Token("TOK_SERDENAME", Token(serde, Nil) :: Nil) :: Nil => (Nil, serde, Nil)
--- End diff --

Is it possible for `TOK_SERDEPROPS` and `TOK_SERDENAME` to appear in the same 
query? If so, it may cause a missed pattern match error.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-22 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-71133818
  
@viirya it's great to have this feature. I have some general comments; let's 
see how to improve it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-22 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/4014#discussion_r23424312
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala ---
@@ -165,8 +165,20 @@ private[hive] trait HiveStrategies {
 
   object Scripts extends Strategy {
 def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
-  case logical.ScriptTransformation(input, script, output, child) =>
-ScriptTransformation(input, script, output, planLater(child))(hiveContext) :: Nil
+  case logical.ScriptTransformation(input, script, output, child,
--- End diff --

I think a better place to extract the schema (the `output`) is in the 
`Analyzer`; `HiveContext` should be able to add its own rule for that, instead 
of doing it in a `Strategy`. Otherwise it will probably fail to resolve the 
attributes, e.g.: 
```
SELECT transform(key + 1, value) USING '/bin/cat' FROM src ORDER BY key, value
```

Sorry, I didn't test that; let me know if I am wrong.
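
A hedged sketch of the Analyzer-based alternative described here; the rule 
name, the `output.isEmpty` convention for a schema-less transform, and the 
import paths are assumptions (they follow the 1.3-era package layout), while 
the five-field logical `ScriptTransformation` follows the patch:

```scala
import org.apache.spark.sql.catalyst.expressions.AttributeReference
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, ScriptTransformation}
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.types.StringType

/** Fills in the default key/value schema for a schema-less TRANSFORM before planning. */
object ResolveSchemaLessTransform extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case st @ ScriptTransformation(_, _, output, _, _) if output.isEmpty =>
      st.copy(output = Seq(
        AttributeReference("key", StringType)(),
        AttributeReference("value", StringType)()))
  }
}
```

Such a rule could then be registered in `HiveContext`'s analyzer rather than 
being handled in the `Scripts` strategy.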



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-22 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/4014#discussion_r23425029
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -627,21 +627,56 @@ 
https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C
   case Token(TOK_SELEXPR,
  Token(TOK_TRANSFORM,
Token(TOK_EXPLIST, inputExprs) ::
-   Token(TOK_SERDE, Nil) ::
+   Token(TOK_SERDE, inputSerdeClause) ::
Token(TOK_RECORDWRITER, writerClause) ::
// TODO: Need to support other types of (in/out)put
Token(script, Nil) ::
-   Token(TOK_SERDE, serdeClause) ::
+   Token(TOK_SERDE, outputSerdeClause) ::
Token(TOK_RECORDREADER, readerClause) ::
-   outputClause :: Nil) :: Nil) =
+   outputClause) :: Nil) =
 
 val output = outputClause match {
-  case Token(TOK_ALIASLIST, aliases) =
+  case Token(TOK_ALIASLIST, aliases) :: Nil =
 aliases.map { case Token(name, Nil) = 
AttributeReference(name, StringType)() }
-  case Token(TOK_TABCOLLIST, attributes) =
+  case Token(TOK_TABCOLLIST, attributes) :: Nil =
 attributes.map { case Token(TOK_TABCOL, Token(name, Nil) 
:: dataType :: Nil) =
   AttributeReference(name, nodeToDataType(dataType))() }
+  case Nil =
+Nil
 }
+
+val (inputFormat, inputSerdeClass, inputSerdeProps) = 
inputSerdeClause match {
+  case Token("TOK_SERDEPROPS", props) :: Nil =>
+(props.map { case Token(name, Token(value, Nil) :: Nil) => (name, value) },
--- End diff --

I am a little confused here: why are the `props` bound to `inputFormat` rather 
than `inputSerdeProps`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-22 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/4014#discussion_r23425907
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -649,7 +684,9 @@ 
https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C
 inputExprs.map(nodeToExpr),
 unescapedScript,
 output,
-withWhere))
+withWhere, inputFormat, outputFormat,
--- End diff --

I would like to put `inputFormat`, `outputFormat`, `inputSerDe`, etc. into a 
single object named `ScriptInputOutputSchema`; `SQLContext` and `HiveContext` 
may then have different implementations of `ScriptInputOutputSchema`. That is 
how the `Analyzer` of `SQLContext` or `HiveContext` resolves them (e.g. schema 
extraction, SerDe class existence checking, etc.).
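
For reference, a sketch of what such a wrapper could look like (field names are 
illustrative and simply mirror the values that are currently passed around 
individually):

```scala
case class ScriptInputOutputSchema(
    inputRowFormat: Seq[(String, String)],
    outputRowFormat: Seq[(String, String)],
    inputSerdeClass: String,
    outputSerdeClass: String,
    inputSerdeProps: Seq[(String, String)],
    outputSerdeProps: Seq[(String, String)],
    schemaLess: Boolean)
```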


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-22 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/4014#discussion_r23426344
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala
 ---
@@ -53,28 +82,176 @@ case class ScriptTransformation(
   val inputStream = proc.getInputStream
   val outputStream = proc.getOutputStream
   val reader = new BufferedReader(new InputStreamReader(inputStream))
+ 
+  val outputSerde: AbstractSerDe = if (outputSerdeClass != "") {
+val trimed_class = outputSerdeClass.split("'")(1)
+Utils.classForName(trimed_class)
+  .newInstance.asInstanceOf[AbstractSerDe]
+  } else {
+null
+  }
+ 
+  if (outputSerde != null) {
+val columns = output.map { case aref: AttributeReference = 
aref.name }
+  .mkString(,)
+val columnTypes = output.map { case aref: AttributeReference =
+  aref.dataType.toTypeInfo.getTypeName()
+}.mkString(,)
+
+var propsMap = outputSerdeProps.map(kv = {
+  (kv._1.split(')(1), kv._2.split(')(1))
+}).toMap + (serdeConstants.LIST_COLUMNS - columns)
+propsMap = propsMap + (serdeConstants.LIST_COLUMN_TYPES - 
columnTypes)
+
+val properties = new Properties()
+properties.putAll(propsMap)
+ 
+outputSerde.initialize(null, properties)
+  }
+
+  val outputSoi = if (outputSerde != null) {
+
outputSerde.getObjectInspector().asInstanceOf[StructObjectInspector]
+  } else {
+null
+  }
+
+  val iterator: Iterator[Row] = new Iterator[Row] with HiveInspectors {
+var cacheRow: Row = null
+var curLine: String = null
+var eof: Boolean = false
+
+override def hasNext: Boolean = {
+  if (outputSerde == null) {
+if (curLine == null) {
+  curLine = reader.readLine()
+  curLine != null
+} else {
+  true
+}
+  } else {
+!eof
+  }
+}
 
-  // TODO: This should be exposed as an iterator instead of reading in 
all the data at once.
-  val outputLines = collection.mutable.ArrayBuffer[Row]()
-  val readerThread = new Thread("Transform OutputReader") {
--- End diff --

I can understand why the reader thread was removed, but it would be helpful in 
the future if we support `streaming style output`, which would save a lot of 
memory. Would you mind leaving it unchanged, or at least keeping the `TODO`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-70470259
  
  [Test build #25756 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25756/consoleFull)
 for   PR 4014 at commit 
[`7a14f31`](https://github.com/apache/spark/commit/7a14f31e73391b97c723bba949a282a3a3c60329).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `val trimed_class = outputSerdeClass.split("'")(1) `
  * `val trimed_class = inputSerdeClass.split("'")(1)`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-70469836
  
  [Test build #25756 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25756/consoleFull)
 for   PR 4014 at commit 
[`7a14f31`](https://github.com/apache/spark/commit/7a14f31e73391b97c723bba949a282a3a3c60329).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-70473932
  
  [Test build #25758 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25758/consoleFull)
 for   PR 4014 at commit 
[`9a6dc04`](https://github.com/apache/spark/commit/9a6dc043a94d6c1999810bf230056c64ee66f623).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-70481459
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25758/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-70481455
  
  [Test build #25758 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25758/consoleFull)
 for   PR 4014 at commit 
[`9a6dc04`](https://github.com/apache/spark/commit/9a6dc043a94d6c1999810bf230056c64ee66f623).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `val trimed_class = outputSerdeClass.split("'")(1) `
  * `val trimed_class = inputSerdeClass.split("'")(1)`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-70470261
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25756/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-70411540
  
  [Test build #25723 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25723/consoleFull)
 for   PR 4014 at commit 
[`32d3046`](https://github.com/apache/spark/commit/32d3046a228d4bc7f43ed4f20dbd3dee0be42b80).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-70411565
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25723/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-70411564
  
  [Test build #25723 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25723/consoleFull)
 for   PR 4014 at commit 
[`32d3046`](https://github.com/apache/spark/commit/32d3046a228d4bc7f43ed4f20dbd3dee0be42b80).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `val trimed_class = outputSerdeClass.split("'")(1) `
  * `val trimed_class = inputSerdeClass.split("'")(1)`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-70412261
  
  [Test build #25724 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25724/consoleFull)
 for   PR 4014 at commit 
[`be2c3fc`](https://github.com/apache/spark/commit/be2c3fc81aa990b315715dee3f5f387792cb4617).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-70421242
  
  [Test build #25729 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25729/consoleFull)
 for   PR 4014 at commit 
[`799b5e1`](https://github.com/apache/spark/commit/799b5e1a5d18a18b7af5e7db950c40f1a393357e).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-70414870
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25724/
Test PASSed.


[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-70414868
  
  [Test build #25724 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25724/consoleFull) for PR 4014 at commit [`be2c3fc`](https://github.com/apache/spark/commit/be2c3fc81aa990b315715dee3f5f387792cb4617).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `val trimed_class = outputSerdeClass.split("'")(1)`
  * `val trimed_class = inputSerdeClass.split("'")(1)`



[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-70424085
  
  [Test build #25729 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25729/consoleFull) for PR 4014 at commit [`799b5e1`](https://github.com/apache/spark/commit/799b5e1a5d18a18b7af5e7db950c40f1a393357e).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `val trimed_class = outputSerdeClass.split("'")(1)`
  * `val trimed_class = inputSerdeClass.split("'")(1)`



[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-70424087
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25729/
Test PASSed.


[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-70370192
  
  [Test build #25703 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25703/consoleFull) for PR 4014 at commit [`ab22f7b`](https://github.com/apache/spark/commit/ab22f7b55988ba324e14969c89d8edfe4d663504).
 * This patch merges cleanly.


[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-70372569
  
  [Test build #25703 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25703/consoleFull) for PR 4014 at commit [`ab22f7b`](https://github.com/apache/spark/commit/ab22f7b55988ba324e14969c89d8edfe4d663504).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-70372572
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25703/
Test PASSed.


[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-70292347
  
  [Test build #25669 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25669/consoleFull) for PR 4014 at commit [`5e0b864`](https://github.com/apache/spark/commit/5e0b864e4f055512df63f06580fc45996a0fa3ab).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `val trimed_class = outputSerdeClass.split("'")(1)`
  * `val trimed_class = inputSerdeClass.split("'")(1)`



[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-70292213
  
  [Test build #25669 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25669/consoleFull) for PR 4014 at commit [`5e0b864`](https://github.com/apache/spark/commit/5e0b864e4f055512df63f06580fc45996a0fa3ab).
 * This patch merges cleanly.


[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-70292349
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25669/
Test FAILed.


[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-70292959
  
  [Test build #25670 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25670/consoleFull) for PR 4014 at commit [`4d21956`](https://github.com/apache/spark/commit/4d21956e75ae2c285a31fd533413c7de2dd990db).
 * This patch merges cleanly.


[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-70300945
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25670/
Test FAILed.


[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-70300932
  
  [Test build #25670 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25670/consoleFull) for PR 4014 at commit [`4d21956`](https://github.com/apache/spark/commit/4d21956e75ae2c285a31fd533413c7de2dd990db).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `val trimed_class = outputSerdeClass.split("'")(1)`
  * `val trimed_class = inputSerdeClass.split("'")(1)`



[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-16 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-70304683
  
test again.


[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-16 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-70305074
  
Jenkins, ok to test.


[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-16 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-70305460
  
Failed due to a connection error. Please test again.


[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-70357294
  
  [Test build #25699 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25699/consoleFull) for PR 4014 at commit [`a711657`](https://github.com/apache/spark/commit/a71165771b60b9663319fbff9bf4d4ec049b40dd).
 * This patch merges cleanly.


[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-70358722
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25699/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...

2015-01-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4014#issuecomment-70358721
  
  [Test build #25699 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25699/consoleFull) for PR 4014 at commit [`a711657`](https://github.com/apache/spark/commit/a71165771b60b9663319fbff9bf4d4ec049b40dd).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `val trimed_class = outputSerdeClass.split("'")(1)`
  * `val trimed_class = inputSerdeClass.split("'")(1)`


