[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-95879508 @chenghao-intel ok. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-95857854 I just filed a JIRA issue, https://issues.apache.org/jira/browse/SPARK-7119. @viirya can you help investigate this?
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4014
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-72546189 Thanks! Merged to master.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-72464777 [Test build #26516 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26516/consoleFull) for PR 4014 at commit [`ac2d1fe`](https://github.com/apache/spark/commit/ac2d1fe1093f2ad442ff1bd475cec0a38c096e38). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class HiveScriptIOSchema (` * `val trimed_class = serdeClassName.split("'")(1)`
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-72464789 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26516/
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-72455639 [Test build #26516 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26516/consoleFull) for PR 4014 at commit [`ac2d1fe`](https://github.com/apache/spark/commit/ac2d1fe1093f2ad442ff1bd475cec0a38c096e38). * This patch merges cleanly.
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-72465789 @marmbrus I did some refactoring based on the comments. It should be better now.
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/4014#discussion_r23932224 --- Diff: sql/hive/v0.12.0/src/main/scala/org/apache/spark/sql/hive/Shim12.scala --- @@ -241,8 +241,14 @@ private[hive] object HiveShim { Decimal(hdoi.getPrimitiveJavaObject(data).bigDecimalValue()) } } + + implicit def prepareWritable(shimW: ShimWritable): Writable = { +shimW.writable + } } +case class ShimWritable(writable: Writable) --- End diff -- Yes, I think implicits in this case just make the code harder to read. On Feb 1, 2015 11:32 PM, Liang-Chi Hsieh notificati...@github.com wrote: In sql/hive/v0.12.0/src/main/scala/org/apache/spark/sql/hive/Shim12.scala https://github.com/apache/spark/pull/4014#discussion_r23910154: } +case class ShimWritable(writable: Writable) If we skip ShimWritable, we then need to remove implicit from prepareWritable and explicitly call it to do the fixing. Is it better? If so, I can do it in this way. It does not break Hive 12 because we just pass the underlying writable object without touching it. We only do the fixing on Hive 13. — Reply to this email directly or view it on GitHub https://github.com/apache/spark/pull/4014/files#r23910154.
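A minimal, hypothetical Scala sketch of the trade-off discussed here. The `Writable`/`ShimWritable` names mirror the PR, but this is not the actual Spark/Hive shim code: the point is only that an implicit conversion fires invisibly at the call site, while an explicit helper makes the preparation step visible.

```scala
object ImplicitStyle {
  case class Writable(value: String)
  case class ShimWritable(writable: Writable)

  // Implicit version: a ShimWritable can be used wherever a Writable is
  // expected, and this conversion fires without any trace at the call site.
  implicit def prepareWritable(shimW: ShimWritable): Writable = shimW.writable
}

object ExplicitStyle {
  case class Writable(value: String)

  // Explicit version: the "fixing" step is an ordinary function that the
  // caller invokes by name, so the control flow is visible where it happens.
  def prepareWritable(w: Writable): Writable = w
}
```

At a call site, `val w: Writable = ShimWritable(...)` compiles silently under the implicit style, which is exactly the readability cost the review objects to.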
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/4014#discussion_r23932391 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/ScriptTransformation.scala --- @@ -25,9 +25,18 @@ import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression} * @param input the set of expression that should be passed to the script. * @param script the command that should be executed. * @param output the attributes that are produced by the script. + * @param ioschema the input and output schema applied in the execution of the script. */ case class ScriptTransformation( input: Seq[Expression], script: String, output: Seq[Attribute], -child: LogicalPlan) extends UnaryNode +child: LogicalPlan, +ioschema: Option[ScriptInputOutputSchema]) extends UnaryNode --- End diff -- What other cases? There is no need to add complexity to the API unless it will be used. In the way you have designed it now, `None` will result in a confusing error later in execution. Just disallow it statically by not making this an `Option`. On Feb 1, 2015 11:38 PM, Liang-Chi Hsieh notificati...@github.com wrote: In sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/ScriptTransformation.scala https://github.com/apache/spark/pull/4014#discussion_r23910289: */ case class ScriptTransformation( input: Seq[Expression], script: String, output: Seq[Attribute], -child: LogicalPlan) extends UnaryNode +child: LogicalPlan, +ioschema: Option[ScriptInputOutputSchema]) extends UnaryNode In the Hive case, it is not. But I think it may be for other cases? — Reply to this email directly or view it on GitHub https://github.com/apache/spark/pull/4014/files#r23910289.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/4014#discussion_r23932681 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/ScriptTransformation.scala --- @@ -25,9 +25,18 @@ import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression} * @param input the set of expression that should be passed to the script. * @param script the command that should be executed. * @param output the attributes that are produced by the script. + * @param ioschema the input and output schema applied in the execution of the script. */ case class ScriptTransformation( input: Seq[Expression], script: String, output: Seq[Attribute], -child: LogicalPlan) extends UnaryNode +child: LogicalPlan, +ioschema: Option[ScriptInputOutputSchema]) extends UnaryNode --- End diff -- OK. Modified in latest commit.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/4014#discussion_r23932592 --- Diff: sql/hive/v0.12.0/src/main/scala/org/apache/spark/sql/hive/Shim12.scala --- @@ -241,8 +241,14 @@ private[hive] object HiveShim { Decimal(hdoi.getPrimitiveJavaObject(data).bigDecimalValue()) } } + + implicit def prepareWritable(shimW: ShimWritable): Writable = { +shimW.writable + } } +case class ShimWritable(writable: Writable) --- End diff -- OK. I already skipped `ShimWritable` in the latest commit.
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/4014#discussion_r23904350 --- Diff: sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim13.scala --- @@ -395,9 +397,26 @@ private[hive] object HiveShim { Decimal(hdoi.getPrimitiveJavaObject(data).bigDecimalValue(), hdoi.precision(), hdoi.scale()) } } + + implicit def prepareWritable(shimW: ShimWritable): Writable = { +shimW.writable match { + case w: AvroGenericRecordWritable => +w.setRecordReaderID(new UID()) + case _ => +} +shimW.writable + } } /* + * Bug introdiced in hive-0.13. AvroGenericRecordWritable has a member recordReaderID that --- End diff -- introduced
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/4014#discussion_r23904492 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala --- @@ -167,8 +167,10 @@ private[hive] trait HiveStrategies { object Scripts extends Strategy { def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match { - case logical.ScriptTransformation(input, script, output, child) => -ScriptTransformation(input, script, output, planLater(child))(hiveContext) :: Nil + case logical.ScriptTransformation(input, script, output, child, schema) => + ScriptTransformation(input, script, output, +planLater(child), schema.map{ case s: HiveScriptIOSchema => s }.get --- End diff -- Given that it is safe to just call `get` here, it seems like `Option` might not be the correct choice for the data model. I'd also do the casting by placing a type annotation in the match.
```scala
case logical.ScriptTransformation(input, script, output, child, schema: HiveScriptIOSchema) =>
```
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-72402698 Thanks for working on this! It would be great if this could be updated soon so we can include it in 1.3.
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/4014#discussion_r23904457 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala --- @@ -167,8 +167,10 @@ private[hive] trait HiveStrategies { object Scripts extends Strategy { def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match { - case logical.ScriptTransformation(input, script, output, child) => -ScriptTransformation(input, script, output, planLater(child))(hiveContext) :: Nil + case logical.ScriptTransformation(input, script, output, child, schema) => + ScriptTransformation(input, script, output, --- End diff -- Typically once you need to wrap, we wrap all arguments on their own lines.
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/4014#discussion_r23904448 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala --- @@ -627,29 +628,71 @@ https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C case Token(TOK_SELEXPR, Token(TOK_TRANSFORM, Token(TOK_EXPLIST, inputExprs) :: - Token(TOK_SERDE, Nil) :: + Token(TOK_SERDE, inputSerdeClause) :: Token(TOK_RECORDWRITER, writerClause) :: // TODO: Need to support other types of (in/out)put Token(script, Nil) :: - Token(TOK_SERDE, serdeClause) :: + Token(TOK_SERDE, outputSerdeClause) :: Token(TOK_RECORDREADER, readerClause) :: - outputClause :: Nil) :: Nil) => - val output = outputClause match { - case Token(TOK_ALIASLIST, aliases) => aliases.map { case Token(name, Nil) => AttributeReference(name, StringType)() } - case Token(TOK_TABCOLLIST, attributes) => attributes.map { case Token(TOK_TABCOL, Token(name, Nil) :: dataType :: Nil) => AttributeReference(name, nodeToDataType(dataType))() } + outputClause) :: Nil) => + val (output, schemaLess) = outputClause match { + case Token(TOK_ALIASLIST, aliases) :: Nil => (aliases.map { case Token(name, Nil) => AttributeReference(name, StringType)() }, false) + case Token(TOK_TABCOLLIST, attributes) :: Nil => (attributes.map { case Token(TOK_TABCOL, Token(name, Nil) :: dataType :: Nil) => AttributeReference(name, nodeToDataType(dataType))() }, false) + case Nil => (List(AttributeReference("key", StringType)(), AttributeReference("value", StringType)()), true) } + val (inputRowFormat, inputSerdeClass, inputSerdeProps) = inputSerdeClause match { + case Token(TOK_SERDEPROPS, props) :: Nil => (props.map { case Token(name, Token(value, Nil) :: Nil) => (name, value) }, "", Nil) + case Token(TOK_SERDENAME, Token(serde, Nil) :: Nil) :: Nil => (Nil, serde, Nil) + case Token(TOK_SERDENAME, Token(serde, Nil) :: Token(TOK_TABLEPROPERTIES, Token(TOK_TABLEPROPLIST, props) :: Nil) :: Nil) :: Nil => + val tableprops = props.map { case Token(TOK_TABLEPROPERTY, Token(name, Nil) :: Token(value, Nil) :: Nil) => (name, value) } + (Nil, serde, tableprops) + case Nil => (Nil, "", Nil) } + val (outputRowFormat, outputSerdeClass, outputSerdeProps) = outputSerdeClause match { + case Token(TOK_SERDEPROPS, props) :: Nil => (props.map { case Token(name, Token(value, Nil) :: Nil) => (name, value) }, "", Nil) + case Token(TOK_SERDENAME, Token(serde, Nil) :: Nil) :: Nil => (Nil, serde, Nil) + case Token(TOK_SERDENAME, Token(serde, Nil) :: Token(TOK_TABLEPROPERTIES, Token(TOK_TABLEPROPLIST, props) :: Nil) :: Nil) :: Nil => --- End diff -- Indent 2 spaces?
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/4014#discussion_r23904432 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala --- @@ -627,29 +628,71 @@ https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C case Token(TOK_SELEXPR, Token(TOK_TRANSFORM, Token(TOK_EXPLIST, inputExprs) :: - Token(TOK_SERDE, Nil) :: + Token(TOK_SERDE, inputSerdeClause) :: Token(TOK_RECORDWRITER, writerClause) :: // TODO: Need to support other types of (in/out)put Token(script, Nil) :: - Token(TOK_SERDE, serdeClause) :: + Token(TOK_SERDE, outputSerdeClause) :: Token(TOK_RECORDREADER, readerClause) :: - outputClause :: Nil) :: Nil) => - val output = outputClause match { - case Token(TOK_ALIASLIST, aliases) => aliases.map { case Token(name, Nil) => AttributeReference(name, StringType)() } - case Token(TOK_TABCOLLIST, attributes) => attributes.map { case Token(TOK_TABCOL, Token(name, Nil) :: dataType :: Nil) => AttributeReference(name, nodeToDataType(dataType))() } + outputClause) :: Nil) => + val (output, schemaLess) = outputClause match { + case Token(TOK_ALIASLIST, aliases) :: Nil => (aliases.map { case Token(name, Nil) => AttributeReference(name, StringType)() }, false) + case Token(TOK_TABCOLLIST, attributes) :: Nil => (attributes.map { case Token(TOK_TABCOL, Token(name, Nil) :: dataType :: Nil) => AttributeReference(name, nodeToDataType(dataType))() }, false) + case Nil => (List(AttributeReference("key", StringType)(), AttributeReference("value", StringType)()), true) } + val (inputRowFormat, inputSerdeClass, inputSerdeProps) = inputSerdeClause match { + case Token(TOK_SERDEPROPS, props) :: Nil => (props.map { case Token(name, Token(value, Nil) :: Nil) => (name, value) }, --- End diff -- These tuples with maps are a little hard to read. I'd consider using intermediate variables.
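A small Scala sketch of the readability point raised above, with simplified stand-ins for the parser shapes in the diff (the `Token` class and property names here are hypothetical): building a tuple whose components are themselves inline `.map` expressions, versus naming the intermediate values first.

```scala
case class Token(name: String, children: List[Token])

// Hard to scan: the map expression is buried inside the tuple literal,
// so the reader must mentally match each expression to a tuple slot.
def parseInline(props: List[Token]): (List[(String, String)], String, List[Nothing]) =
  (props.map { case Token(n, Token(v, Nil) :: Nil) => (n, v) }, "", Nil)

// Same result, but each component of the tuple is named and self-describing.
def parseNamed(props: List[Token]): (List[(String, String)], String, List[Nothing]) = {
  val rowFormat  = props.map { case Token(n, Token(v, Nil) :: Nil) => (n, v) }
  val serdeClass = ""
  val serdeProps = Nil
  (rowFormat, serdeClass, serdeProps)
}
```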
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/4014#discussion_r23904329 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala --- @@ -33,7 +33,8 @@ import org.apache.spark.sql.catalyst.plans._ import org.apache.spark.sql.catalyst.plans.logical import org.apache.spark.sql.catalyst.plans.logical._ import org.apache.spark.sql.execution.ExplainCommand -import org.apache.spark.sql.hive.execution.{HiveNativeCommand, DropTable, AnalyzeTable} +import org.apache.spark.sql.hive.execution.{HiveNativeCommand, DropTable, AnalyzeTable, + HiveScriptIOSchema} --- End diff -- Don't wrap imports.
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/4014#discussion_r23904318 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala --- @@ -53,28 +69,205 @@ case class ScriptTransformation( val inputStream = proc.getInputStream val outputStream = proc.getOutputStream val reader = new BufferedReader(new InputStreamReader(inputStream)) + + val (outputSerde, outputSoi) = ioschema.initOutputSerDe(output) + + val iterator: Iterator[Row] = new Iterator[Row] with HiveInspectors { +var cacheRow: Row = null +var curLine: String = null +var eof: Boolean = false + +override def hasNext: Boolean = { + if (outputSerde == null) { +if (curLine == null) { + curLine = reader.readLine() + curLine != null +} else { + true +} + } else { +!eof + } +} - // TODO: This should be exposed as an iterator instead of reading in all the data at once. - val outputLines = collection.mutable.ArrayBuffer[Row]() - val readerThread = new Thread("Transform OutputReader") { -override def run() { - var curLine = reader.readLine() - while (curLine != null) { -// TODO: Use SerDe -outputLines += new GenericRow(curLine.split("\t").asInstanceOf[Array[Any]]) +def deserialize(): Row = { + if (cacheRow != null) return cacheRow + + val mutableRow = new SpecificMutableRow(output.map(_.dataType)) + try { +val dataInputStream = new DataInputStream(inputStream) +val writable = outputSerde.getSerializedClass().newInstance +writable.readFields(dataInputStream) + +val raw = outputSerde.deserialize(writable) +val dataList = outputSoi.getStructFieldsDataAsList(raw) +val fieldList = outputSoi.getAllStructFieldRefs() + +var i = 0 +dataList.foreach( element => { + if (element == null) { +mutableRow.setNullAt(i) + } else { +mutableRow(i) = unwrap(element, fieldList(i).getFieldObjectInspector) + } + i += 1 +}) +return mutableRow + } catch { +case e: EOFException => + eof = true + return null + } +} + +override def next(): Row = { + if (!hasNext) { +throw new NoSuchElementException + } + + if (outputSerde == null) { +val prevLine = curLine +curLine = reader.readLine() + +if (!ioschema.schemaLess) { + new GenericRow( + prevLine.split(ioschema.outputRowFormatMap("TOK_TABLEROWFORMATFIELD")) +.asInstanceOf[Array[Any]]) +} else { + new GenericRow( + prevLine.split(ioschema.outputRowFormatMap("TOK_TABLEROWFORMATFIELD"), 2) +.asInstanceOf[Array[Any]]) +} + } else { +val ret = deserialize() +if (!eof) { + cacheRow = null + cacheRow = deserialize() +} +ret } } } - readerThread.start() + + val (inputSerde, inputSoi) = ioschema.initInputSerDe(input) + val dataOutputStream = new DataOutputStream(outputStream) val outputProjection = new InterpretedProjection(input, child.output) + iter .map(outputProjection) -// TODO: Use SerDe -.map(_.mkString("", "\t", "\n").getBytes("utf-8")).foreach(outputStream.write) +.foreach { row => + if (inputSerde == null) { +val data = row.mkString("", ioschema.inputRowFormatMap("TOK_TABLEROWFORMATFIELD"), + ioschema.inputRowFormatMap("TOK_TABLEROWFORMATLINES")).getBytes("utf-8") + +outputStream.write(data) + } else { +val writable = new ShimWritable( + inputSerde.serialize(row.asInstanceOf[GenericRow].values, inputSoi)) +writable.write(dataOutputStream) + } +} outputStream.close() - readerThread.join() - outputLines.toIterator + iterator } } } /** + * The wrapper class of Hive input and output schema properties + *
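The diff above replaces a buffer-everything reader thread with a lookahead iterator: `hasNext` lazily pulls and caches one line so end-of-stream is detected accurately. A minimal, self-contained sketch of that pattern (simplified, not the Spark implementation; `BufferedReader` stands in for the script's output pipe):

```scala
import java.io.{BufferedReader, StringReader}

class LineIterator(reader: BufferedReader) extends Iterator[String] {
  // One-element lookahead cache; null means "nothing buffered yet".
  private var curLine: String = null

  override def hasNext: Boolean = {
    // Pull the next line lazily; readLine returns null at end of stream.
    if (curLine == null) curLine = reader.readLine()
    curLine != null
  }

  override def next(): String = {
    if (!hasNext) throw new NoSuchElementException
    val prev = curLine
    curLine = null // hand the cached line to the caller and clear the cache
    prev
  }
}
```

Because rows are produced on demand, the consumer never materializes the whole script output in memory, which is the TODO the patch removes.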
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/4014#discussion_r23904323 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/ScriptTransformation.scala --- @@ -25,9 +25,18 @@ import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression} * @param input the set of expression that should be passed to the script. * @param script the command that should be executed. * @param output the attributes that are produced by the script. + * @param ioschema the input and output schema applied in the execution of the script. */ case class ScriptTransformation( input: Seq[Expression], script: String, output: Seq[Attribute], -child: LogicalPlan) extends UnaryNode +child: LogicalPlan, +ioschema: Option[ScriptInputOutputSchema]) extends UnaryNode + +/** + * The wrapper class of input and output schema properties for transforming with script. + * --- End diff -- remove this extra line.
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/4014#discussion_r23904386 --- Diff: sql/hive/v0.12.0/src/main/scala/org/apache/spark/sql/hive/Shim12.scala --- @@ -241,8 +241,14 @@ private[hive] object HiveShim { Decimal(hdoi.getPrimitiveJavaObject(data).bigDecimalValue()) } } + + implicit def prepareWritable(shimW: ShimWritable): Writable = { +shimW.writable + } } +case class ShimWritable(writable: Writable) --- End diff -- Why do we need `ShimWritable`? Couldn't the `prepareWritable` function just take a `Writable`? Furthermore, does it break Hive 12 if we just always fix the Avro writable? It would be good to minimize the size of the shim when possible.
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/4014#discussion_r23904416 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/ScriptTransformation.scala --- @@ -25,9 +25,18 @@ import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression} * @param input the set of expression that should be passed to the script. * @param script the command that should be executed. * @param output the attributes that are produced by the script. + * @param ioschema the input and output schema applied in the execution of the script. */ case class ScriptTransformation( input: Seq[Expression], script: String, output: Seq[Attribute], -child: LogicalPlan) extends UnaryNode +child: LogicalPlan, +ioschema: Option[ScriptInputOutputSchema]) extends UnaryNode --- End diff -- Can this ever be `None`?
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/4014#discussion_r23904403 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/ScriptTransformation.scala --- @@ -25,9 +25,18 @@ import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression} * @param input the set of expression that should be passed to the script. * @param script the command that should be executed. * @param output the attributes that are produced by the script. + * @param ioschema the input and output schema applied in the execution of the script. */ case class ScriptTransformation( input: Seq[Expression], script: String, output: Seq[Attribute], -child: LogicalPlan) extends UnaryNode +child: LogicalPlan, +ioschema: Option[ScriptInputOutputSchema]) extends UnaryNode + +/** + * The wrapper class of input and output schema properties for transforming with script. --- End diff -- I'd phrase this as `A placeholder for implementation specific input and output properties when passing data to a script. For example, in Hive this would specify which SerDes to use`.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/4014#discussion_r23910289

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/ScriptTransformation.scala ---
@@ -25,9 +25,18 @@ import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression}
  * @param input the set of expression that should be passed to the script.
  * @param script the command that should be executed.
  * @param output the attributes that are produced by the script.
+ * @param ioschema the input and output schema applied in the execution of the script.
  */
 case class ScriptTransformation(
     input: Seq[Expression],
     script: String,
     output: Seq[Attribute],
-    child: LogicalPlan) extends UnaryNode
+    child: LogicalPlan,
+    ioschema: Option[ScriptInputOutputSchema]) extends UnaryNode

--- End diff --

In the Hive case, it is not. But I think it may be for other cases?
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/4014#discussion_r23910154

--- Diff: sql/hive/v0.12.0/src/main/scala/org/apache/spark/sql/hive/Shim12.scala ---
@@ -241,8 +241,14 @@ private[hive] object HiveShim {
       Decimal(hdoi.getPrimitiveJavaObject(data).bigDecimalValue())
     }
   }
+
+  implicit def prepareWritable(shimW: ShimWritable): Writable = {
+    shimW.writable
+  }
 }
+
+case class ShimWritable(writable: Writable)

--- End diff --

If we skip `ShimWritable`, we then need to remove `implicit` from `prepareWritable` and call it explicitly to do the fixing. Is that better? If so, I can do it that way. It does not break Hive 0.12 because we just pass the underlying writable object through without touching it. We only do the fixing on Hive 0.13.
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-72315535 @rxin I have added the explanation for this feature. Would you have time to review this PR and see if it is OK to merge? Thanks!
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-71835068 @chenghao-intel Thanks for the comments. I refactored the code to address these issues. Regarding reusing Hive's existing code, I checked the links you mentioned. However, the code that parses the input/output format and SerDe lives in private classes, so I don't think reusing it will work. Making `logical.ScriptTransformation` parameterized would cause problems in the Analyzer, so I skipped that and moved the Hive implementation details into another class, `HiveScriptIOSchema`. For the special case regarding `AvroGenericRecordWritable`, I moved it into `HiveShim`, making it work on both Hive 0.12.0 and 0.13.1.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-71811339 [Test build #26222 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26222/consoleFull) for PR 4014 at commit [`ccb71e3`](https://github.com/apache/spark/commit/ccb71e30b8a65fcea5d0d57865bdf5928ef9a534).
* This patch merges cleanly.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-71819468 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26222/ Test PASSed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-71819461 [Test build #26222 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26222/consoleFull) for PR 4014 at commit [`ccb71e3`](https://github.com/apache/spark/commit/ccb71e30b8a65fcea5d0d57865bdf5928ef9a534).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class HiveScriptIOSchema (`
  * `val trimed_class = serdeClassName.split("'")(1)`
  * `case class ShimWritable(writable: Writable)`
  * `case class ShimWritable(writable: Writable)`
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-71878029 [Test build #26232 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26232/consoleFull) for PR 4014 at commit [`a422562`](https://github.com/apache/spark/commit/a422562088c88a1bb3d3005b7424ac1bddb3e801).
* This patch merges cleanly.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-71947781 [Test build #26268 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26268/consoleFull) for PR 4014 at commit [`575f695`](https://github.com/apache/spark/commit/575f69545796900489103c671839aa86c8bd4bc0).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class HiveScriptIOSchema (`
  * `val trimed_class = serdeClassName.split("'")(1)`
  * `case class ShimWritable(writable: Writable)`
  * `case class ShimWritable(writable: Writable)`
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-71947786 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26268/ Test FAILed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-71943344 [Test build #26268 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26268/consoleFull) for PR 4014 at commit [`575f695`](https://github.com/apache/spark/commit/575f69545796900489103c671839aa86c8bd4bc0).
* This patch merges cleanly.
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-71942858 Uploaded Hive golden answer files.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-71949295 [Test build #26273 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26273/consoleFull) for PR 4014 at commit [`aa10fbd`](https://github.com/apache/spark/commit/aa10fbd0462e46782688b69664920dc11f4b1990).
* This patch merges cleanly.
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-71956530 Can you explain in the PR what a schema-less delimiter is?
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-71957621 [Schema-less Map-reduce Scripts](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Transform#LanguageManualTransform-Schema-lessMap-reduceScripts) is a feature of Hive's transform syntax: there is no `AS` clause after `USING my_script`, and Hive assumes the script output contains two columns, `key` and `value`. An example SQL statement looks like: `SELECT TRANSFORM (key, value) USING 'cat' FROM src`. A custom delimiter is defined by the `ROW FORMAT` clause, such as: `SELECT TRANSFORM (key) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\002' USING 'cat' AS (tKey) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\002' FROM src`. So you can use field delimiters other than the default `\t`.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-71956179 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26273/ Test PASSed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-71956176 [Test build #26273 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26273/consoleFull) for PR 4014 at commit [`aa10fbd`](https://github.com/apache/spark/commit/aa10fbd0462e46782688b69664920dc11f4b1990).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class HiveScriptIOSchema (`
  * `val trimed_class = serdeClassName.split("'")(1)`
  * `case class ShimWritable(writable: Writable)`
  * `case class ShimWritable(writable: Writable)`
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-71956296 @rxin Would you like to take a look at this too and see if it is ready to merge? Thanks.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-71885779 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26232/ Test FAILed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-71885771 [Test build #26232 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26232/consoleFull) for PR 4014 at commit [`a422562`](https://github.com/apache/spark/commit/a422562088c88a1bb3d3005b7424ac1bddb3e801).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class HiveScriptIOSchema (`
  * `val trimed_class = serdeClassName.split("'")(1)`
  * `case class ShimWritable(writable: Writable)`
  * `case class ShimWritable(writable: Writable)`
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/4014#discussion_r23516387

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala ---
@@ -53,28 +78,176 @@ case class ScriptTransformation(
     val inputStream = proc.getInputStream
     val outputStream = proc.getOutputStream
     val reader = new BufferedReader(new InputStreamReader(inputStream))
+
+    val outputSerde: AbstractSerDe = if (ioschema.outputSerdeClass != "") {

--- End diff --

I will refactor the code to address the above comments in later commits.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/4014#discussion_r23516234

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala ---
@@ -53,28 +78,176 @@ case class ScriptTransformation(
     val inputStream = proc.getInputStream
     val outputStream = proc.getOutputStream
     val reader = new BufferedReader(new InputStreamReader(inputStream))
+
+    val outputSerde: AbstractSerDe = if (ioschema.outputSerdeClass != "") {
+      val trimed_class = ioschema.outputSerdeClass.split("'")(1)
+      Utils.classForName(trimed_class)
+        .newInstance.asInstanceOf[AbstractSerDe]
+    } else {
+      null
+    }
+
+    if (outputSerde != null) {
+      val columns = output.map { case aref: AttributeReference => aref.name }
+        .mkString(",")
+      val columnTypes = output.map { case aref: AttributeReference =>
+        aref.dataType.toTypeInfo.getTypeName()
+      }.mkString(",")
+
+      var propsMap = ioschema.outputSerdeProps.map(kv => {
+        (kv._1.split("'")(1), kv._2.split("'")(1))
+      }).toMap + (serdeConstants.LIST_COLUMNS -> columns)
+      propsMap = propsMap + (serdeConstants.LIST_COLUMN_TYPES -> columnTypes)
+
+      val properties = new Properties()
+      properties.putAll(propsMap)
+
+      outputSerde.initialize(null, properties)
+    }
+
+    val outputSoi = if (outputSerde != null) {
+      outputSerde.getObjectInspector().asInstanceOf[StructObjectInspector]
+    } else {
+      null
+    }
+
+    val iterator: Iterator[Row] = new Iterator[Row] with HiveInspectors {
+      var cacheRow: Row = null
+      var curLine: String = null
+      var eof: Boolean = false
+
+      override def hasNext: Boolean = {
+        if (outputSerde == null) {
+          if (curLine == null) {
+            curLine = reader.readLine()
+            curLine != null
+          } else {
+            true
+          }
+        } else {
+          !eof
+        }
+      }
-    // TODO: This should be exposed as an iterator instead of reading in all the data at once.
-    val outputLines = collection.mutable.ArrayBuffer[Row]()
-    val readerThread = new Thread("Transform OutputReader") {
-      override def run() {
-        var curLine = reader.readLine()
-        while (curLine != null) {
-          // TODO: Use SerDe
-          outputLines += new GenericRow(curLine.split("\t").asInstanceOf[Array[Any]])
+      def deserialize(): Row = {
+        if (cacheRow != null) return cacheRow
+
+        val mutableRow = new SpecificMutableRow(output.map(_.dataType))
+        try {
+          val dataInputStream = new DataInputStream(inputStream)
+          val writable = outputSerde.getSerializedClass().newInstance
+          writable.readFields(dataInputStream)
+
+          val raw = outputSerde.deserialize(writable)
+          val dataList = outputSoi.getStructFieldsDataAsList(raw)
+          val fieldList = outputSoi.getAllStructFieldRefs()
+
+          var i = 0
+          dataList.foreach(element => {
+            if (element == null) {
+              mutableRow.setNullAt(i)
+            } else {
+              mutableRow(i) = unwrap(element, fieldList(i).getFieldObjectInspector)
+            }
+            i += 1
+          })
+          return mutableRow
+        } catch {
+          case e: EOFException =>
+            eof = true
+            return null
+        }
+      }
+
+      override def next(): Row = {
+        if (!hasNext) {
+          throw new NoSuchElementException
+        }
+
+        if (outputSerde == null) {
+          val prevLine = curLine
+          curLine = reader.readLine()
+
+          if (!ioschema.schemaLess) {
+            new GenericRow(
+              prevLine.split(outputRowFormatMap("TOK_TABLEROWFORMATFIELD"))
+                .asInstanceOf[Array[Any]])
+          } else {
+            new GenericRow(
+              prevLine.split(outputRowFormatMap("TOK_TABLEROWFORMATFIELD"), 2)
+                .asInstanceOf[Array[Any]])
+          }
+        } else {
+          val ret = deserialize()
+          if (!eof) {
+            cacheRow = null
+            cacheRow = deserialize()
+          }
+          ret
+        }
+      }
+    }
-    readerThread.start()
+
+    val
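The `cacheRow` trick in the diff above — deserializing one row ahead so that `hasNext` can report end-of-stream before `next` is called — is a generic lookahead-iterator pattern. A simplified sketch of just that pattern (hypothetical names, not the Spark classes):

```python
class LookaheadIterator:
    """Wrap a reader that signals end-of-input only by returning None.

    Like the cacheRow field above: one row is always read ahead, so
    has_next() can answer accurately without consuming extra input.
    """

    def __init__(self, read_row):
        self._read_row = read_row   # callable: returns next row, or None at EOF
        self._cache = read_row()    # prefetch the first row

    def has_next(self):
        return self._cache is not None

    def next(self):
        if self._cache is None:
            raise StopIteration
        # Hand back the cached row and immediately prefetch the next one.
        row, self._cache = self._cache, self._read_row()
        return row
```

Driving it over a two-row source drains both rows and then reports exhaustion, which is exactly what the serde branch of `hasNext`/`next` needs when EOF is only discoverable by attempting a deserialize.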
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/4014#discussion_r23510266

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala ---
@@ -53,28 +78,176 @@ case class ScriptTransformation(
     val inputStream = proc.getInputStream
     val outputStream = proc.getOutputStream
     val reader = new BufferedReader(new InputStreamReader(inputStream))
+
+    val outputSerde: AbstractSerDe = if (ioschema.outputSerdeClass != "") {

--- End diff --

Move the input / output SerDe setup into separate function(s)? I am not sure if there is any code that can be shared, but there seems to be some duplicated code.
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/4014#discussion_r23510387

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala ---
@@ -327,7 +327,127 @@ class HiveQuerySuite extends HiveComparisonTest with BeforeAndAfter {
   createQueryTest("transform", "SELECT TRANSFORM (key) USING 'cat' AS (tKey) FROM src")
+
+  test("schema-less transform") {

--- End diff --

As we are targeting compatibility with Hive, let's write the test cases via `createQueryTest`, which is probably clearer.
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/4014#discussion_r23509873

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/ScriptTransformation.scala ---
@@ -25,9 +25,24 @@ import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression}
  * @param input the set of expression that should be passed to the script.
  * @param script the command that should be executed.
  * @param output the attributes that are produced by the script.
+ * @param ioschema the input and output schema applied in the execution of the script.
  */
 case class ScriptTransformation(
     input: Seq[Expression],
     script: String,
     output: Seq[Attribute],
-    child: LogicalPlan) extends UnaryNode
+    child: LogicalPlan,
+    ioschema: ScriptInputOutputSchema) extends UnaryNode
+
+/**
+ * The wrapper class of input and output schema properties for transforming with script.
+ *
+ */
+case class ScriptInputOutputSchema(

--- End diff --

Probably move `ScriptInputOutputSchema` into the `hive` package? `SerDe` and `RowFormat` are Hive concepts. And `ScriptTransformation` could probably be defined like:
```
case class ScriptTransformation[T](
    input: Seq[Expression],
    script: String,
    output: Seq[Attribute],
    child: LogicalPlan,
    ioschema: Option[T]) extends UnaryNode
```
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/4014#discussion_r23510200

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -627,29 +627,71 @@ https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C
     case Token("TOK_SELEXPR",
            Token("TOK_TRANSFORM",
              Token("TOK_EXPLIST", inputExprs) ::
-             Token("TOK_SERDE", Nil) ::
+             Token("TOK_SERDE", inputSerdeClause) ::
              Token("TOK_RECORDWRITER", writerClause) ::
              // TODO: Need to support other types of (in/out)put
              Token(script, Nil) ::
-             Token("TOK_SERDE", serdeClause) ::
+             Token("TOK_SERDE", outputSerdeClause) ::
              Token("TOK_RECORDREADER", readerClause) ::
-             outputClause :: Nil) :: Nil) =>
-
-      val output = outputClause match {
-        case Token("TOK_ALIASLIST", aliases) =>
-          aliases.map { case Token(name, Nil) => AttributeReference(name, StringType)() }
-        case Token("TOK_TABCOLLIST", attributes) =>
-          attributes.map { case Token("TOK_TABCOL", Token(name, Nil) :: dataType :: Nil) =>
-            AttributeReference(name, nodeToDataType(dataType))() }
+             outputClause) :: Nil) =>
+
+      val (output, schemaLess) = outputClause match {
+        case Token("TOK_ALIASLIST", aliases) :: Nil =>
+          (aliases.map { case Token(name, Nil) => AttributeReference(name, StringType)() },
+           false)
+        case Token("TOK_TABCOLLIST", attributes) :: Nil =>
+          (attributes.map { case Token("TOK_TABCOL", Token(name, Nil) :: dataType :: Nil) =>
+            AttributeReference(name, nodeToDataType(dataType))() }, false)
+        case Nil =>
+          (List(AttributeReference("key", StringType)(),
+            AttributeReference("value", StringType)()), true)
       }
+
+      val (inputRowFormat, inputSerdeClass, inputSerdeProps) = inputSerdeClause match {

--- End diff --

I still have some concerns about the SerDe / RowFormat parsing. Instead of rewriting it, reusing the existing code is probably preferred. Do you think the following links will be helpful?
https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L1711
https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L10785
[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/4014#discussion_r23510355

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala ---
@@ -53,28 +78,176 @@ case class ScriptTransformation(
     val inputStream = proc.getInputStream
     val outputStream = proc.getOutputStream
     val reader = new BufferedReader(new InputStreamReader(inputStream))
+
+    val outputSerde: AbstractSerDe = if (ioschema.outputSerdeClass != "") {
+      val trimed_class = ioschema.outputSerdeClass.split("'")(1)
+      Utils.classForName(trimed_class)
+        .newInstance.asInstanceOf[AbstractSerDe]
+    } else {
+      null
+    }
+
+    if (outputSerde != null) {
+      val columns = output.map { case aref: AttributeReference => aref.name }
+        .mkString(",")
+      val columnTypes = output.map { case aref: AttributeReference =>
+        aref.dataType.toTypeInfo.getTypeName()
+      }.mkString(",")
+
+      var propsMap = ioschema.outputSerdeProps.map(kv => {
+        (kv._1.split("'")(1), kv._2.split("'")(1))
+      }).toMap + (serdeConstants.LIST_COLUMNS -> columns)
+      propsMap = propsMap + (serdeConstants.LIST_COLUMN_TYPES -> columnTypes)
+
+      val properties = new Properties()
+      properties.putAll(propsMap)
+
+      outputSerde.initialize(null, properties)
+    }
+
+    val outputSoi = if (outputSerde != null) {
+      outputSerde.getObjectInspector().asInstanceOf[StructObjectInspector]
+    } else {
+      null
+    }
+
+    val iterator: Iterator[Row] = new Iterator[Row] with HiveInspectors {
+      var cacheRow: Row = null
+      var curLine: String = null
+      var eof: Boolean = false
+
+      override def hasNext: Boolean = {
+        if (outputSerde == null) {
+          if (curLine == null) {
+            curLine = reader.readLine()
+            curLine != null
+          } else {
+            true
+          }
+        } else {
+          !eof
+        }
+      }
 
-    // TODO: This should be exposed as an iterator instead of reading in all the data at once.
-    val outputLines = collection.mutable.ArrayBuffer[Row]()
-    val readerThread = new Thread("Transform OutputReader") {
-      override def run() {
-        var curLine = reader.readLine()
-        while (curLine != null) {
-          // TODO: Use SerDe
-          outputLines += new GenericRow(curLine.split("\t").asInstanceOf[Array[Any]])
+      def deserialize(): Row = {
+        if (cacheRow != null) return cacheRow
+
+        val mutableRow = new SpecificMutableRow(output.map(_.dataType))
+        try {
+          val dataInputStream = new DataInputStream(inputStream)
+          val writable = outputSerde.getSerializedClass().newInstance
+          writable.readFields(dataInputStream)
+
+          val raw = outputSerde.deserialize(writable)
+          val dataList = outputSoi.getStructFieldsDataAsList(raw)
+          val fieldList = outputSoi.getAllStructFieldRefs()
+
+          var i = 0
+          dataList.foreach( element => {
+            if (element == null) {
+              mutableRow.setNullAt(i)
+            } else {
+              mutableRow(i) = unwrap(element, fieldList(i).getFieldObjectInspector)
+            }
+            i += 1
+          })
+          return mutableRow
+        } catch {
+          case e: EOFException =>
+            eof = true
+            return null
+        }
+      }
+
+      override def next(): Row = {
+        if (!hasNext) {
+          throw new NoSuchElementException
+        }
+
+        if (outputSerde == null) {
+          val prevLine = curLine
           curLine = reader.readLine()
+
+          if (!ioschema.schemaLess) {
+            new GenericRow(
+              prevLine.split(outputRowFormatMap("TOK_TABLEROWFORMATFIELD"))
+                .asInstanceOf[Array[Any]])
+          } else {
+            new GenericRow(
+              prevLine.split(outputRowFormatMap("TOK_TABLEROWFORMATFIELD"), 2)
+                .asInstanceOf[Array[Any]])
+          }
+        } else {
+          val ret = deserialize()
+          if (!eof) {
+            cacheRow = null
+            cacheRow = deserialize()
+          }
+          ret
         }
      }
    }
-    readerThread.start()
+
+    val
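The SerDe initialization in the hunk above trims the surrounding single quotes from the parsed class name via `split("'")(1)`. A standalone sketch of just that trimming step (the class name below is only an example value):

```scala
// Trim the surrounding single quotes from a parsed SerDe class name.
// "'x.Y'".split("'") yields Array("", "x.Y"), so element 1 is the bare name.
def trimQuotes(quoted: String): String = quoted.split("'")(1)

val raw = "'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'"
println(trimQuotes(raw)) // org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
```

Note that this relies on Java's `split` dropping trailing empty strings, which is why index 1 (not 0) holds the class name.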
[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-71406585 @viirya the `SerDe` and `RowFormat` handling is quite a headache; we'd probably better reuse the Hive code as much as possible, while also keeping it independent and generic. I will also investigate how we can reuse the Hive code.
[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-71186284 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26021/
[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-71186278

[Test build #26021 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26021/consoleFull) for PR 4014 at commit [`6000889`](https://github.com/apache/spark/commit/6000889aab6e8efc67cdad227500e0b198e28928).

* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class ScriptInputOutputSchema(`
  * `val trimed_class = ioschema.outputSerdeClass.split("'")(1)`
  * `val trimed_class = ioschema.inputSerdeClass.split("'")(1)`
[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/4014#discussion_r23441532

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala ---
@@ -165,8 +165,20 @@ private[hive] trait HiveStrategies {
   object Scripts extends Strategy {
     def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
-      case logical.ScriptTransformation(input, script, output, child) =>
-        ScriptTransformation(input, script, output, planLater(child))(hiveContext) :: Nil
+      case logical.ScriptTransformation(input, script, output, child,
--- End diff --

Good point. I didn't notice that. New commit will fix it. Thanks.
[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-71179395 [Test build #26021 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26021/consoleFull) for PR 4014 at commit [`6000889`](https://github.com/apache/spark/commit/6000889aab6e8efc67cdad227500e0b198e28928). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/4014#discussion_r23441619

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala ---
@@ -327,7 +327,49 @@ class HiveQuerySuite extends HiveComparisonTest with BeforeAndAfter {
   createQueryTest("transform",
     "SELECT TRANSFORM (key) USING 'cat' AS (tKey) FROM src")
+
+  test("schema-less transform") {
+    val expected = sql("SELECT TRANSFORM (key) USING 'cat' AS (tKey) FROM src").collect().head
+    val res = sql("SELECT TRANSFORM (key) USING 'cat' FROM src").collect().head
+
+    assert(expected(0) === res(0))
+
+    val expected2 = sql("SELECT TRANSFORM (*) USING 'cat' AS (tKey, tValue) FROM src").collect().head
+    val res2 = sql("SELECT TRANSFORM (*) USING 'cat' FROM src").collect().head
+
+    assert(expected2(0) === res2(0) && expected2(1) === res2(1))
+  }
+
+  test("transform with custom field delimiter") {
+    val expected = sql("SELECT TRANSFORM (key) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\002' USING 'cat' AS (tKey) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\002' FROM src").collect().head
--- End diff --

New commit will fix that.
[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-71179511 @chenghao-intel thanks for the review. New commits have been added for that.
[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/4014#discussion_r23428628

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -627,21 +627,56 @@ https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C
     case Token("TOK_SELEXPR",
            Token("TOK_TRANSFORM",
              Token("TOK_EXPLIST", inputExprs) ::
-             Token("TOK_SERDE", Nil) ::
+             Token("TOK_SERDE", inputSerdeClause) ::
              Token("TOK_RECORDWRITER", writerClause) ::
              // TODO: Need to support other types of (in/out)put
              Token(script, Nil) ::
-             Token("TOK_SERDE", serdeClause) ::
+             Token("TOK_SERDE", outputSerdeClause) ::
              Token("TOK_RECORDREADER", readerClause) ::
-             outputClause :: Nil) :: Nil) =>
+             outputClause) :: Nil) =>

       val output = outputClause match {
-        case Token("TOK_ALIASLIST", aliases) =>
+        case Token("TOK_ALIASLIST", aliases) :: Nil =>
           aliases.map { case Token(name, Nil) => AttributeReference(name, StringType)() }
-        case Token("TOK_TABCOLLIST", attributes) =>
+        case Token("TOK_TABCOLLIST", attributes) :: Nil =>
           attributes.map { case Token("TOK_TABCOL", Token(name, Nil) :: dataType :: Nil) =>
             AttributeReference(name, nodeToDataType(dataType))() }
+        case Nil =>
+          Nil
       }
+
+      val (inputFormat, inputSerdeClass, inputSerdeProps) = inputSerdeClause match {
+        case Token("TOK_SERDEPROPS", props) :: Nil =>
+          (props.map { case Token(name, Token(value, Nil) :: Nil) => (name, value) },
+            "", Nil)
+        case Token("TOK_SERDENAME", Token(serde, Nil) :: Nil) :: Nil => (Nil, serde, Nil)
--- End diff --

According to the manual, they won't appear in the same query.
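The clause match under discussion can be exercised on its own with a toy AST. This is a hypothetical mini-version (the `Ast`/`Node` types and the `parseSerde` name are mine, not Spark's): a `TOK_SERDEPROPS` clause yields row-format properties, a `TOK_SERDENAME` clause yields the SerDe class name, and an empty clause yields neither.

```scala
// Toy AST standing in for HiveQl's Token nodes (hypothetical).
sealed trait Ast
case class Node(name: String, children: List[Ast]) extends Ast

// Mirrors the three cases of the inputSerdeClause match in the diff:
// (row-format properties, serde class name).
def parseSerde(clause: List[Ast]): (List[(String, String)], String) = clause match {
  case Node("TOK_SERDEPROPS", props) :: Nil =>
    (props.collect { case Node(n, Node(v, Nil) :: Nil) => (n, v) }, "")
  case Node("TOK_SERDENAME", Node(serde, Nil) :: Nil) :: Nil =>
    (Nil, serde)
  case Nil =>
    (Nil, "")
}

val withName = parseSerde(List(Node("TOK_SERDENAME", List(Node("'x.MySerDe'", Nil)))))
println(withName) // (List(),'x.MySerDe')
```

As the comment above notes, the two clause kinds are mutually exclusive in a single query, which is why the match does not need a combined case.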
[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/4014#discussion_r23428921

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala ---
@@ -53,28 +82,176 @@ case class ScriptTransformation(
[… same hunk as quoted above, ending with the removed lines …]
-    // TODO: This should be exposed as an iterator instead of reading in all the data at once.
-    val outputLines = collection.mutable.ArrayBuffer[Row]()
-    val readerThread = new Thread("Transform OutputReader") {
--- End diff --

Correct me if I'm wrong, but since I use an iterator here, isn't the output already streamed, doing what the `TODO` asks?
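viirya's point can be seen in a stripped-down form: the no-SerDe branch of the patch is a one-line-lookahead iterator over the script's stdout, so rows are produced lazily rather than buffered in an `ArrayBuffer`. A self-contained sketch, assuming plain `String`s instead of `Row`s:

```scala
import java.io.{BufferedReader, StringReader}

// One-line lookahead: hasNext pulls the next line into curLine; next() hands
// it out and clears the cache so the following hasNext reads again.
class LineIterator(reader: BufferedReader) extends Iterator[String] {
  private var curLine: String = null

  override def hasNext: Boolean = {
    if (curLine == null) curLine = reader.readLine()
    curLine != null
  }

  override def next(): String = {
    if (!hasNext) throw new NoSuchElementException
    val prev = curLine
    curLine = null
    prev
  }
}

val it = new LineIterator(new BufferedReader(new StringReader("a\nb")))
println(it.toList) // List(a, b)
```

Because `hasNext` only ever reads one line ahead, memory use stays constant regardless of how much output the script produces.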
[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/4014#discussion_r23429683

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala ---
@@ -53,28 +82,176 @@ case class ScriptTransformation(
[… same hunk as quoted above, ending with the removed lines …]
-    // TODO: This should be exposed as an iterator instead of reading in all the data at once.
-    val outputLines = collection.mutable.ArrayBuffer[Row]()
-    val readerThread = new Thread("Transform OutputReader") {
--- End diff --

Oh, got it, that makes sense.
[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/4014#discussion_r23424723

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -627,21 +627,56 @@
[… same hunk as quoted above, ending at the line under review …]
+      val (inputFormat, inputSerdeClass, inputSerdeProps) = inputSerdeClause match {
--- End diff --

Is that more like (`SerDe Properties`, `InputSerDeClass`, `Table Properties`)?
[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/4014#discussion_r23424933

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -627,21 +627,56 @@
[… same hunk as quoted above, ending at the line under review …]
+      val (inputFormat, inputSerdeClass, inputSerdeProps) = inputSerdeClause match {
--- End diff --

Oh, sorry, please ignore my previous comment — wouldn't `inputRowFormat` be a better name than `inputFormat`?
[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/4014#discussion_r23425082

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala ---
@@ -327,7 +327,49 @@ class HiveQuerySuite extends HiveComparisonTest with BeforeAndAfter {
[… same hunk as quoted above, ending at the line under review …]
+    val expected = sql("SELECT TRANSFORM (key) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\002' USING 'cat' AS (tKey) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\002' FROM src").collect().head
--- End diff --

Multiple rows instead of one?
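The delimiter handling being tested here is easy to check in isolation. The patch's schema-less branch splits each output line with a limit of 2 (everything after the first delimiter lands in the value column), while a declared schema splits on every delimiter. A small sketch using the `'\002'` delimiter from the test:

```scala
// '\002' is the custom field delimiter used in the test above.
val delim = "\u0002"
val line = Seq("k1", "v1", "v2").mkString(delim)

val schemaLess = line.split(delim, 2) // limit 2: key + rest of the line
val withSchema = line.split(delim)    // one field per delimiter

println(schemaLess.length) // 2
println(withSchema.length) // 3
```

Note that `String.split` treats its argument as a regex; `\u0002` has no special meaning in a regex, so it splits on the literal control character.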
[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/4014#discussion_r23425650

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -627,21 +627,56 @@
[… same hunk as quoted above, ending at the line under review …]
+        case Token("TOK_SERDENAME", Token(serde, Nil) :: Nil) :: Nil => (Nil, serde, Nil)
--- End diff --

Is it possible for `TOK_SERDEPROPS` and `TOK_SERDENAME` to appear in the same query? If so, it may cause a missed pattern match error.
[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-71133818 @viirya it's great to have this feature. I have some general comments on it; let's see how to improve things.
[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/4014#discussion_r23424312

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala ---
@@ -165,8 +165,20 @@ private[hive] trait HiveStrategies {
[… same hunk as quoted above, ending at the line under review …]
+      case logical.ScriptTransformation(input, script, output, child,
--- End diff --

I think a better place to extract the schema (the `output`) is in the `Analyzer`; `HiveContext` should be able to create its own rules for that, instead of doing this in a `Strategy`. Otherwise it probably fails to resolve the attributes, e.g.:
```
SELECT transform(key + 1, value) USING '/bin/cat' FROM src ORDER BY key, value
```
Sorry, I didn't test that; let me know if I am wrong.
[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/4014#discussion_r23425029

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -627,21 +627,56 @@
[… same hunk as quoted above, ending at the line under review …]
+        case Token("TOK_SERDEPROPS", props) :: Nil =>
+          (props.map { case Token(name, Token(value, Nil) :: Nil) => (name, value) },
--- End diff --

I am a little confused here: why are the `props` converted into `inputFormat` rather than `inputSerdeProps`?
[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/4014#discussion_r23425907

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -649,7 +684,9 @@
             inputExprs.map(nodeToExpr),
             unescapedScript,
             output,
-            withWhere))
+            withWhere, inputFormat, outputFormat,
--- End diff --

I would like to put `inputFormat`, `outputFormat`, `inputSerDe`, etc. into a single object named `ScriptInputOutputSchema`, and `SQLContext` and `HiveContext` may have different implementations of `ScriptInputOutputSchema`. That's how the `Analyzer` of `SQLContext` or `HiveContext` works (to resolve them, e.g. schema extracting, SerDe class existence checking, etc.).
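Per the test-bot summary earlier in the thread, the patch did end up introducing a `case class ScriptInputOutputSchema(`. A hedged sketch of the grouping suggested here — the field names are my guesses for illustration, not necessarily what was merged:

```scala
// Hypothetical shape: bundle all TRANSFORM I/O settings into one object so
// that Analyzer rules in SQLContext/HiveContext can resolve it as a unit.
case class ScriptInputOutputSchema(
    inputRowFormat: Seq[(String, String)],
    outputRowFormat: Seq[(String, String)],
    inputSerdeClass: String,
    outputSerdeClass: String,
    inputSerdeProps: Seq[(String, String)],
    outputSerdeProps: Seq[(String, String)],
    schemaLess: Boolean)

// A schema-less TRANSFORM with no explicit SerDe or row format.
val ioschema = ScriptInputOutputSchema(Nil, Nil, "", "", Nil, Nil, schemaLess = true)
println(ioschema.schemaLess) // true
```

Grouping these seven values beats threading them as separate constructor arguments: the logical and physical `ScriptTransformation` nodes can then pass one object through unchanged.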
[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/4014#discussion_r23426344

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala ---
@@ -53,28 +82,176 @@ case class ScriptTransformation(
[… same hunk as quoted above, ending with the removed lines …]
-    // TODO: This should be exposed as an iterator instead of reading in all the data at once.
-    val outputLines = collection.mutable.ArrayBuffer[Row]()
-    val readerThread = new Thread("Transform OutputReader") {
--- End diff --

I can understand why the reader thread was removed, but it would be helpful in the future if we support streaming-style output, which would save lots of memory. Would you mind leaving it unchanged, or at least keeping the `TODO`?
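The SerDe branch's `cacheRow`/`eof` bookkeeping can be isolated the same way: deserialize one row ahead so `hasNext` can report `!eof` truthfully. A simplified standalone version over `String`s — note it primes the lookahead in the constructor rather than inside `next()`, a slight deviation from the diff to keep the sketch correct for empty input:

```scala
// One-row lookahead with an explicit eof flag, mirroring the SerDe branch.
class LookaheadIterator(source: Iterator[String]) extends Iterator[String] {
  private var eof = false

  // Stand-in for outputSerde.deserialize(...): null signals end of stream,
  // the way the real code treats an EOFException.
  private def deserialize(): String =
    if (source.hasNext) source.next() else { eof = true; null }

  private var cacheRow: String = deserialize() // prime the lookahead

  override def hasNext: Boolean = !eof
  override def next(): String = {
    if (!hasNext) throw new NoSuchElementException
    val ret = cacheRow
    cacheRow = deserialize() // fetch the next row so hasNext stays accurate
    ret
  }
}

val rows = new LookaheadIterator(Iterator("r1", "r2")).toList
println(rows) // List(r1, r2)
```

As with the line-based branch, only one row is held in memory at a time, which is exactly the streaming behavior this review comment asks to preserve.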
[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-70470259 [Test build #25756 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25756/consoleFull) for PR 4014 at commit [`7a14f31`](https://github.com/apache/spark/commit/7a14f31e73391b97c723bba949a282a3a3c60329). * This patch **fails to build**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `val trimed_class = outputSerdeClass.split("'")(1)` * `val trimed_class = inputSerdeClass.split("'")(1)`
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-70469836 [Test build #25756 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25756/consoleFull) for PR 4014 at commit [`7a14f31`](https://github.com/apache/spark/commit/7a14f31e73391b97c723bba949a282a3a3c60329). * This patch merges cleanly.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-70473932 [Test build #25758 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25758/consoleFull) for PR 4014 at commit [`9a6dc04`](https://github.com/apache/spark/commit/9a6dc043a94d6c1999810bf230056c64ee66f623). * This patch merges cleanly.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-70481459 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25758/ Test PASSed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-70481455 [Test build #25758 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25758/consoleFull) for PR 4014 at commit [`9a6dc04`](https://github.com/apache/spark/commit/9a6dc043a94d6c1999810bf230056c64ee66f623). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `val trimed_class = outputSerdeClass.split("'")(1)` * `val trimed_class = inputSerdeClass.split("'")(1)`
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-70470261 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25756/ Test FAILed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-70411540 [Test build #25723 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25723/consoleFull) for PR 4014 at commit [`32d3046`](https://github.com/apache/spark/commit/32d3046a228d4bc7f43ed4f20dbd3dee0be42b80). * This patch merges cleanly.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-70411565 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25723/ Test FAILed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-70411564 [Test build #25723 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25723/consoleFull) for PR 4014 at commit [`32d3046`](https://github.com/apache/spark/commit/32d3046a228d4bc7f43ed4f20dbd3dee0be42b80). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `val trimed_class = outputSerdeClass.split("'")(1)` * `val trimed_class = inputSerdeClass.split("'")(1)`
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-70412261 [Test build #25724 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25724/consoleFull) for PR 4014 at commit [`be2c3fc`](https://github.com/apache/spark/commit/be2c3fc81aa990b315715dee3f5f387792cb4617). * This patch merges cleanly.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-70421242 [Test build #25729 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25729/consoleFull) for PR 4014 at commit [`799b5e1`](https://github.com/apache/spark/commit/799b5e1a5d18a18b7af5e7db950c40f1a393357e). * This patch merges cleanly.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-70414870 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25724/ Test PASSed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-70414868 [Test build #25724 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25724/consoleFull) for PR 4014 at commit [`be2c3fc`](https://github.com/apache/spark/commit/be2c3fc81aa990b315715dee3f5f387792cb4617). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `val trimed_class = outputSerdeClass.split("'")(1)` * `val trimed_class = inputSerdeClass.split("'")(1)`
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-70424085 [Test build #25729 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25729/consoleFull) for PR 4014 at commit [`799b5e1`](https://github.com/apache/spark/commit/799b5e1a5d18a18b7af5e7db950c40f1a393357e). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `val trimed_class = outputSerdeClass.split("'")(1)` * `val trimed_class = inputSerdeClass.split("'")(1)`
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-70424087 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25729/ Test PASSed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-70370192 [Test build #25703 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25703/consoleFull) for PR 4014 at commit [`ab22f7b`](https://github.com/apache/spark/commit/ab22f7b55988ba324e14969c89d8edfe4d663504). * This patch merges cleanly.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-70372569 [Test build #25703 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25703/consoleFull) for PR 4014 at commit [`ab22f7b`](https://github.com/apache/spark/commit/ab22f7b55988ba324e14969c89d8edfe4d663504). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-70372572 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25703/ Test PASSed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-70292347 [Test build #25669 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25669/consoleFull) for PR 4014 at commit [`5e0b864`](https://github.com/apache/spark/commit/5e0b864e4f055512df63f06580fc45996a0fa3ab). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `val trimed_class = outputSerdeClass.split("'")(1)` * `val trimed_class = inputSerdeClass.split("'")(1)`
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-70292213 [Test build #25669 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25669/consoleFull) for PR 4014 at commit [`5e0b864`](https://github.com/apache/spark/commit/5e0b864e4f055512df63f06580fc45996a0fa3ab). * This patch merges cleanly.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-70292349 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25669/ Test FAILed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-70292959 [Test build #25670 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25670/consoleFull) for PR 4014 at commit [`4d21956`](https://github.com/apache/spark/commit/4d21956e75ae2c285a31fd533413c7de2dd990db). * This patch merges cleanly.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-70300945 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25670/ Test FAILed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-70300932 [Test build #25670 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25670/consoleFull) for PR 4014 at commit [`4d21956`](https://github.com/apache/spark/commit/4d21956e75ae2c285a31fd533413c7de2dd990db). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `val trimed_class = outputSerdeClass.split("'")(1)` * `val trimed_class = inputSerdeClass.split("'")(1)`
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-70304683 test again.
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-70305074 Jenkins, ok to test.
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-70305460 Failed due to a connection error. Please test again.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-70357294 [Test build #25699 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25699/consoleFull) for PR 4014 at commit [`a711657`](https://github.com/apache/spark/commit/a71165771b60b9663319fbff9bf4d4ec049b40dd). * This patch merges cleanly.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-70358722 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25699/ Test PASSed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-70358721 [Test build #25699 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25699/consoleFull) for PR 4014 at commit [`a711657`](https://github.com/apache/spark/commit/a71165771b60b9663319fbff9bf4d4ec049b40dd). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `val trimed_class = outputSerdeClass.split("'")(1)` * `val trimed_class = inputSerdeClass.split("'")(1)`