[GitHub] spark issue #14077: [SPARK-16402] [SQL] JDBC Source: Implement save API of D...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14077 @JustinPihony You know, I do not care which PR is merged eventually. Clean up your PR as best you can, and I will review it when it is ready. Thanks for your work! Please continue to submit more PRs for improving Spark. To reduce the code changes in your PR, I think we should not extend `SchemaRelationProvider`. Now, I think you can assume the copy location has been fixed. Since this is related to Data Source APIs, CC @rxin @yhuai
[GitHub] spark issue #14004: [SPARK-16285][SQL] Implement sentences SQL functions
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14004 **[Test build #61895 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61895/consoleFull)** for PR 14004 at commit [`8d7c3d4`](https://github.com/apache/spark/commit/8d7c3d400581332503f7cb831e4d0f850294b608).
[GitHub] spark issue #14068: enhanced simulate multiply
Github user uzadude commented on the issue: https://github.com/apache/spark/pull/14068 Hi srowen, I have read the "how to contribute" wiki. I thought this enhancement was too small to open a JIRA for, and it passes the tests.
[GitHub] spark issue #14004: [SPARK-16285][SQL] Implement sentences SQL functions
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14004 Thank you, @cloud-fan. I updated the PR according to your comments.
[GitHub] spark issue #13876: [SPARK-16174][SQL] Improve `OptimizeIn` optimizer to rem...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/13876 Looks pretty good. cc @cloud-fan for another look.
[GitHub] spark pull request #13876: [SPARK-16174][SQL] Improve `OptimizeIn` optimizer...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/13876#discussion_r69854463

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---

@@ -820,16 +820,24 @@ object ConstantFolding extends Rule[LogicalPlan] {
 }

 /**
- * Replaces [[In (value, seq[Literal])]] with optimized version [[InSet (value, HashSet[Literal])]]
- * which is much faster
+ * Optimize IN predicates:
+ * 1. Removes literal repetitions.
+ * 2. Replaces [[In (value, seq[Literal])]] with optimized version
+ *    [[InSet (value, HashSet[Literal])]] which is much faster.
  */
 case class OptimizeIn(conf: CatalystConf) extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
     case q: LogicalPlan => q transformExpressionsDown {
-      case In(v, list) if !list.exists(!_.isInstanceOf[Literal]) &&
-          list.size > conf.optimizerInSetConversionThreshold =>
-        val hSet = list.map(e => e.eval(EmptyRow))
-        InSet(v, HashSet() ++ hSet)
+      case expr @ In(v, list) if expr.inSetConvertible =>
+        val newList = ExpressionSet(list).toSeq
+        if (newList.size > conf.optimizerInSetConversionThreshold) {
+          val hSet = newList.map(e => e.eval(EmptyRow))
+          InSet(v, HashSet() ++ hSet)
+        } else if (newList.size < list.size) {
+          expr.copy(value = v, list = newList)

--- End diff --

you don't need to copy value here, do you?
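For readers following along, here is a self-contained sketch in plain Scala (toy `In`/`InSet` case classes, not the Catalyst ones; `threshold` stands in for `conf.optimizerInSetConversionThreshold`) of the rewrite under review: deduplicate the literal list first, then either convert to a hash set or keep the trimmed `In` list.

```
import scala.collection.immutable.HashSet

object OptimizeInSketch {
  sealed trait Expr
  case class In(value: String, list: Seq[Int]) extends Expr
  case class InSet(value: String, hset: Set[Int]) extends Expr

  // Mirrors the reviewed logic: dedupe first, then decide between
  // InSet conversion and a trimmed In list.
  def optimize(expr: In, threshold: Int): Expr = {
    val newList = expr.list.distinct            // stands in for ExpressionSet(list).toSeq
    if (newList.size > threshold) {
      InSet(expr.value, HashSet() ++ newList)   // fast hash-set membership test
    } else if (newList.size < expr.list.size) {
      expr.copy(list = newList)                 // rxin's point: `value` needs no copy
    } else {
      expr                                      // nothing to optimize
    }
  }

  def main(args: Array[String]): Unit = {
    println(optimize(In("a", Seq(1, 1, 2, 2, 3)), threshold = 10)) // In(a, List(1, 2, 3))
    println(optimize(In("a", Seq(1, 2, 3, 4)), threshold = 3))     // InSet(a, Set(1, 2, 3, 4))
  }
}
```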
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69854388

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---

@@ -652,6 +654,160 @@ case class StringRPad(str: Expression, len: Expression, pad: Expression)
   override def prettyName: String = "rpad"
 }
+
+object ParseUrl {
+  private val HOST = UTF8String.fromString("HOST")
+  private val PATH = UTF8String.fromString("PATH")
+  private val QUERY = UTF8String.fromString("QUERY")
+  private val REF = UTF8String.fromString("REF")
+  private val PROTOCOL = UTF8String.fromString("PROTOCOL")
+  private val FILE = UTF8String.fromString("FILE")
+  private val AUTHORITY = UTF8String.fromString("AUTHORITY")
+  private val USERINFO = UTF8String.fromString("USERINFO")
+  private val REGEXPREFIX = "(&|^)"
+  private val REGEXSUBFIX = "=([^&]*)"
+}
+
+/**
+ * Extracts a part from a URL
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL",
+  extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO.
+    Key specifies which query to extract.
+    Examples:
+      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST')
+      'spark.apache.org'
+      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY')
+      'query=1'
+      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 'query')
+      '1'""")
+case class ParseUrl(children: Seq[Expression])
+  extends Expression with ImplicitCastInputTypes with CodegenFallback {
+
+  override def nullable: Boolean = true
+  override def inputTypes: Seq[DataType] = Seq.fill(children.size)(StringType)
+  override def dataType: DataType = StringType
+  override def prettyName: String = "parse_url"
+
+  // If the url is a constant, cache the URL object so that we don't need to convert url
+  // from UTF8String to String to URL for every row.
+  @transient private lazy val cachedUrl = stringExprs(0) match {
+    case Literal(url: UTF8String, _) => getUrl(url)
+    case _ => null
+  }
+
+  // If the key is a constant, cache the Pattern object so that we don't need to convert key
+  // from UTF8String to String to StringBuilder to String to Pattern for every row.
+  @transient private lazy val cachedPattern = stringExprs(2) match {
+    case Literal(key: UTF8String, _) => getPattern(key)
+    case _ => null
+  }
+
+  private lazy val stringExprs = children.toArray
+  import ParseUrl._
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    if (children.size > 3 || children.size < 2) {
+      TypeCheckResult.TypeCheckFailure(s"$prettyName function requires two or three arguments")
+    } else {
+      super[ImplicitCastInputTypes].checkInputDataTypes()
+    }
+  }
+
+  private def getPattern(key: UTF8String): Pattern = {
+    if (key != null) {
+      Pattern.compile(REGEXPREFIX + key.toString + REGEXSUBFIX)
+    } else {
+      null
+    }
+  }
+
+  private def getUrl(url: UTF8String): URL = {
+    try {
+      new URL(url.toString)
+    } catch {
+      case e: MalformedURLException => null
+    }
+  }
+
+  private def extractValueFromQuery(query: UTF8String, pattern: Pattern): UTF8String = {
+    val m = pattern.matcher(query.toString)
+    if (m.find()) {
+      UTF8String.fromString(m.group(2))
+    } else {
+      null
+    }
+  }
+
+  private def extractFromUrl(url: URL, partToExtract: UTF8String): UTF8String = {
+    if (partToExtract.equals(HOST)) {
+      UTF8String.fromString(url.getHost)
+    } else if (partToExtract.equals(PATH)) {
+      UTF8String.fromString(url.getPath)
+    } else if (partToExtract.equals(QUERY)) {
+      UTF8String.fromString(url.getQuery)
+    } else if (partToExtract.equals(REF)) {
+      UTF8String.fromString(url.getRef)
+    } else if (partToExtract.equals(PROTOCOL)) {
+      UTF8String.fromString(url.getProtocol)
+    } else if (partToExtract.equals(FILE)) {
+      UTF8String.fromString(url.getFile)
+    } else if (partToExtract.equals(AUTHORITY)) {
+      UTF8String.fromString(url.getAuthority)
+    } else if (partToExtract.equals(USERINFO)) {
+      UTF8String.fromString(url.getUserInfo)
+    } else {
+      null

--- End diff --

yea - actually why don't we simplify this function and require part to be foldable rather than per row? It's similar to the other reflect pull request.
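rxin's suggestion, roughly sketched below with hypothetical names (this is not the merged implementation): if `partToExtract` must be foldable, i.e. a constant, the part can be resolved once at analysis time instead of being re-dispatched for every row.

```
import java.net.URL

object FoldablePartSketch {
  // Stand-in for Catalyst's notion of a foldable (constant) expression.
  case class ConstExpr(value: Option[String]) { def foldable: Boolean = value.isDefined }

  // Resolve the part exactly once, not per row.
  def makeExtractor(part: ConstExpr): URL => String = {
    require(part.foldable, "partToExtract must be a constant")
    part.value.get match {
      case "HOST"     => _.getHost
      case "PATH"     => _.getPath
      case "QUERY"    => _.getQuery
      case "PROTOCOL" => _.getProtocol
      case _          => _ => null   // unknown parts yield null, as in Hive
    }
  }

  def main(args: Array[String]): Unit = {
    val host = makeExtractor(ConstExpr(Some("HOST")))
    println(host(new URL("http://spark.apache.org/path?query=1"))) // spark.apache.org
  }
}
```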
[GitHub] spark pull request #14004: [SPARK-16285][SQL] Implement sentences SQL functi...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14004#discussion_r69854198

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ---

@@ -198,6 +203,66 @@ case class StringSplit(str: Expression, pattern: Expression)
   override def prettyName: String = "split"
 }
+
+/**
+ * Splits a string into arrays of sentences, where each sentence is an array of words.
+ * The 'lang' and 'country' arguments are optional, and if omitted, the default locale is used.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(str, lang, country) - Splits str into an array of array of words.",
+  extended = "> SELECT _FUNC_('Hi there! Good morning.');\n [['Hi','there'], ['Good','morning']]")
+case class Sentences(
+    str: Expression,
+    language: Expression = Literal(""),
+    country: Expression = Literal(""))
+  extends Expression with ImplicitCastInputTypes with CodegenFallback {
+
+  def this(str: Expression) = this(str, Literal(""), Literal(""))
+  def this(str: Expression, language: Expression) = this(str, language, Literal(""))
+
+  override def nullable: Boolean = true
+  override def dataType: DataType =
+    ArrayType(ArrayType(StringType, containsNull = false), containsNull = false)
+  override def inputTypes: Seq[AbstractDataType] = Seq(StringType, StringType, StringType)
+  override def children: Seq[Expression] = str :: language :: country :: Nil
+
+  override def eval(input: InternalRow): Any = {
+    val string = str.eval(input)
+    if (string == null) {
+      null
+    } else {
+      val locale = try {
+        new Locale(language.eval(input).asInstanceOf[UTF8String].toString,
+          country.eval(input).asInstanceOf[UTF8String].toString)
+      } catch {
+        case _: NullPointerException | _: ClassCastException => Locale.getDefault

--- End diff --

It created the wrong locale, then compared it against the system's available locales, so the bad value was ultimately ignored. The following is the underlying code in `rt.jar`:

```
List var4 = Control.getControl(Control.FORMAT_DEFAULT).getCandidateLocales("", var1);
Iterator var5 = var4.iterator();

while (var5.hasNext()) {
    Locale var6 = (Locale) var5.next();
    if (!var6.equals(var1)) {
        var2 = findAdapter(var0, var6);
        if (var2 != null) {
            ((ConcurrentMap) var3).putIfAbsent(var1, var2);
            return var2;
        }
    }
}
```
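To make the fallback concrete, here is a minimal sketch using only JDK APIs (not code from the PR): a bogus `Locale` is constructed without complaint, and `BreakIterator` quietly falls back to a supported locale, which is why `Sentences` only needs to guard against null and `ClassCastException`, not against unknown locales.

```
import java.text.BreakIterator
import java.util.Locale

object LocaleFallbackDemo {
  def main(args: Array[String]): Unit = {
    val text = "Hi there! Good morning."
    val bogus = new Locale("zz", "ZZ")                 // never validated at construction
    val it = BreakIterator.getSentenceInstance(bogus)  // silently falls back to a real locale
    it.setText(text)
    // Walk the sentence boundaries the iterator found.
    var start = it.first()
    var end = it.next()
    while (end != BreakIterator.DONE) {
      println(text.substring(start, end).trim)         // "Hi there!", "Good morning."
      start = end
      end = it.next()
    }
  }
}
```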
[GitHub] spark pull request #14072: [SPARK-16398][CORE] Make cancelJob and cancelStag...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14072
[GitHub] spark pull request #14084: [SPARK-16021][test-maven] Fix the maven build
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14084
[GitHub] spark issue #14075: [SPARK-16401] [SQL] Data Source API: Enable Extending Re...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14075 Sure, let me know whether I need to submit another PR for backporting to 2.0. Thanks!
[GitHub] spark pull request #13983: [SPARK-16021] Fill freed memory in test to help c...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/13983#discussion_r69854073

--- Diff: common/unsafe/src/test/java/org/apache/spark/unsafe/PlatformUtilSuite.java ---

@@ -58,4 +61,17 @@ public void overlappingCopyMemory() {
       Assert.assertEquals((byte)i, data[i + 1]);
     }
   }
+
+  @Test
+  public void memoryDebugFillEnabledInTest() {
+    Assert.assertTrue(MemoryAllocator.MEMORY_DEBUG_FILL_ENABLED);

--- End diff --

Actually it's fixed by https://github.com/apache/spark/pull/14084
[GitHub] spark pull request #13983: [SPARK-16021] Fill freed memory in test to help c...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/13983#discussion_r69854030

--- Diff: common/unsafe/src/test/java/org/apache/spark/unsafe/PlatformUtilSuite.java ---

@@ -58,4 +61,17 @@ public void overlappingCopyMemory() {
       Assert.assertEquals((byte)i, data[i + 1]);
     }
   }
+
+  @Test
+  public void memoryDebugFillEnabledInTest() {
+    Assert.assertTrue(MemoryAllocator.MEMORY_DEBUG_FILL_ENABLED);

--- End diff --

Yea, I think the problem is that Maven doesn't have the property set. cc @ericl, we need to set that in Maven too.
[GitHub] spark issue #14084: [SPARK-16021][test-maven] Fix the maven build
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14084 LGTM - I'm going to merge this.
[GitHub] spark issue #14084: [SPARK-16021][test-maven] Fix the maven build
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/14084 cc @rxin @ericl
[GitHub] spark issue #14072: [SPARK-16398][CORE] Make cancelJob and cancelStage APIs ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14072 Merging in master. Thanks.
[GitHub] spark issue #14075: [SPARK-16401] [SQL] Data Source API: Enable Extending Re...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14075 Thanks - then we should merge this in 2.0.
[GitHub] spark issue #14077: [SPARK-16402] [SQL] JDBC Source: Implement save API of D...
Github user JustinPihony commented on the issue: https://github.com/apache/spark/pull/14077 Then the best course of action would be to use my current implementation, as it works no matter the position of copy. I can add the additional tests if that would make it more amenable? Otherwise I'll push a reduced code set in the morning, but it would rely on the copy-location move PR.

> On Jul 7, 2016, at 1:27 AM, Hyukjin Kwon wrote:
>
> (Personally, I hope this does not get delayed because this usage was introduced in Spark Summit PPT and I guess users would try to use this API.)
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69853706

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---

@@ -652,6 +654,145 @@ case class StringRPad(str: Expression, len: Expression, pad: Expression)
   override def prettyName: String = "rpad"
 }
+
+object ParseUrl {
+  private val HOST = UTF8String.fromString("HOST")
+  private val PATH = UTF8String.fromString("PATH")
+  private val QUERY = UTF8String.fromString("QUERY")
+  private val REF = UTF8String.fromString("REF")
+  private val PROTOCOL = UTF8String.fromString("PROTOCOL")
+  private val FILE = UTF8String.fromString("FILE")
+  private val AUTHORITY = UTF8String.fromString("AUTHORITY")
+  private val USERINFO = UTF8String.fromString("USERINFO")
+  private val REGEXPREFIX = "(&|^)"
+  private val REGEXSUBFIX = "=([^&]*)"
+}
+
+/**
+ * Extracts a part from a URL
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL",
+  extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO.
+    Key specifies which query to extract.
+    Examples:
+      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST')
+      'spark.apache.org'
+      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY')
+      'query=1'
+      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 'query')
+      '1'""")
+case class ParseUrl(children: Seq[Expression])
+  extends Expression with ImplicitCastInputTypes with CodegenFallback {
+
+  override def nullable: Boolean = true
+  override def inputTypes: Seq[DataType] = Seq.fill(children.size)(StringType)
+  override def dataType: DataType = StringType
+  override def prettyName: String = "parse_url"
+
+  // If the url is a constant, cache the URL object so that we don't need to convert url
+  // from UTF8String to String to URL for every row.
+  @transient private lazy val cachedUrl = children(0) match {
+    case Literal(url: UTF8String, _) => getUrl(url)

--- End diff --

`case Literal(url: UTF8String, _) if url != null`, the `getUrl` doesn't handle null now.
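The guard cloud-fan is asking for, shown as a self-contained sketch with simplified types (not the actual Catalyst classes): only cache when the literal value is non-null, so the helper never receives null.

```
object NullSafeCachingSketch {
  case class Literal(value: String)  // toy stand-in for Catalyst's Literal

  def getUrl(url: String): java.net.URL =
    try new java.net.URL(url)
    catch { case _: java.net.MalformedURLException => null }

  def cachedUrl(expr: Any): java.net.URL = expr match {
    // The `if value != null` guard is the point: without it, a null
    // literal (e.g. parse_url(NULL, 'HOST')) reaches getUrl and blows up.
    case Literal(value) if value != null => getUrl(value)
    case _ => null
  }

  def main(args: Array[String]): Unit = {
    println(cachedUrl(Literal("http://spark.apache.org/path?query=1"))) // cached URL
    println(cachedUrl(Literal(null)))                                   // null, no exception
  }
}
```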
[GitHub] spark pull request #14004: [SPARK-16285][SQL] Implement sentences SQL functi...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14004#discussion_r69853682

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala ---

@@ -725,4 +725,41 @@ class StringExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
     checkEvaluation(FindInSet(Literal("abf"), Literal("abc,b,ab,c,def")), 0)
     checkEvaluation(FindInSet(Literal("ab,"), Literal("abc,b,ab,c,def")), 0)
   }
+
+  test("Sentences") {
+    val nullString = Literal.create(null, StringType)
+    checkEvaluation(Sentences(nullString, nullString, nullString), null, EmptyRow)
+    checkEvaluation(Sentences(nullString, nullString), null, EmptyRow)
+    checkEvaluation(Sentences(nullString), null, EmptyRow)
+    checkEvaluation(Sentences(Literal.create(null, NullType)), null, EmptyRow)
+    checkEvaluation(Sentences("", nullString, nullString), Seq.empty, EmptyRow)
+    checkEvaluation(Sentences("", nullString), Seq.empty, EmptyRow)
+    checkEvaluation(Sentences(""), Seq.empty, EmptyRow)
+
+    val correct_answer = Seq(
+      Seq("Hi", "there"),
+      Seq("The", "price", "was"),
+      Seq("But", "not", "now"))
+
+    // Hive compatible test-cases.
+    checkEvaluation(
+      Sentences("Hi there! The price was $1,234.56 But, not now."),
+      correct_answer,
+      EmptyRow)

--- End diff --

Thanks.
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69853668

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---

@@ -652,6 +654,145 @@ case class StringRPad(str: Expression, len: Expression, pad: Expression)
   override def prettyName: String = "rpad"
 }
+
+object ParseUrl {
+  private val HOST = UTF8String.fromString("HOST")
+  private val PATH = UTF8String.fromString("PATH")
+  private val QUERY = UTF8String.fromString("QUERY")
+  private val REF = UTF8String.fromString("REF")
+  private val PROTOCOL = UTF8String.fromString("PROTOCOL")
+  private val FILE = UTF8String.fromString("FILE")
+  private val AUTHORITY = UTF8String.fromString("AUTHORITY")
+  private val USERINFO = UTF8String.fromString("USERINFO")
+  private val REGEXPREFIX = "(&|^)"
+  private val REGEXSUBFIX = "=([^&]*)"
+}
+
+/**
+ * Extracts a part from a URL
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL",
+  extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO.
+    Key specifies which query to extract.
+    Examples:
+      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST')
+      'spark.apache.org'
+      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY')
+      'query=1'
+      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 'query')
+      '1'""")
+case class ParseUrl(children: Seq[Expression])
+  extends Expression with ImplicitCastInputTypes with CodegenFallback {
+
+  override def nullable: Boolean = true
+  override def inputTypes: Seq[DataType] = Seq.fill(children.size)(StringType)
+  override def dataType: DataType = StringType
+  override def prettyName: String = "parse_url"
+
+  // If the url is a constant, cache the URL object so that we don't need to convert url
+  // from UTF8String to String to URL for every row.
+  @transient private lazy val cachedUrl = children(0) match {
+    case Literal(url: UTF8String, _) => getUrl(url)
+    case _ => null
+  }
+
+  // If the key is a constant, cache the Pattern object so that we don't need to convert key
+  // from UTF8String to String to StringBuilder to String to Pattern for every row.
+  @transient private lazy val cachedPattern = children(2) match {
+    case Literal(key: UTF8String, _) => getPattern(key)

--- End diff --

`case Literal(url: UTF8String, _) if url != null`, the `getUrl` doesn't handle null now.
[GitHub] spark issue #14084: [SPARK-16021][test-maven] Fix the maven build
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14084 **[Test build #61894 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61894/consoleFull)** for PR 14084 at commit [`a618729`](https://github.com/apache/spark/commit/a6187297351865ea4b6e0a62f4e105a9ef58e876).
[GitHub] spark issue #13494: [SPARK-15752] [SQL] Optimize metadata only query that ha...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/13494 Hi @lianhuiwang, can you rebase your PR to master? I think it's pretty close!
[GitHub] spark pull request #14004: [SPARK-16285][SQL] Implement sentences SQL functi...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14004#discussion_r69853496

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala ---

@@ -347,4 +347,24 @@ class StringFunctionsSuite extends QueryTest with SharedSQLContext {
       df2.filter("b>0").selectExpr("format_number(a, b)"),
       Row("5.") :: Row("4.000") :: Row("4.000") :: Row("4.000") :: Row("3.00") :: Nil)
   }
+
+  test("string sentences function") {
+    val df = Seq(("Hi there! The price was $1,234.56 But, not now.", "en", "US"))
+      .toDF("str", "language", "country")
+
+    checkAnswer(
+      df.selectExpr("sentences(str, language, country)"),
+      Row(Seq(Seq("Hi", "there"), Seq("The", "price", "was"), Seq("But", "not", "now"))))
+
+    // Type coercion
+    checkAnswer(
+      df.selectExpr("sentences(null)", "sentences(10)", "sentences(3.14)"),
+      Row(null, Seq(Seq("10")), Seq(Seq("3.14"))))
+
+    // Argument number exception
+    val m = intercept[AnalysisException] {
+      df.selectExpr("sentences()")
+    }.getMessage
+    assert(m.contains("Invalid number of arguments"))

--- End diff --

It is `Invalid number of arguments for function sentences`. I'll update this, too.
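As an aside, the intercept-and-check-message pattern used in that test, as a self-contained sketch (assuming ScalaTest 3.1+ on the classpath, with a stand-in exception class rather than Spark's `AnalysisException`):

```
import org.scalatest.funsuite.AnyFunSuite

class InterceptMessageSuite extends AnyFunSuite {
  // Stand-in for org.apache.spark.sql.AnalysisException.
  class AnalysisException(msg: String) extends Exception(msg)

  // Stand-in for df.selectExpr: fails on a zero-argument call.
  private def selectExpr(expr: String): Unit =
    if (expr == "sentences()") {
      throw new AnalysisException("Invalid number of arguments for function sentences")
    }

  test("assert on the exception message, not just its type") {
    val m = intercept[AnalysisException] {
      selectExpr("sentences()")
    }.getMessage
    assert(m.contains("Invalid number of arguments for function sentences"))
  }
}
```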
[GitHub] spark pull request #13494: [SPARK-15752] [SQL] Optimize metadata only query ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13494#discussion_r69853489

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---

@@ -1689,4 +1689,76 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
     )
   }
 }
+
+  test("spark-15752 optimize metadata only query for hive table") {

--- End diff --

Why is this test so different from the one in sql core `SQLQuerySuite`?
[GitHub] spark issue #14026: [SPARK-13569][STREAMING][KAFKA] pattern based topic subs...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14026 Merged build finished. Test PASSed.
[GitHub] spark pull request #14084: [SPARK-16021][test-maven] Fix the maven build
GitHub user zsxwing opened a pull request: https://github.com/apache/spark/pull/14084

[SPARK-16021][test-maven] Fix the maven build

## What changes were proposed in this pull request?

Fixed the maven build for #13983

## How was this patch tested?

The existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zsxwing/spark fix-maven

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14084.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14084

commit a6187297351865ea4b6e0a62f4e105a9ef58e876
Author: Shixiong Zhu
Date: 2016-07-07T05:33:22Z

    [SPARK-16021] Fix the maven build
[GitHub] spark issue #14026: [SPARK-13569][STREAMING][KAFKA] pattern based topic subs...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14026 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61892/
[GitHub] spark issue #14026: [SPARK-13569][STREAMING][KAFKA] pattern based topic subs...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14026 **[Test build #61892 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61892/consoleFull)** for PR 14026 at commit [`f287722`](https://github.com/apache/spark/commit/f2877226ebe8c70d44f35a10bab056402bc9ffa9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69853099

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala ---

@@ -725,4 +725,52 @@ class StringExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
     checkEvaluation(FindInSet(Literal("abf"), Literal("abc,b,ab,c,def")), 0)
     checkEvaluation(FindInSet(Literal("ab,"), Literal("abc,b,ab,c,def")), 0)
   }
+
+  test("ParseUrl") {
+    def checkParseUrl(expected: String, urlStr: String, partToExtract: String): Unit = {
+      checkEvaluation(
+        ParseUrl(Seq(Literal(urlStr), Literal(partToExtract))), expected)
+    }
+    def checkParseUrlWithKey(
+        expected: String,
+        urlStr: String,
+        partToExtract: String,
+        key: String): Unit = {
+      checkEvaluation(
+        ParseUrl(Seq(Literal(urlStr), Literal(partToExtract), Literal(key))), expected)
+    }
+
+    checkParseUrl("spark.apache.org", "http://spark.apache.org/path?query=1", "HOST")
+    checkParseUrl("/path", "http://spark.apache.org/path?query=1", "PATH")
+    checkParseUrl("query=1", "http://spark.apache.org/path?query=1", "QUERY")
+    checkParseUrl("Ref", "http://spark.apache.org/path?query=1#Ref", "REF")
+    checkParseUrl("http", "http://spark.apache.org/path?query=1", "PROTOCOL")
+    checkParseUrl("/path?query=1", "http://spark.apache.org/path?query=1", "FILE")
+    checkParseUrl("spark.apache.org:8080", "http://spark.apache.org:8080/path?query=1", "AUTHORITY")
+    checkParseUrl("userinfo", "http://useri...@spark.apache.org/path?query=1", "USERINFO")
+    checkParseUrlWithKey("1", "http://spark.apache.org/path?query=1", "QUERY", "query")
+
+    // Null checking
+    checkParseUrl(null, null, "HOST")
+    checkParseUrl(null, "http://spark.apache.org/path?query=1", null)
+    checkParseUrl(null, null, null)
+    checkParseUrl(null, "test", "HOST")
+    checkParseUrl(null, "http://spark.apache.org/path?query=1", "NO")
+    checkParseUrl(null, "http://spark.apache.org/path?query=1", "USERINFO")
+    checkParseUrlWithKey(null, "http://spark.apache.org/path?query=1", "HOST", "query")
+    checkParseUrlWithKey(null, "http://spark.apache.org/path?query=1", "QUERY", "quer")
+    checkParseUrlWithKey(null, "http://spark.apache.org/path?query=1", "QUERY", null)
+    checkParseUrlWithKey(null, "http://spark.apache.org/path?query=1", "QUERY", "")
+
+    // exceptional cases
+    intercept[java.util.regex.PatternSyntaxException] {
+      evaluate(ParseUrl(Seq(Literal("http://spark.apache.org/path?"),
+        Literal("QUERY"), Literal("???"))))
+    }
+
+    // arguments checking
+    assert(ParseUrl(Seq(Literal("1"))).checkInputDataTypes().isFailure)
+    assert(ParseUrl(Seq(Literal("1"), Literal("2"), Literal("3"), Literal("4")))

--- End diff --

ah right, no need to bother here
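For the curious, why the key `???` in the test's exceptional case must throw: spliced between the prefix and suffix, the pattern becomes `(&|^)???=([^&]*)`, and the trailing `?` is a dangling metacharacter. A one-file demonstration using only JDK regex (outside Spark):

```
import java.util.regex.{Pattern, PatternSyntaxException}

object DanglingMetaDemo {
  def main(args: Array[String]): Unit = {
    val REGEXPREFIX = "(&|^)"
    val REGEXSUBFIX = "=([^&]*)"
    try {
      // (&|^)?? parses as a reluctantly-optional group, but the third '?'
      // has nothing to quantify, so compilation fails.
      Pattern.compile(REGEXPREFIX + "???" + REGEXSUBFIX)
    } catch {
      case e: PatternSyntaxException =>
        println(s"as expected: ${e.getDescription}") // "Dangling meta character '?'"
    }
  }
}
```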
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69853050

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---

@@ -652,6 +654,160 @@ case class StringRPad(str: Expression, len: Expression, pad: Expression)
   override def prettyName: String = "rpad"
 }
+
+object ParseUrl {
+  private val HOST = UTF8String.fromString("HOST")
+  private val PATH = UTF8String.fromString("PATH")
+  private val QUERY = UTF8String.fromString("QUERY")
+  private val REF = UTF8String.fromString("REF")
+  private val PROTOCOL = UTF8String.fromString("PROTOCOL")
+  private val FILE = UTF8String.fromString("FILE")
+  private val AUTHORITY = UTF8String.fromString("AUTHORITY")
+  private val USERINFO = UTF8String.fromString("USERINFO")
+  private val REGEXPREFIX = "(&|^)"
+  private val REGEXSUBFIX = "=([^&]*)"
+}
+
+/**
+ * Extracts a part from a URL
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL",
+  extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO.
+    Key specifies which query to extract.
+    Examples:
+      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST')
+      'spark.apache.org'
+      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY')
+      'query=1'
+      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 'query')
+      '1'""")
+case class ParseUrl(children: Seq[Expression])
+  extends Expression with ImplicitCastInputTypes with CodegenFallback {
+
+  override def nullable: Boolean = true
+  override def inputTypes: Seq[DataType] = Seq.fill(children.size)(StringType)
+  override def dataType: DataType = StringType
+  override def prettyName: String = "parse_url"
+
+  // If the url is a constant, cache the URL object so that we don't need to convert url
+  // from UTF8String to String to URL for every row.
+  @transient private lazy val cachedUrl = stringExprs(0) match {
+    case Literal(url: UTF8String, _) => getUrl(url)
+    case _ => null
+  }
+
+  // If the key is a constant, cache the Pattern object so that we don't need to convert key
+  // from UTF8String to String to StringBuilder to String to Pattern for every row.
+  @transient private lazy val cachedPattern = stringExprs(2) match {
+    case Literal(key: UTF8String, _) => getPattern(key)
+    case _ => null
+  }
+
+  private lazy val stringExprs = children.toArray
+  import ParseUrl._
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    if (children.size > 3 || children.size < 2) {
+      TypeCheckResult.TypeCheckFailure(s"$prettyName function requires two or three arguments")
+    } else {
+      super[ImplicitCastInputTypes].checkInputDataTypes()
+    }
+  }
+
+  private def getPattern(key: UTF8String): Pattern = {
+    if (key != null) {
+      Pattern.compile(REGEXPREFIX + key.toString + REGEXSUBFIX)
+    } else {
+      null
+    }
+  }
+
+  private def getUrl(url: UTF8String): URL = {
+    try {
+      new URL(url.toString)
+    } catch {
+      case e: MalformedURLException => null
+    }
+  }
+
+  private def extractValueFromQuery(query: UTF8String, pattern: Pattern): UTF8String = {
+    val m = pattern.matcher(query.toString)
+    if (m.find()) {
+      UTF8String.fromString(m.group(2))
+    } else {
+      null
+    }
+  }
+
+  private def extractFromUrl(url: URL, partToExtract: UTF8String): UTF8String = {
+    if (partToExtract.equals(HOST)) {
+      UTF8String.fromString(url.getHost)
+    } else if (partToExtract.equals(PATH)) {
+      UTF8String.fromString(url.getPath)
+    } else if (partToExtract.equals(QUERY)) {
+      UTF8String.fromString(url.getQuery)
+    } else if (partToExtract.equals(REF)) {
+      UTF8String.fromString(url.getRef)
+    } else if (partToExtract.equals(PROTOCOL)) {
+      UTF8String.fromString(url.getProtocol)
+    } else if (partToExtract.equals(FILE)) {
+      UTF8String.fromString(url.getFile)
+    } else if (partToExtract.equals(AUTHORITY)) {
+      UTF8String.fromString(url.getAuthority)
+    } else if (partToExtract.equals(USERINFO)) {
+      UTF8String.fromString(url.getUserInfo)
+    } else {
+      null
+    }
+  }
+
+  private def parseUrlWithoutKey(url: UTF8String, partToExtract: UTF8String): UTF8String = {
+    if (url != null && partToExtract != null) {
+      if (cachedUrl ne null) {
+        extractFromUrl(cachedUrl, partToExtract)
+      } else {
[GitHub] spark issue #14077: [SPARK-16402] [SQL] JDBC Source: Implement save API of D...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14077 (Personally, I hope this does not get delayed because this usage was introduced in Spark Summit PPT and I guess users would try to use this API.)
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user janplus commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69851574

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala ---

@@ -725,4 +725,52 @@ class StringExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
     checkEvaluation(FindInSet(Literal("abf"), Literal("abc,b,ab,c,def")), 0)
     checkEvaluation(FindInSet(Literal("ab,"), Literal("abc,b,ab,c,def")), 0)
   }
+
+  test("ParseUrl") {
+    def checkParseUrl(expected: String, urlStr: String, partToExtract: String): Unit = {
+      checkEvaluation(
+        ParseUrl(Seq(Literal(urlStr), Literal(partToExtract))), expected)
+    }
+    def checkParseUrlWithKey(
+        expected: String,
+        urlStr: String,
+        partToExtract: String,
+        key: String): Unit = {
+      checkEvaluation(
+        ParseUrl(Seq(Literal(urlStr), Literal(partToExtract), Literal(key))), expected)
+    }
+
+    checkParseUrl("spark.apache.org", "http://spark.apache.org/path?query=1", "HOST")
+    checkParseUrl("/path", "http://spark.apache.org/path?query=1", "PATH")
+    checkParseUrl("query=1", "http://spark.apache.org/path?query=1", "QUERY")
+    checkParseUrl("Ref", "http://spark.apache.org/path?query=1#Ref", "REF")
+    checkParseUrl("http", "http://spark.apache.org/path?query=1", "PROTOCOL")
+    checkParseUrl("/path?query=1", "http://spark.apache.org/path?query=1", "FILE")
+    checkParseUrl("spark.apache.org:8080", "http://spark.apache.org:8080/path?query=1", "AUTHORITY")
+    checkParseUrl("userinfo", "http://useri...@spark.apache.org/path?query=1", "USERINFO")
+    checkParseUrlWithKey("1", "http://spark.apache.org/path?query=1", "QUERY", "query")
+
+    // Null checking
+    checkParseUrl(null, null, "HOST")
+    checkParseUrl(null, "http://spark.apache.org/path?query=1", null)
+    checkParseUrl(null, null, null)
+    checkParseUrl(null, "test", "HOST")
+    checkParseUrl(null, "http://spark.apache.org/path?query=1", "NO")
+    checkParseUrl(null, "http://spark.apache.org/path?query=1", "USERINFO")
+    checkParseUrlWithKey(null, "http://spark.apache.org/path?query=1", "HOST", "query")
+    checkParseUrlWithKey(null, "http://spark.apache.org/path?query=1", "QUERY", "quer")
+    checkParseUrlWithKey(null, "http://spark.apache.org/path?query=1", "QUERY", null)
+    checkParseUrlWithKey(null, "http://spark.apache.org/path?query=1", "QUERY", "")
+
+    // exceptional cases
+    intercept[java.util.regex.PatternSyntaxException] {
+      evaluate(ParseUrl(Seq(Literal("http://spark.apache.org/path?"),
+        Literal("QUERY"), Literal("???"))))
+    }
+
+    // arguments checking
+    assert(ParseUrl(Seq(Literal("1"))).checkInputDataTypes().isFailure)
+    assert(ParseUrl(Seq(Literal("1"), Literal("2"), Literal("3"), Literal("4")))

--- End diff --

As I declare ParseUrl with `ImplicitCastInputTypes`, I am not sure whether the cases with invalid-type parameters are necessary.
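janplus's point, unpacked as a conceptual sketch (toy types, not the Catalyst analyzer): with `ImplicitCastInputTypes`, the analyzer wraps any child whose type differs from the expected input type in a cast, so by the time `checkInputDataTypes` runs, `parse_url(1, 2)` has effectively become `parse_url(CAST(1 AS STRING), CAST(2 AS STRING))`, leaving invalid-type tests with little to exercise.

```
object ImplicitCastSketch {
  sealed trait Expr { def dataType: String }
  case class IntLit(v: Int) extends Expr { val dataType = "int" }
  case class StrLit(v: String) extends Expr { val dataType = "string" }
  case class Cast(child: Expr, to: String) extends Expr { val dataType: String = to }

  // One analyzer pass: coerce each child to its expected input type.
  def coerce(children: Seq[Expr], expected: Seq[String]): Seq[Expr] =
    children.zip(expected).map {
      case (c, t) if c.dataType == t => c
      case (c, t) => Cast(c, t)
    }

  def main(args: Array[String]): Unit = {
    // parse_url(1, 2): both arguments get wrapped in casts to string
    println(coerce(Seq(IntLit(1), IntLit(2)), Seq("string", "string")))
  }
}
```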
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user janplus commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69851306

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---

@@ -652,6 +654,160 @@ case class StringRPad(str: Expression, len: Expression, pad: Expression)
   override def prettyName: String = "rpad"
 }
+
+object ParseUrl {
+  private val HOST = UTF8String.fromString("HOST")
+  private val PATH = UTF8String.fromString("PATH")
+  private val QUERY = UTF8String.fromString("QUERY")
+  private val REF = UTF8String.fromString("REF")
+  private val PROTOCOL = UTF8String.fromString("PROTOCOL")
+  private val FILE = UTF8String.fromString("FILE")
+  private val AUTHORITY = UTF8String.fromString("AUTHORITY")
+  private val USERINFO = UTF8String.fromString("USERINFO")
+  private val REGEXPREFIX = "(&|^)"
+  private val REGEXSUBFIX = "=([^&]*)"
+}
+
+/**
+ * Extracts a part from a URL
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL",
+  extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO.
+    Key specifies which query to extract.
+    Examples:
+      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST')
+      'spark.apache.org'
+      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY')
+      'query=1'
+      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 'query')
+      '1'""")
+case class ParseUrl(children: Seq[Expression])
+  extends Expression with ImplicitCastInputTypes with CodegenFallback {
+
+  override def nullable: Boolean = true
+  override def inputTypes: Seq[DataType] = Seq.fill(children.size)(StringType)
+  override def dataType: DataType = StringType
+  override def prettyName: String = "parse_url"
+
+  // If the url is a constant, cache the URL object so that we don't need to convert url
+  // from UTF8String to String to URL for every row.
+  @transient private lazy val cachedUrl = stringExprs(0) match {
+    case Literal(url: UTF8String, _) => getUrl(url)
+    case _ => null
+  }
+
+  // If the key is a constant, cache the Pattern object so that we don't need to convert key
+  // from UTF8String to String to StringBuilder to String to Pattern for every row.
+  @transient private lazy val cachedPattern = stringExprs(2) match {
+    case Literal(key: UTF8String, _) => getPattern(key)
+    case _ => null
+  }
+
+  private lazy val stringExprs = children.toArray
+  import ParseUrl._
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    if (children.size > 3 || children.size < 2) {
+      TypeCheckResult.TypeCheckFailure(s"$prettyName function requires two or three arguments")
+    } else {
+      super[ImplicitCastInputTypes].checkInputDataTypes()
+    }
+  }
+
+  private def getPattern(key: UTF8String): Pattern = {
+    if (key != null) {
+      Pattern.compile(REGEXPREFIX + key.toString + REGEXSUBFIX)
+    } else {
+      null
+    }
+  }
+
+  private def getUrl(url: UTF8String): URL = {
+    try {
+      new URL(url.toString)
+    } catch {
+      case e: MalformedURLException => null
+    }
+  }
+
+  private def extractValueFromQuery(query: UTF8String, pattern: Pattern): UTF8String = {
+    val m = pattern.matcher(query.toString)
+    if (m.find()) {
+      UTF8String.fromString(m.group(2))
+    } else {
+      null
+    }
+  }
+
+  private def extractFromUrl(url: URL, partToExtract: UTF8String): UTF8String = {
+    if (partToExtract.equals(HOST)) {
+      UTF8String.fromString(url.getHost)
+    } else if (partToExtract.equals(PATH)) {
+      UTF8String.fromString(url.getPath)
+    } else if (partToExtract.equals(QUERY)) {
+      UTF8String.fromString(url.getQuery)
+    } else if (partToExtract.equals(REF)) {
+      UTF8String.fromString(url.getRef)
+    } else if (partToExtract.equals(PROTOCOL)) {
+      UTF8String.fromString(url.getProtocol)
+    } else if (partToExtract.equals(FILE)) {
+      UTF8String.fromString(url.getFile)
+    } else if (partToExtract.equals(AUTHORITY)) {
+      UTF8String.fromString(url.getAuthority)
+    } else if (partToExtract.equals(USERINFO)) {
+      UTF8String.fromString(url.getUserInfo)
+    } else {
+      null
+    }
+  }
+
+  private def parseUrlWithoutKey(url: UTF8String, partToExtract: UTF8String): UTF8String = {
+    if (url != null && partToExtract != null) {
+      if (cachedUrl ne null) {
+        extractFromUrl(cachedUrl, partToExtract)
+      } else {
+        val
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user janplus commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69851163

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---

@@ -652,6 +654,160 @@ case class StringRPad(str: Expression, len: Expression, pad: Expression)
   override def prettyName: String = "rpad"
 }
+
+object ParseUrl {
+  private val HOST = UTF8String.fromString("HOST")
+  private val PATH = UTF8String.fromString("PATH")
+  private val QUERY = UTF8String.fromString("QUERY")
+  private val REF = UTF8String.fromString("REF")
+  private val PROTOCOL = UTF8String.fromString("PROTOCOL")
+  private val FILE = UTF8String.fromString("FILE")
+  private val AUTHORITY = UTF8String.fromString("AUTHORITY")
+  private val USERINFO = UTF8String.fromString("USERINFO")
+  private val REGEXPREFIX = "(&|^)"
+  private val REGEXSUBFIX = "=([^&]*)"
+}
+
+/**
+ * Extracts a part from a URL
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL",
+  extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO.
+    Key specifies which query to extract.
+    Examples:
+      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST')
+      'spark.apache.org'
+      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY')
+      'query=1'
+      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 'query')
+      '1'""")
+case class ParseUrl(children: Seq[Expression])
+  extends Expression with ImplicitCastInputTypes with CodegenFallback {
+
+  override def nullable: Boolean = true
+  override def inputTypes: Seq[DataType] = Seq.fill(children.size)(StringType)
+  override def dataType: DataType = StringType
+  override def prettyName: String = "parse_url"
+
+  // If the url is a constant, cache the URL object so that we don't need to convert url
+  // from UTF8String to String to URL for every row.
+  @transient private lazy val cachedUrl = stringExprs(0) match {
+    case Literal(url: UTF8String, _) => getUrl(url)
+    case _ => null
+  }
+
+  // If the key is a constant, cache the Pattern object so that we don't need to convert key
+  // from UTF8String to String to StringBuilder to String to Pattern for every row.
+  @transient private lazy val cachedPattern = stringExprs(2) match {
+    case Literal(key: UTF8String, _) => getPattern(key)
+    case _ => null
+  }
+
+  private lazy val stringExprs = children.toArray

--- End diff --

OK..
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69851130

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala ---

@@ -725,4 +725,52 @@ class StringExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
     checkEvaluation(FindInSet(Literal("abf"), Literal("abc,b,ab,c,def")), 0)
     checkEvaluation(FindInSet(Literal("ab,"), Literal("abc,b,ab,c,def")), 0)
   }
+
+  test("ParseUrl") {
+    def checkParseUrl(expected: String, urlStr: String, partToExtract: String): Unit = {
+      checkEvaluation(
+        ParseUrl(Seq(Literal(urlStr), Literal(partToExtract))), expected)
+    }
+    def checkParseUrlWithKey(
+        expected: String,
+        urlStr: String,
+        partToExtract: String,
+        key: String): Unit = {
+      checkEvaluation(
+        ParseUrl(Seq(Literal(urlStr), Literal(partToExtract), Literal(key))), expected)
+    }
+
+    checkParseUrl("spark.apache.org", "http://spark.apache.org/path?query=1", "HOST")
+    checkParseUrl("/path", "http://spark.apache.org/path?query=1", "PATH")
+    checkParseUrl("query=1", "http://spark.apache.org/path?query=1", "QUERY")
+    checkParseUrl("Ref", "http://spark.apache.org/path?query=1#Ref", "REF")
+    checkParseUrl("http", "http://spark.apache.org/path?query=1", "PROTOCOL")
+    checkParseUrl("/path?query=1", "http://spark.apache.org/path?query=1", "FILE")
+    checkParseUrl("spark.apache.org:8080", "http://spark.apache.org:8080/path?query=1", "AUTHORITY")
+    checkParseUrl("userinfo", "http://useri...@spark.apache.org/path?query=1", "USERINFO")
+    checkParseUrlWithKey("1", "http://spark.apache.org/path?query=1", "QUERY", "query")
+
+    // Null checking
+    checkParseUrl(null, null, "HOST")
+    checkParseUrl(null, "http://spark.apache.org/path?query=1", null)
+    checkParseUrl(null, null, null)
+    checkParseUrl(null, "test", "HOST")
+    checkParseUrl(null, "http://spark.apache.org/path?query=1", "NO")
+    checkParseUrl(null, "http://spark.apache.org/path?query=1", "USERINFO")
+    checkParseUrlWithKey(null, "http://spark.apache.org/path?query=1", "HOST", "query")
+    checkParseUrlWithKey(null, "http://spark.apache.org/path?query=1", "QUERY", "quer")
+    checkParseUrlWithKey(null, "http://spark.apache.org/path?query=1", "QUERY", null)
+    checkParseUrlWithKey(null, "http://spark.apache.org/path?query=1", "QUERY", "")
+
+    // exceptional cases
+    intercept[java.util.regex.PatternSyntaxException] {
+      evaluate(ParseUrl(Seq(Literal("http://spark.apache.org/path?"),
+        Literal("QUERY"), Literal("???"))))
+    }
+
+    // arguments checking
+    assert(ParseUrl(Seq(Literal("1"))).checkInputDataTypes().isFailure)
+    assert(ParseUrl(Seq(Literal("1"), Literal("2"), Literal("3"), Literal("4")))

--- End diff --

also add some cases with invalid-type parameters?
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user janplus commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69851065 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -652,6 +654,160 @@ case class StringRPad(str: Expression, len: Expression, pad: Expression) override def prettyName: String = "rpad" } +object ParseUrl { + private val HOST = UTF8String.fromString("HOST") + private val PATH = UTF8String.fromString("PATH") + private val QUERY = UTF8String.fromString("QUERY") + private val REF = UTF8String.fromString("REF") + private val PROTOCOL = UTF8String.fromString("PROTOCOL") + private val FILE = UTF8String.fromString("FILE") + private val AUTHORITY = UTF8String.fromString("AUTHORITY") + private val USERINFO = UTF8String.fromString("USERINFO") + private val REGEXPREFIX = "(&|^)" + private val REGEXSUBFIX = "=([^&]*)" +} + +/** + * Extracts a part from a URL + */ +@ExpressionDescription( + usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL", + extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO. +Key specifies which query to extract. +Examples: + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST') + 'spark.apache.org' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY') + 'query=1' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 'query') + '1'""") +case class ParseUrl(children: Seq[Expression]) + extends Expression with ImplicitCastInputTypes with CodegenFallback { + + override def nullable: Boolean = true + override def inputTypes: Seq[DataType] = Seq.fill(children.size)(StringType) + override def dataType: DataType = StringType + override def prettyName: String = "parse_url" + + // If the url is a constant, cache the URL object so that we don't need to convert url + // from UTF8String to String to URL for every row. + @transient private lazy val cachedUrl = stringExprs(0) match { +case Literal(url: UTF8String, _) => getUrl(url) +case _ => null + } + + // If the key is a constant, cache the Pattern object so that we don't need to convert key + // from UTF8String to String to StringBuilder to String to Pattern for every row. 
+ @transient private lazy val cachedPattern = stringExprs(2) match { +case Literal(key: UTF8String, _) => getPattern(key) +case _ => null + } + + private lazy val stringExprs = children.toArray + import ParseUrl._ + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.size > 3 || children.size < 2) { + TypeCheckResult.TypeCheckFailure(s"$prettyName function requires two or three arguments") +} else { + super[ImplicitCastInputTypes].checkInputDataTypes() +} + } + + private def getPattern(key: UTF8String): Pattern = { +if (key != null) { + Pattern.compile(REGEXPREFIX + key.toString + REGEXSUBFIX) +} else { + null +} + } + + private def getUrl(url: UTF8String): URL = { +try { + new URL(url.toString) +} catch { + case e: MalformedURLException => null +} + } + + private def extractValueFromQuery(query: UTF8String, pattern: Pattern): UTF8String = { +val m = pattern.matcher(query.toString) +if (m.find()) { + UTF8String.fromString(m.group(2)) +} else { + null +} + } + + private def extractFromUrl(url: URL, partToExtract: UTF8String): UTF8String = { +if (partToExtract.equals(HOST)) { + UTF8String.fromString(url.getHost) +} else if (partToExtract.equals(PATH)) { + UTF8String.fromString(url.getPath) +} else if (partToExtract.equals(QUERY)) { + UTF8String.fromString(url.getQuery) +} else if (partToExtract.equals(REF)) { + UTF8String.fromString(url.getRef) +} else if (partToExtract.equals(PROTOCOL)) { + UTF8String.fromString(url.getProtocol) +} else if (partToExtract.equals(FILE)) { + UTF8String.fromString(url.getFile) +} else if (partToExtract.equals(AUTHORITY)) { + UTF8String.fromString(url.getAuthority) +} else if (partToExtract.equals(USERINFO)) { + UTF8String.fromString(url.getUserInfo) +} else { + null +} + } + + private def parseUrlWithoutKey(url: UTF8String, partToExtract: UTF8String): UTF8String = { +if (url != null && partToExtract != null) { + if (cachedUrl ne null) { +extractFromUrl(cachedUrl, partToExtract) + } else { +val
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69851040 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -652,6 +654,160 @@ case class StringRPad(str: Expression, len: Expression, pad: Expression) override def prettyName: String = "rpad" } +object ParseUrl { + private val HOST = UTF8String.fromString("HOST") + private val PATH = UTF8String.fromString("PATH") + private val QUERY = UTF8String.fromString("QUERY") + private val REF = UTF8String.fromString("REF") + private val PROTOCOL = UTF8String.fromString("PROTOCOL") + private val FILE = UTF8String.fromString("FILE") + private val AUTHORITY = UTF8String.fromString("AUTHORITY") + private val USERINFO = UTF8String.fromString("USERINFO") + private val REGEXPREFIX = "(&|^)" + private val REGEXSUBFIX = "=([^&]*)" +} + +/** + * Extracts a part from a URL + */ +@ExpressionDescription( + usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL", + extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO. +Key specifies which query to extract. +Examples: + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST') + 'spark.apache.org' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY') + 'query=1' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 'query') + '1'""") +case class ParseUrl(children: Seq[Expression]) + extends Expression with ImplicitCastInputTypes with CodegenFallback { + + override def nullable: Boolean = true + override def inputTypes: Seq[DataType] = Seq.fill(children.size)(StringType) + override def dataType: DataType = StringType + override def prettyName: String = "parse_url" + + // If the url is a constant, cache the URL object so that we don't need to convert url + // from UTF8String to String to URL for every row. + @transient private lazy val cachedUrl = stringExprs(0) match { +case Literal(url: UTF8String, _) => getUrl(url) +case _ => null + } + + // If the key is a constant, cache the Pattern object so that we don't need to convert key + // from UTF8String to String to StringBuilder to String to Pattern for every row. 
+ @transient private lazy val cachedPattern = stringExprs(2) match { +case Literal(key: UTF8String, _) => getPattern(key) +case _ => null + } + + private lazy val stringExprs = children.toArray + import ParseUrl._ + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.size > 3 || children.size < 2) { + TypeCheckResult.TypeCheckFailure(s"$prettyName function requires two or three arguments") +} else { + super[ImplicitCastInputTypes].checkInputDataTypes() +} + } + + private def getPattern(key: UTF8String): Pattern = { +if (key != null) { + Pattern.compile(REGEXPREFIX + key.toString + REGEXSUBFIX) +} else { + null +} + } + + private def getUrl(url: UTF8String): URL = { +try { + new URL(url.toString) +} catch { + case e: MalformedURLException => null +} + } + + private def extractValueFromQuery(query: UTF8String, pattern: Pattern): UTF8String = { +val m = pattern.matcher(query.toString) +if (m.find()) { + UTF8String.fromString(m.group(2)) +} else { + null +} + } + + private def extractFromUrl(url: URL, partToExtract: UTF8String): UTF8String = { +if (partToExtract.equals(HOST)) { + UTF8String.fromString(url.getHost) +} else if (partToExtract.equals(PATH)) { + UTF8String.fromString(url.getPath) +} else if (partToExtract.equals(QUERY)) { + UTF8String.fromString(url.getQuery) +} else if (partToExtract.equals(REF)) { + UTF8String.fromString(url.getRef) +} else if (partToExtract.equals(PROTOCOL)) { + UTF8String.fromString(url.getProtocol) +} else if (partToExtract.equals(FILE)) { + UTF8String.fromString(url.getFile) +} else if (partToExtract.equals(AUTHORITY)) { + UTF8String.fromString(url.getAuthority) +} else if (partToExtract.equals(USERINFO)) { + UTF8String.fromString(url.getUserInfo) +} else { + null +} + } + + private def parseUrlWithoutKey(url: UTF8String, partToExtract: UTF8String): UTF8String = { +if (url != null && partToExtract != null) { + if (cachedUrl ne null) { +extractFromUrl(cachedUrl, partToExtract) + } else { +
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69850990 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -652,6 +654,160 @@ case class StringRPad(str: Expression, len: Expression, pad: Expression) override def prettyName: String = "rpad" } +object ParseUrl { + private val HOST = UTF8String.fromString("HOST") + private val PATH = UTF8String.fromString("PATH") + private val QUERY = UTF8String.fromString("QUERY") + private val REF = UTF8String.fromString("REF") + private val PROTOCOL = UTF8String.fromString("PROTOCOL") + private val FILE = UTF8String.fromString("FILE") + private val AUTHORITY = UTF8String.fromString("AUTHORITY") + private val USERINFO = UTF8String.fromString("USERINFO") + private val REGEXPREFIX = "(&|^)" + private val REGEXSUBFIX = "=([^&]*)" +} + +/** + * Extracts a part from a URL + */ +@ExpressionDescription( + usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL", + extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO.
Key specifies which query to extract.
Examples:
 > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST')
 'spark.apache.org'
 > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY')
 'query=1'
 > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 'query')
 '1'""") +case class ParseUrl(children: Seq[Expression]) + extends Expression with ImplicitCastInputTypes with CodegenFallback { + + override def nullable: Boolean = true + override def inputTypes: Seq[DataType] = Seq.fill(children.size)(StringType) + override def dataType: DataType = StringType + override def prettyName: String = "parse_url" + + // If the url is a constant, cache the URL object so that we don't need to convert url + // from UTF8String to String to URL for every row. + @transient private lazy val cachedUrl = stringExprs(0) match { +case Literal(url: UTF8String, _) => getUrl(url) +case _ => null + } + + // If the key is a constant, cache the Pattern object so that we don't need to convert key + // from UTF8String to String to StringBuilder to String to Pattern for every row. + @transient private lazy val cachedPattern = stringExprs(2) match { +case Literal(key: UTF8String, _) => getPattern(key) +case _ => null + } + + private lazy val stringExprs = children.toArray --- End diff -- I don't think it's necessary... we only have 2 or 3 parameters...
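The caching this thread keeps returning to is a small idiom: pattern-match on the child expression once, and only pay the per-row conversion when the argument is not a constant. A self-contained sketch of that idiom (names are illustrative, not the PR's code):

    import java.util.regex.Pattern
    import org.apache.spark.sql.catalyst.expressions.{Expression, Literal}
    import org.apache.spark.unsafe.types.UTF8String

    // If keyExpr is a constant Literal, compile its Pattern once and reuse it
    // for every row; otherwise return null and compile per row at eval time.
    def cachedPatternFor(keyExpr: Expression): Pattern = keyExpr match {
      case Literal(key: UTF8String, _) if key != null =>
        Pattern.compile("(&|^)" + key.toString + "=([^&]*)")
      case _ => null
    }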
[GitHub] spark issue #14077: [SPARK-16402] [SQL] JDBC Source: Implement save API of D...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14077 Thank you for confirming that it is a bug in another PR. Regarding the solution of this PR, it is not a true circular reference. The solution in this PR is to minimize the duplicate code. I also think it makes sense to move the common code logic from the `jdbc` API to the `createRelation` implementation of `CreatableRelationProvider`. The JDBC-specific logic should not be exposed to the `DataFrameWriter` APIs. If you want to do it in your PR, that is fine with me. Please minimize the code changes and add the test cases introduced in this PR. Thanks!
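For context, `CreatableRelationProvider.createRelation` receives the `SaveMode` and the `DataFrame` to write, so mode handling can live entirely inside the data source. A rough sketch of that direction, not the PR's actual code (`saveJdbcTable` is a hypothetical stand-in for the JDBC write path):

    import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
    import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider}
    import org.apache.spark.sql.types.StructType

    class JdbcSaveProvider extends CreatableRelationProvider {
      override def createRelation(
          ctx: SQLContext,
          mode: SaveMode,
          parameters: Map[String, String],
          data: DataFrame): BaseRelation = {
        val url = parameters("url")
        val table = parameters("dbtable")
        // SaveMode handling (Overwrite, Append, ErrorIfExists, Ignore) stays
        // inside the provider instead of leaking into DataFrameWriter.
        saveJdbcTable(data, url, table, mode)
        new BaseRelation {
          override val sqlContext: SQLContext = ctx
          override val schema: StructType = data.schema
        }
      }

      // Hypothetical helper standing in for the JDBC-specific write logic.
      private def saveJdbcTable(df: DataFrame, url: String, table: String, mode: SaveMode): Unit =
        ??? // table-existence check, optional CREATE TABLE, batched INSERTs
    }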
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69850899 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -652,6 +654,160 @@ case class StringRPad(str: Expression, len: Expression, pad: Expression) override def prettyName: String = "rpad" } +object ParseUrl { + private val HOST = UTF8String.fromString("HOST") + private val PATH = UTF8String.fromString("PATH") + private val QUERY = UTF8String.fromString("QUERY") + private val REF = UTF8String.fromString("REF") + private val PROTOCOL = UTF8String.fromString("PROTOCOL") + private val FILE = UTF8String.fromString("FILE") + private val AUTHORITY = UTF8String.fromString("AUTHORITY") + private val USERINFO = UTF8String.fromString("USERINFO") + private val REGEXPREFIX = "(&|^)" + private val REGEXSUBFIX = "=([^&]*)" +} + +/** + * Extracts a part from a URL + */ +@ExpressionDescription( + usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL", + extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO. +Key specifies which query to extract. +Examples: + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST') + 'spark.apache.org' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY') + 'query=1' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 'query') + '1'""") +case class ParseUrl(children: Seq[Expression]) + extends Expression with ImplicitCastInputTypes with CodegenFallback { + + override def nullable: Boolean = true + override def inputTypes: Seq[DataType] = Seq.fill(children.size)(StringType) + override def dataType: DataType = StringType + override def prettyName: String = "parse_url" + + // If the url is a constant, cache the URL object so that we don't need to convert url + // from UTF8String to String to URL for every row. + @transient private lazy val cachedUrl = stringExprs(0) match { +case Literal(url: UTF8String, _) => getUrl(url) +case _ => null + } + + // If the key is a constant, cache the Pattern object so that we don't need to convert key + // from UTF8String to String to StringBuilder to String to Pattern for every row. 
+ @transient private lazy val cachedPattern = stringExprs(2) match { +case Literal(key: UTF8String, _) => getPattern(key) +case _ => null + } + + private lazy val stringExprs = children.toArray + import ParseUrl._ + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.size > 3 || children.size < 2) { + TypeCheckResult.TypeCheckFailure(s"$prettyName function requires two or three arguments") +} else { + super[ImplicitCastInputTypes].checkInputDataTypes() +} + } + + private def getPattern(key: UTF8String): Pattern = { +if (key != null) { + Pattern.compile(REGEXPREFIX + key.toString + REGEXSUBFIX) +} else { + null +} + } + + private def getUrl(url: UTF8String): URL = { +try { + new URL(url.toString) +} catch { + case e: MalformedURLException => null +} + } + + private def extractValueFromQuery(query: UTF8String, pattern: Pattern): UTF8String = { +val m = pattern.matcher(query.toString) +if (m.find()) { + UTF8String.fromString(m.group(2)) +} else { + null +} + } + + private def extractFromUrl(url: URL, partToExtract: UTF8String): UTF8String = { +if (partToExtract.equals(HOST)) { + UTF8String.fromString(url.getHost) +} else if (partToExtract.equals(PATH)) { + UTF8String.fromString(url.getPath) +} else if (partToExtract.equals(QUERY)) { + UTF8String.fromString(url.getQuery) +} else if (partToExtract.equals(REF)) { + UTF8String.fromString(url.getRef) +} else if (partToExtract.equals(PROTOCOL)) { + UTF8String.fromString(url.getProtocol) +} else if (partToExtract.equals(FILE)) { + UTF8String.fromString(url.getFile) +} else if (partToExtract.equals(AUTHORITY)) { + UTF8String.fromString(url.getAuthority) +} else if (partToExtract.equals(USERINFO)) { + UTF8String.fromString(url.getUserInfo) +} else { + null +} + } + + private def parseUrlWithoutKey(url: UTF8String, partToExtract: UTF8String): UTF8String = { +if (url != null && partToExtract != null) { + if (cachedUrl ne null) { +extractFromUrl(cachedUrl, partToExtract) + } else { +
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69850820 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -652,6 +654,160 @@ case class StringRPad(str: Expression, len: Expression, pad: Expression) override def prettyName: String = "rpad" } +object ParseUrl { + private val HOST = UTF8String.fromString("HOST") + private val PATH = UTF8String.fromString("PATH") + private val QUERY = UTF8String.fromString("QUERY") + private val REF = UTF8String.fromString("REF") + private val PROTOCOL = UTF8String.fromString("PROTOCOL") + private val FILE = UTF8String.fromString("FILE") + private val AUTHORITY = UTF8String.fromString("AUTHORITY") + private val USERINFO = UTF8String.fromString("USERINFO") + private val REGEXPREFIX = "(&|^)" + private val REGEXSUBFIX = "=([^&]*)" +} + +/** + * Extracts a part from a URL + */ +@ExpressionDescription( + usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL", + extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO. +Key specifies which query to extract. +Examples: + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST') + 'spark.apache.org' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY') + 'query=1' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 'query') + '1'""") +case class ParseUrl(children: Seq[Expression]) + extends Expression with ImplicitCastInputTypes with CodegenFallback { + + override def nullable: Boolean = true + override def inputTypes: Seq[DataType] = Seq.fill(children.size)(StringType) + override def dataType: DataType = StringType + override def prettyName: String = "parse_url" + + // If the url is a constant, cache the URL object so that we don't need to convert url + // from UTF8String to String to URL for every row. + @transient private lazy val cachedUrl = stringExprs(0) match { +case Literal(url: UTF8String, _) => getUrl(url) +case _ => null + } + + // If the key is a constant, cache the Pattern object so that we don't need to convert key + // from UTF8String to String to StringBuilder to String to Pattern for every row. 
+ @transient private lazy val cachedPattern = stringExprs(2) match { +case Literal(key: UTF8String, _) => getPattern(key) +case _ => null + } + + private lazy val stringExprs = children.toArray + import ParseUrl._ + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.size > 3 || children.size < 2) { + TypeCheckResult.TypeCheckFailure(s"$prettyName function requires two or three arguments") +} else { + super[ImplicitCastInputTypes].checkInputDataTypes() +} + } + + private def getPattern(key: UTF8String): Pattern = { +if (key != null) { + Pattern.compile(REGEXPREFIX + key.toString + REGEXSUBFIX) +} else { + null +} + } + + private def getUrl(url: UTF8String): URL = { +try { + new URL(url.toString) +} catch { + case e: MalformedURLException => null +} + } + + private def extractValueFromQuery(query: UTF8String, pattern: Pattern): UTF8String = { +val m = pattern.matcher(query.toString) +if (m.find()) { + UTF8String.fromString(m.group(2)) +} else { + null +} + } + + private def extractFromUrl(url: URL, partToExtract: UTF8String): UTF8String = { +if (partToExtract.equals(HOST)) { + UTF8String.fromString(url.getHost) +} else if (partToExtract.equals(PATH)) { + UTF8String.fromString(url.getPath) +} else if (partToExtract.equals(QUERY)) { + UTF8String.fromString(url.getQuery) +} else if (partToExtract.equals(REF)) { + UTF8String.fromString(url.getRef) +} else if (partToExtract.equals(PROTOCOL)) { + UTF8String.fromString(url.getProtocol) +} else if (partToExtract.equals(FILE)) { + UTF8String.fromString(url.getFile) +} else if (partToExtract.equals(AUTHORITY)) { + UTF8String.fromString(url.getAuthority) +} else if (partToExtract.equals(USERINFO)) { + UTF8String.fromString(url.getUserInfo) +} else { + null --- End diff -- It's weird that we return null when users provide an invalid `part` string, while we have a fixed list of supported `part` strings. Even if it's Hive's rule, it still looks unreasonable to me. cc @rxin @yhuai
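Concretely, the behavior in question (expected results follow from the `extractFromUrl` code above; this is a sketch, not output captured from a live session, and assumes a Spark 2.0-style `spark` session):

    spark.sql("SELECT parse_url('http://spark.apache.org/path?query=1', 'HOST')")
    // -> spark.apache.org  (supported part name)

    spark.sql("SELECT parse_url('http://spark.apache.org/path?query=1', 'HOSTNAME')")
    // -> NULL  (unsupported part name yields null silently, mirroring Hive)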
[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r69850568 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/UnsafeArrayDataBenchmark.scala --- @@ -0,0 +1,251 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.benchmark + +import scala.util.Random + +import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder +import org.apache.spark.sql.catalyst.expressions.{UnsafeArrayData, UnsafeRow} +import org.apache.spark.sql.catalyst.expressions.codegen.{BufferHolder, UnsafeArrayWriter} +import org.apache.spark.util.Benchmark + +/** + * Benchmark [[UnsafeArrayDataBenchmark]] for UnsafeArrayData + * To run this: + * 1. replace ignore(...) with test(...) + * 2. build/sbt "sql/test-only *benchmark.UnsafeArrayDataBenchmark" + * + * Benchmarks in this file are skipped in normal builds. + */ +class UnsafeArrayDataBenchmark extends BenchmarkBase { + + def calculateHeaderPortionInBytes(count: Int) : Int = { +// Use this assignment for SPARK-15962 +// val size = 4 + 4 * count +val size = UnsafeArrayData.calculateHeaderPortionInBytes(count) +size + } + + def readUnsafeArray(iters: Int): Unit = { +val count = 1024 * 1024 * 16 +val rand = new Random(42) + +var intResult: Int = 0 +val intBuffer = Array.fill[Int](count) { rand.nextInt } +val intEncoder = ExpressionEncoder[Array[Int]].resolveAndBind() +val intInternalRow = intEncoder.toRow(intBuffer) +val intUnsafeArray = intInternalRow.getArray(0) +val readIntArray = { i: Int => + var n = 0 + while (n < iters) { +val len = intUnsafeArray.numElements +var sum = 0.toInt +var i = 0 +while (i < len) { + sum += intUnsafeArray.getInt(i) + i += 1 +} +intResult = sum +n += 1 + } +} + +var doubleResult: Double = 0 +val doubleBuffer = Array.fill[Double](count) { rand.nextDouble } +val doubleEncoder = ExpressionEncoder[Array[Double]].resolveAndBind() +val doubleInternalRow = doubleEncoder.toRow(doubleBuffer) +val doubleUnsafeArray = doubleInternalRow.getArray(0) +val readDoubleArray = { i: Int => + var n = 0 + while (n < iters) { +val len = doubleUnsafeArray.numElements +var sum = 0.toDouble +var i = 0 +while (i < len) { + sum += doubleUnsafeArray.getDouble(i) + i += 1 +} +doubleResult = sum +n += 1 + } +} + +val benchmark = new Benchmark("Read UnsafeArrayData", count * iters) +benchmark.addCase("Int")(readIntArray) +benchmark.addCase("Double")(readDoubleArray) +benchmark.run +/* +Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.10.4 +Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz + +Read UnsafeArrayData:    Best/Avg Time(ms)    Rate(M/s)    Per Row(ns)    Relative + +Int        279 / 294    600.4    1.7    1.0X +Double     296 / 303    567.0    1.8    0.9X +*/ + } + + def writeUnsafeArray(iters: Int): Unit = { +val count = 
1024 * 1024 * 16 + +val intUnsafeRow = new UnsafeRow(1) +val intUnsafeArrayWriter = new UnsafeArrayWriter +val intBufferHolder = new BufferHolder(intUnsafeRow, 64) +intBufferHolder.reset() +intUnsafeArrayWriter.initialize(intBufferHolder, count, 4) +val intCursor = intBufferHolder.cursor +val writeIntArray = { i: Int => + var n = 0 + while (n <
[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r69850526 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/UnsafeArrayDataBenchmark.scala --- @@ -0,0 +1,251 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.benchmark + +import scala.util.Random + +import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder +import org.apache.spark.sql.catalyst.expressions.{UnsafeArrayData, UnsafeRow} +import org.apache.spark.sql.catalyst.expressions.codegen.{BufferHolder, UnsafeArrayWriter} +import org.apache.spark.util.Benchmark + +/** + * Benchmark [[UnsafeArrayDataBenchmark]] for UnsafeArrayData + * To run this: + * 1. replace ignore(...) with test(...) + * 2. build/sbt "sql/test-only *benchmark.UnsafeArrayDataBenchmark" + * + * Benchmarks in this file are skipped in normal builds. + */ +class UnsafeArrayDataBenchmark extends BenchmarkBase { + + def calculateHeaderPortionInBytes(count: Int) : Int = { +// Use this assignment for SPARK-15962 +// val size = 4 + 4 * count +val size = UnsafeArrayData.calculateHeaderPortionInBytes(count) +size + } + + def readUnsafeArray(iters: Int): Unit = { +val count = 1024 * 1024 * 16 +val rand = new Random(42) + +var intResult: Int = 0 +val intBuffer = Array.fill[Int](count) { rand.nextInt } +val intEncoder = ExpressionEncoder[Array[Int]].resolveAndBind() +val intInternalRow = intEncoder.toRow(intBuffer) +val intUnsafeArray = intInternalRow.getArray(0) +val readIntArray = { i: Int => + var n = 0 + while (n < iters) { +val len = intUnsafeArray.numElements +var sum = 0.toInt +var i = 0 +while (i < len) { + sum += intUnsafeArray.getInt(i) + i += 1 +} +intResult = sum +n += 1 + } +} + +var doubleResult: Double = 0 +val doubleBuffer = Array.fill[Double](count) { rand.nextDouble } +val doubleEncoder = ExpressionEncoder[Array[Double]].resolveAndBind() +val doubleInternalRow = doubleEncoder.toRow(doubleBuffer) +val doubleUnsafeArray = doubleInternalRow.getArray(0) +val readDoubleArray = { i: Int => + var n = 0 + while (n < iters) { +val len = doubleUnsafeArray.numElements +var sum = 0.toDouble +var i = 0 +while (i < len) { + sum += doubleUnsafeArray.getDouble(i) + i += 1 +} +doubleResult = sum +n += 1 + } +} + +val benchmark = new Benchmark("Read UnsafeArrayData", count * iters) +benchmark.addCase("Int")(readIntArray) +benchmark.addCase("Double")(readDoubleArray) +benchmark.run +/* +Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.10.4 +Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz + +Read UnsafeArrayData:    Best/Avg Time(ms)    Rate(M/s)    Per Row(ns)    Relative + +Int        279 / 294    600.4    1.7    1.0X +Double     296 / 303    567.0    1.8    0.9X +*/ + } + + def writeUnsafeArray(iters: Int): Unit = { +val count = 
1024 * 1024 * 16 + +val intUnsafeRow = new UnsafeRow(1) +val intUnsafeArrayWriter = new UnsafeArrayWriter --- End diff -- have you seen my comment here? https://github.com/apache/spark/pull/13680/files#r69392823 Testing the array writer directly is very low-level, and people are more interested in writing the whole array. If you take a look at what `encoder.toRow` does, it generates a
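A minimal sketch of the suggested alternative: drive the write through the encoder, which exercises `UnsafeArrayWriter` via generated code, rather than calling the writer by hand. The shape mirrors `readUnsafeArray` above; the details are illustrative:

    import scala.util.Random
    import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
    import org.apache.spark.util.Benchmark

    def writeUnsafeArrayViaEncoder(iters: Int): Unit = {
      val count = 1024 * 1024 * 16
      val rand = new Random(42)
      val intBuffer = Array.fill[Int](count) { rand.nextInt }
      val intEncoder = ExpressionEncoder[Array[Int]].resolveAndBind()
      val writeIntArray = { _: Int =>
        var n = 0
        while (n < iters) {
          intEncoder.toRow(intBuffer) // writes the whole array end-to-end
          n += 1
        }
      }
      val benchmark = new Benchmark("Write UnsafeArrayData", count * iters)
      benchmark.addCase("Int")(writeIntArray)
      benchmark.run
    }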
[GitHub] spark issue #13494: [SPARK-15752] [SQL] Optimize metadata only query that ha...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13494 Build finished. Test PASSed.
[GitHub] spark issue #13494: [SPARK-15752] [SQL] Optimize metadata only query that ha...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13494 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61891/ Test PASSed.
[GitHub] spark issue #13494: [SPARK-15752] [SQL] Optimize metadata only query that ha...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13494 **[Test build #61891 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61891/consoleFull)** for PR 13494 at commit [`9546b40`](https://github.com/apache/spark/commit/9546b40840e69166c563c491ef6720ccb2f1b2eb). * This patch passes all tests. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r69850344 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/UnsafeArrayDataBenchmark.scala --- @@ -0,0 +1,251 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.benchmark + +import scala.util.Random + +import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder +import org.apache.spark.sql.catalyst.expressions.{UnsafeArrayData, UnsafeRow} +import org.apache.spark.sql.catalyst.expressions.codegen.{BufferHolder, UnsafeArrayWriter} +import org.apache.spark.util.Benchmark + +/** + * Benchmark [[UnsafeArrayDataBenchmark]] for UnsafeArrayData + * To run this: + * 1. replace ignore(...) with test(...) + * 2. build/sbt "sql/test-only *benchmark.UnsafeArrayDataBenchmark" + * + * Benchmarks in this file are skipped in normal builds. + */ +class UnsafeArrayDataBenchmark extends BenchmarkBase { + + def calculateHeaderPortionInBytes(count: Int) : Int = { +// Use this assignment for SPARK-15962 +// val size = 4 + 4 * count +val size = UnsafeArrayData.calculateHeaderPortionInBytes(count) +size + } + + def readUnsafeArray(iters: Int): Unit = { +val count = 1024 * 1024 * 16 +val rand = new Random(42) + +var intResult: Int = 0 +val intBuffer = Array.fill[Int](count) { rand.nextInt } +val intEncoder = ExpressionEncoder[Array[Int]].resolveAndBind() +val intInternalRow = intEncoder.toRow(intBuffer) +val intUnsafeArray = intInternalRow.getArray(0) +val readIntArray = { i: Int => + var n = 0 + while (n < iters) { +val len = intUnsafeArray.numElements +var sum = 0.toInt +var i = 0 +while (i < len) { + sum += intUnsafeArray.getInt(i) + i += 1 +} +intResult = sum --- End diff -- did you see a very different performance result without this assignment?
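The assignment is presumably a sink: without a store that outlives the loop, the JIT can treat the summation as dead code and eliminate it, leaving the benchmark to time an empty loop. The usual shape of the idiom (a sketch, not the PR's code):

    var sink: Int = 0 // read after the run, so the loop stays live
    def sumLoop(a: Array[Int]): Unit = {
      var i = 0
      var s = 0
      while (i < a.length) { s += a(i); i += 1 }
      sink = s // without this store the JIT may drop the whole loop
    }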
[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r69850312 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/UnsafeArrayDataBenchmark.scala --- @@ -0,0 +1,251 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.benchmark + +import scala.util.Random + +import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder +import org.apache.spark.sql.catalyst.expressions.{UnsafeArrayData, UnsafeRow} +import org.apache.spark.sql.catalyst.expressions.codegen.{BufferHolder, UnsafeArrayWriter} +import org.apache.spark.util.Benchmark + +/** + * Benchmark [[UnsafeArrayDataBenchmark]] for UnsafeArrayData + * To run this: + * 1. replace ignore(...) with test(...) + * 2. build/sbt "sql/test-only *benchmark.UnsafeArrayDataBenchmark" + * + * Benchmarks in this file are skipped in normal builds. + */ +class UnsafeArrayDataBenchmark extends BenchmarkBase { + + def calculateHeaderPortionInBytes(count: Int) : Int = { +// Use this assignment for SPARK-15962 +// val size = 4 + 4 * count +val size = UnsafeArrayData.calculateHeaderPortionInBytes(count) +size + } + + def readUnsafeArray(iters: Int): Unit = { +val count = 1024 * 1024 * 16 +val rand = new Random(42) + +var intResult: Int = 0 +val intBuffer = Array.fill[Int](count) { rand.nextInt } +val intEncoder = ExpressionEncoder[Array[Int]].resolveAndBind() +val intInternalRow = intEncoder.toRow(intBuffer) +val intUnsafeArray = intInternalRow.getArray(0) +val readIntArray = { i: Int => + var n = 0 + while (n < iters) { +val len = intUnsafeArray.numElements +var sum = 0.toInt +var i = 0 +while (i < len) { + sum += intUnsafeArray.getInt(i) + i += 1 +} +intResult = sum +n += 1 + } +} + +var doubleResult: Double = 0 +val doubleBuffer = Array.fill[Double](count) { rand.nextDouble } +val doubleEncoder = ExpressionEncoder[Array[Double]].resolveAndBind() +val doubleInternalRow = doubleEncoder.toRow(doubleBuffer) +val doubleUnsafeArray = doubleInternalRow.getArray(0) +val readDoubleArray = { i: Int => + var n = 0 + while (n < iters) { +val len = doubleUnsafeArray.numElements +var sum = 0.toDouble --- End diff -- `var sum = 0L`
[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r69850289 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/UnsafeArrayDataBenchmark.scala --- @@ -0,0 +1,251 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.benchmark + +import scala.util.Random + +import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder +import org.apache.spark.sql.catalyst.expressions.{UnsafeArrayData, UnsafeRow} +import org.apache.spark.sql.catalyst.expressions.codegen.{BufferHolder, UnsafeArrayWriter} +import org.apache.spark.util.Benchmark + +/** + * Benchmark [[UnsafeArrayDataBenchmark]] for UnsafeArrayData + * To run this: + * 1. replace ignore(...) with test(...) + * 2. build/sbt "sql/test-only *benchmark.UnsafeArrayDataBenchmark" + * + * Benchmarks in this file are skipped in normal builds. + */ +class UnsafeArrayDataBenchmark extends BenchmarkBase { + + def calculateHeaderPortionInBytes(count: Int) : Int = { +// Use this assignment for SPARK-15962 +// val size = 4 + 4 * count +val size = UnsafeArrayData.calculateHeaderPortionInBytes(count) +size + } + + def readUnsafeArray(iters: Int): Unit = { +val count = 1024 * 1024 * 16 +val rand = new Random(42) + +var intResult: Int = 0 +val intBuffer = Array.fill[Int](count) { rand.nextInt } +val intEncoder = ExpressionEncoder[Array[Int]].resolveAndBind() +val intInternalRow = intEncoder.toRow(intBuffer) +val intUnsafeArray = intInternalRow.getArray(0) +val readIntArray = { i: Int => + var n = 0 + while (n < iters) { +val len = intUnsafeArray.numElements +var sum = 0.toInt --- End diff -- unnecessary `toInt`
[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r69850272 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/UnsafeArrayDataBenchmark.scala --- @@ -0,0 +1,251 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.benchmark + +import scala.util.Random + +import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder +import org.apache.spark.sql.catalyst.expressions.{UnsafeArrayData, UnsafeRow} +import org.apache.spark.sql.catalyst.expressions.codegen.{BufferHolder, UnsafeArrayWriter} +import org.apache.spark.util.Benchmark + +/** + * Benchmark [[UnsafeArrayDataBenchmark]] for UnsafeArrayData + * To run this: + * 1. replace ignore(...) with test(...) + * 2. build/sbt "sql/test-only *benchmark.UnsafeArrayDataBenchmark" + * + * Benchmarks in this file are skipped in normal builds. + */ +class UnsafeArrayDataBenchmark extends BenchmarkBase { + + def calculateHeaderPortionInBytes(count: Int) : Int = { +// Use this assignment for SPARK-15962 +// val size = 4 + 4 * count +val size = UnsafeArrayData.calculateHeaderPortionInBytes(count) +size + } + + def readUnsafeArray(iters: Int): Unit = { +val count = 1024 * 1024 * 16 +val rand = new Random(42) + +var intResult: Int = 0 +val intBuffer = Array.fill[Int](count) { rand.nextInt } +val intEncoder = ExpressionEncoder[Array[Int]].resolveAndBind() +val intInternalRow = intEncoder.toRow(intBuffer) +val intUnsafeArray = intInternalRow.getArray(0) --- End diff -- combine these 2 lines
[GitHub] spark issue #14075: [SPARK-16401] [SQL] Data Source API: Enable Extending Re...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14075 @rxin This is a regression. I tried it in Spark 1.6 and it works well, so I think we need to fix it in Spark 2.0. Thanks!
[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r69850225 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/UnsafeArrayDataBenchmark.scala --- @@ -0,0 +1,251 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.benchmark + +import scala.util.Random + +import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder +import org.apache.spark.sql.catalyst.expressions.{UnsafeArrayData, UnsafeRow} +import org.apache.spark.sql.catalyst.expressions.codegen.{BufferHolder, UnsafeArrayWriter} +import org.apache.spark.util.Benchmark + +/** + * Benchmark [[UnsafeArrayDataBenchmark]] for UnsafeArrayData + * To run this: + * 1. replace ignore(...) with test(...) + * 2. build/sbt "sql/test-only *benchmark.UnsafeArrayDataBenchmark" + * + * Benchmarks in this file are skipped in normal builds. + */ +class UnsafeArrayDataBenchmark extends BenchmarkBase { + + def calculateHeaderPortionInBytes(count: Int) : Int = { +// Use this assignment for SPARK-15962 +// val size = 4 + 4 * count +val size = UnsafeArrayData.calculateHeaderPortionInBytes(count) +size --- End diff -- we can just return `UnsafeArrayData.calculateHeaderPortionInBytes(count)`
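Applied, the method would collapse to a single expression, something like:

    def calculateHeaderPortionInBytes(count: Int): Int =
      UnsafeArrayData.calculateHeaderPortionInBytes(count)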
[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r69850211 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/UnsafeArrayDataBenchmark.scala --- @@ -0,0 +1,251 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.benchmark + +import scala.util.Random + +import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder +import org.apache.spark.sql.catalyst.expressions.{UnsafeArrayData, UnsafeRow} +import org.apache.spark.sql.catalyst.expressions.codegen.{BufferHolder, UnsafeArrayWriter} +import org.apache.spark.util.Benchmark + +/** + * Benchmark [[UnsafeArrayDataBenchmark]] for UnsafeArrayData + * To run this: + * 1. replace ignore(...) with test(...) + * 2. build/sbt "sql/test-only *benchmark.UnsafeArrayDataBenchmark" + * + * Benchmarks in this file are skipped in normal builds. + */ +class UnsafeArrayDataBenchmark extends BenchmarkBase { + + def calculateHeaderPortionInBytes(count: Int) : Int = { +// Use this assignment for SPARK-15962 +// val size = 4 + 4 * count --- End diff -- remove this comment?
[GitHub] spark issue #14030: [SPARK-16350][SQL] Fix support for incremental planning ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14030 **[Test build #61893 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61893/consoleFull)** for PR 14030 at commit [`85e2352`](https://github.com/apache/spark/commit/85e2352c3010ed311603acd5007c6b3e05c25056).
[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r69850185 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/UnsafeArraySuite.scala --- @@ -18,27 +18,126 @@ package org.apache.spark.sql.catalyst.util import org.apache.spark.SparkFunSuite +import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder import org.apache.spark.sql.catalyst.expressions.UnsafeArrayData +import org.apache.spark.unsafe.Platform class UnsafeArraySuite extends SparkFunSuite { - test("from primitive int array") { -val array = Array(1, 10, 100) -val unsafe = UnsafeArrayData.fromPrimitiveArray(array) -assert(unsafe.numElements == 3) -assert(unsafe.getSizeInBytes == 4 + 4 * 3 + 4 * 3) -assert(unsafe.getInt(0) == 1) -assert(unsafe.getInt(1) == 10) -assert(unsafe.getInt(2) == 100) + val booleanArray = Array(false, true) + val shortArray = Array(1.toShort, 10.toShort, 100.toShort) + val intArray = Array(1, 10, 100) + val longArray = Array(1.toLong, 10.toLong, 100.toLong) + val floatArray = Array(1.1.toFloat, 2.2.toFloat, 3.3.toFloat) + val doubleArray = Array(1.1, 2.2, 3.3) + + val intMultiDimArray = Array(Array(1, 10), Array(2, 20, 200), Array(3, 30, 300, 3000)) + val doubleMultiDimArray = Array( +Array(1.1, 11.1), Array(2.2, 22.2, 222.2), Array(3.3, 33.3, 333.3, .3)) + + test("read array") { +val unsafeBoolean = ExpressionEncoder[Array[Boolean]].resolveAndBind(). + toRow(booleanArray).getArray(0) +assert(unsafeBoolean.isInstanceOf[UnsafeArrayData]) +assert(unsafeBoolean.numElements == booleanArray.length) +booleanArray.zipWithIndex.map { case (e, i) => + assert(unsafeBoolean.getBoolean(i) == e) +} + +val unsafeShort = ExpressionEncoder[Array[Short]].resolveAndBind(). + toRow(shortArray).getArray(0) +assert(unsafeShort.isInstanceOf[UnsafeArrayData]) +assert(unsafeShort.numElements == shortArray.length) +shortArray.zipWithIndex.map { case (e, i) => + assert(unsafeShort.getShort(i) == e) +} + +val unsafeInt = ExpressionEncoder[Array[Int]].resolveAndBind(). + toRow(intArray).getArray(0) +assert(unsafeInt.isInstanceOf[UnsafeArrayData]) +assert(unsafeInt.numElements == intArray.length) +intArray.zipWithIndex.map { case (e, i) => + assert(unsafeInt.getInt(i) == e) +} + +val unsafeLong = ExpressionEncoder[Array[Long]].resolveAndBind(). + toRow(longArray).getArray(0) +assert(unsafeLong.isInstanceOf[UnsafeArrayData]) +assert(unsafeLong.numElements == longArray.length) +longArray.zipWithIndex.map { case (e, i) => + assert(unsafeLong.getLong(i) == e) +} + +val unsafeFloat = ExpressionEncoder[Array[Float]].resolveAndBind(). + toRow(floatArray).getArray(0) +assert(unsafeFloat.isInstanceOf[UnsafeArrayData]) +assert(unsafeFloat.numElements == floatArray.length) +floatArray.zipWithIndex.map { case (e, i) => + assert(unsafeFloat.getFloat(i) == e) +} + +val unsafeDouble = ExpressionEncoder[Array[Double]].resolveAndBind(). + toRow(doubleArray).getArray(0) +assert(unsafeDouble.isInstanceOf[UnsafeArrayData]) +assert(unsafeDouble.numElements == doubleArray.length) +doubleArray.zipWithIndex.map { case (e, i) => + assert(unsafeDouble.getDouble(i) == e) +} + +val unsafeMultiDimInt = ExpressionEncoder[Array[Array[Int]]].resolveAndBind(). 
+ toRow(intMultiDimArray).getArray(0) +assert(unsafeMultiDimInt.isInstanceOf[UnsafeArrayData]) +assert(unsafeMultiDimInt.numElements == intMultiDimArray.length) +intMultiDimArray.zipWithIndex.map { case (a, j) => + val u = unsafeMultiDimInt.getArray(j) + assert(u.isInstanceOf[UnsafeArrayData]) + assert(u.numElements == a.length) + a.zipWithIndex.map { case (e, i) => +assert(u.getInt(i) == e) + } +} + +val unsafeMultiDimDouble = ExpressionEncoder[Array[Array[Double]]].resolveAndBind(). + toRow(doubleMultiDimArray).getArray(0) +assert(unsafeDouble.isInstanceOf[UnsafeArrayData]) +assert(unsafeMultiDimDouble.numElements == doubleMultiDimArray.length) +doubleMultiDimArray.zipWithIndex.map { case (a, j) => + val u = unsafeMultiDimDouble.getArray(j) + assert(u.isInstanceOf[UnsafeArrayData]) + assert(u.numElements == a.length) + a.zipWithIndex.map { case (e, i) => +assert(u.getDouble(i) == e) + } +} + } + + test("from primitive array") { +val unsafeInt =
[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r69850087 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/UnsafeArraySuite.scala --- @@ -18,27 +18,126 @@ package org.apache.spark.sql.catalyst.util import org.apache.spark.SparkFunSuite +import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder import org.apache.spark.sql.catalyst.expressions.UnsafeArrayData +import org.apache.spark.unsafe.Platform class UnsafeArraySuite extends SparkFunSuite { - test("from primitive int array") { -val array = Array(1, 10, 100) -val unsafe = UnsafeArrayData.fromPrimitiveArray(array) -assert(unsafe.numElements == 3) -assert(unsafe.getSizeInBytes == 4 + 4 * 3 + 4 * 3) -assert(unsafe.getInt(0) == 1) -assert(unsafe.getInt(1) == 10) -assert(unsafe.getInt(2) == 100) + val booleanArray = Array(false, true) + val shortArray = Array(1.toShort, 10.toShort, 100.toShort) + val intArray = Array(1, 10, 100) + val longArray = Array(1.toLong, 10.toLong, 100.toLong) + val floatArray = Array(1.1.toFloat, 2.2.toFloat, 3.3.toFloat) + val doubleArray = Array(1.1, 2.2, 3.3) --- End diff -- let's also test string array
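A string-array case could follow the same shape as the suite's other tests; a sketch with illustrative sample values:

    val stringArray = Array("abc", "defghi", "jklmnopqrs")

    val unsafeString = ExpressionEncoder[Array[String]].resolveAndBind().
      toRow(stringArray).getArray(0)
    assert(unsafeString.isInstanceOf[UnsafeArrayData])
    assert(unsafeString.numElements == stringArray.length)
    stringArray.zipWithIndex.map { case (e, i) =>
      assert(unsafeString.getUTF8String(i).toString == e)
    }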
[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13680#discussion_r69850001

    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeArrayWriter.java ---
    @@ -33,91 +38,144 @@
       // The offset of the global buffer where we start to write this array.
       private int startingOffset;
     
    -  public void initialize(BufferHolder holder, int numElements, int fixedElementSize) {
    -    // We need 4 bytes to store numElements and 4 bytes each element to store offset.
    -    final int fixedSize = 4 + 4 * numElements;
    +  // The number of elements in this array
    +  private int numElements;
    +
    +  private int headerInBytes;
    +
    +  private void assertIndexIsValid(int index) {
    +    assert index >= 0 : "index (" + index + ") should >= 0";
    +    assert index < numElements : "index (" + index + ") should < " + numElements;
    +  }
    +
    +  public void initialize(BufferHolder holder, int numElements, int elementSize) {
    +    // We need 4 bytes to store numElements in header
    +    this.numElements = numElements;
    +    this.headerInBytes = calculateHeaderPortionInBytes(numElements);
     
         this.holder = holder;
         this.startingOffset = holder.cursor;
     
    -    holder.grow(fixedSize);
    -    Platform.putInt(holder.buffer, holder.cursor, numElements);
    -    holder.cursor += fixedSize;
    +    // Grows the global buffer ahead for header and fixed size data.
    +    holder.grow(headerInBytes + elementSize * numElements);
    +
    +    // Write numElements and clear out null bits to header
    +    Platform.putInt(holder.buffer, startingOffset, numElements);
    +    for (int i = 4; i < headerInBytes; i += 8) {
    +      Platform.putLong(holder.buffer, startingOffset + i, 0L);
    +    }
    +    holder.cursor += (headerInBytes + elementSize * numElements);
    +  }
    +
    +  private long getElementOffset(int ordinal, int elementSize) {
    +    return startingOffset + headerInBytes + ordinal * elementSize;
    +  }
    +
    +  public void setOffset(int ordinal, int currentCursor) {
    --- End diff --

    do we always pass in `holder.cursor` as `currentCursor`?
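For context on the layout under review: the writer lays an array out as a 4-byte element count, a null-tracking bitmap padded to whole 8-byte words, a fixed-width region (one `elementSize` slot per element, holding either the value or the offset of a variable-length value), and then the variable-length data. The zeroing loop above (`for (int i = 4; i < headerInBytes; i += 8)`) suggests the header math below; this is a hedged Scala sketch of that calculation, assumed rather than quoted from the PR:

    // 4 bytes for numElements, plus one null bit per element,
    // rounded up to whole 8-byte words.
    def calculateHeaderPortionInBytes(numElements: Int): Int =
      4 + ((numElements + 63) / 64) * 8

    assert(calculateHeaderPortionInBytes(3) == 12)   // 4 + one 8-byte word
    assert(calculateHeaderPortionInBytes(100) == 20) // 4 + two 8-byte words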
[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13680#discussion_r69849961

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala ---
    @@ -189,28 +189,29 @@ object GenerateUnsafeProjection extends CodeGenerator[Seq[Expression], UnsafePro
     
         val jt = ctx.javaType(et)
     
    -    val fixedElementSize = et match {
    +    val elementOrOffsetSize = et match {
           case t: DecimalType if t.precision <= Decimal.MAX_LONG_DIGITS => 8
           case _ if ctx.isPrimitiveType(jt) => et.defaultSize
    -      case _ => 0
    +      case _ => 8 // we need 8 bytes to store offset and length for variable-length types
         }
     
    +    val tmpCursor = ctx.freshName("tmpCursor")
    --- End diff --

    it's never used
[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13680#discussion_r69849874

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala ---
    @@ -189,28 +189,29 @@ object GenerateUnsafeProjection extends CodeGenerator[Seq[Expression], UnsafePro
     
         val jt = ctx.javaType(et)
     
    -    val fixedElementSize = et match {
    +    val elementOrOffsetSize = et match {
           case t: DecimalType if t.precision <= Decimal.MAX_LONG_DIGITS => 8
           case _ if ctx.isPrimitiveType(jt) => et.defaultSize
    -      case _ => 0
    +      case _ => 8 // we need 8 bytes to store offset and length for variable-length types
    --- End diff --

    It should be 4 now.
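The point of the comment: in the new format a variable-length element's fixed-width slot only stores a 4-byte offset, with the length recovered separately, so 8 bytes per slot over-allocates. A sketch of the suggested fix (my reading of the review comment, not the merged code):

    val elementOrOffsetSize = et match {
      case t: DecimalType if t.precision <= Decimal.MAX_LONG_DIGITS => 8
      case _ if ctx.isPrimitiveType(jt) => et.defaultSize
      case _ => 4 // a 4-byte offset is enough for variable-length types
    }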
[GitHub] spark issue #13494: [SPARK-15752] [SQL] Optimize metadata only query that ha...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13494 Build finished. Test PASSed.
[GitHub] spark issue #13494: [SPARK-15752] [SQL] Optimize metadata only query that ha...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13494 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61890/ Test PASSed.
[GitHub] spark issue #13494: [SPARK-15752] [SQL] Optimize metadata only query that ha...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13494 **[Test build #61890 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61890/consoleFull)** for PR 13494 at commit [`88fd3bf`](https://github.com/apache/spark/commit/88fd3bfb13852e1739cef0146209437b417445f3).
 * This patch passes all tests.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.
[GitHub] spark issue #14082: [SPARK-16381][SQL][SparkR] Update SQL examples and progr...
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14082 @shivaram @mengxr It would be nice if either of you could help review this one, thanks!
[GitHub] spark issue #14077: [SPARK-16402] [SQL] JDBC Source: Implement save API of D...
Github user JustinPihony commented on the issue: https://github.com/apache/spark/pull/14077 @gatorsmile If `copy` is a bug, then that is fine with me (I just commented my findings on this and will be curious to hear back); that said, fixing it would make my implementation simpler. I'd be fine with simplifying it down to a basic save; however, I am still not OK with the circular reference. It adds confusion to debugging and future refactorings, and to fix that, you end up back at my PR. So, ultimately this still seems like a duplicate to me.
[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13680#discussion_r69849567

    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeArrayWriter.java ---
    @@ -19,9 +19,14 @@
     import org.apache.spark.sql.types.Decimal;
     import org.apache.spark.unsafe.Platform;
    +import org.apache.spark.unsafe.bitset.BitSetMethods;
     import org.apache.spark.unsafe.types.CalendarInterval;
     import org.apache.spark.unsafe.types.UTF8String;
     
    +import java.util.Arrays;
    --- End diff --

    it's still here...
[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13680#discussion_r69849512

    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java ---
    @@ -341,63 +328,115 @@ public UnsafeArrayData copy() {
         return arrayCopy;
       }
     
    -  public static UnsafeArrayData fromPrimitiveArray(int[] arr) {
    -    if (arr.length > (Integer.MAX_VALUE - 4) / 8) {
    -      throw new UnsupportedOperationException("Cannot convert this array to unsafe format as " +
    -        "it's too big.");
    -    }
    +  @Override
    +  public boolean[] toBooleanArray() {
    +    int size = numElements();
    +    boolean[] values = new boolean[size];
    +    Platform.copyMemory(
    +      baseObject, baseOffset + headerInBytes, values, Platform.BOOLEAN_ARRAY_OFFSET, size);
    +    return values;
    +  }
     
    -    final int offsetRegionSize = 4 * arr.length;
    -    final int valueRegionSize = 4 * arr.length;
    -    final int totalSize = 4 + offsetRegionSize + valueRegionSize;
    -    final byte[] data = new byte[totalSize];
    +  @Override
    +  public byte[] toByteArray() {
    +    int size = numElements();
    +    byte[] values = new byte[size];
    +    Platform.copyMemory(
    +      baseObject, baseOffset + headerInBytes, values, Platform.BYTE_ARRAY_OFFSET, size);
    +    return values;
    +  }
     
    -    Platform.putInt(data, Platform.BYTE_ARRAY_OFFSET, arr.length);
    +  @Override
    +  public short[] toShortArray() {
    +    int size = numElements();
    +    short[] values = new short[size];
    +    Platform.copyMemory(
    +      baseObject, baseOffset + headerInBytes, values, Platform.SHORT_ARRAY_OFFSET, size * 2);
    +    return values;
    +  }
     
    -    int offsetPosition = Platform.BYTE_ARRAY_OFFSET + 4;
    -    int valueOffset = 4 + offsetRegionSize;
    -    for (int i = 0; i < arr.length; i++) {
    -      Platform.putInt(data, offsetPosition, valueOffset);
    -      offsetPosition += 4;
    -      valueOffset += 4;
    -    }
    +  @Override
    +  public int[] toIntArray() {
    +    int size = numElements();
    +    int[] values = new int[size];
    +    Platform.copyMemory(
    +      baseObject, baseOffset + headerInBytes, values, Platform.INT_ARRAY_OFFSET, size * 4);
    +    return values;
    +  }
     
    -    Platform.copyMemory(arr, Platform.INT_ARRAY_OFFSET, data,
    -      Platform.BYTE_ARRAY_OFFSET + 4 + offsetRegionSize, valueRegionSize);
    +  @Override
    +  public long[] toLongArray() {
    +    int size = numElements();
    +    long[] values = new long[size];
    +    Platform.copyMemory(
    +      baseObject, baseOffset + headerInBytes, values, Platform.LONG_ARRAY_OFFSET, size * 8);
    +    return values;
    +  }
     
    -    UnsafeArrayData result = new UnsafeArrayData();
    -    result.pointTo(data, Platform.BYTE_ARRAY_OFFSET, totalSize);
    -    return result;
    +  @Override
    +  public float[] toFloatArray() {
    +    int size = numElements();
    +    float[] values = new float[size];
    +    Platform.copyMemory(
    +      baseObject, baseOffset + headerInBytes, values, Platform.FLOAT_ARRAY_OFFSET, size * 4);
    +    return values;
       }
     
    -  public static UnsafeArrayData fromPrimitiveArray(double[] arr) {
    -    if (arr.length > (Integer.MAX_VALUE - 4) / 12) {
    +  @Override
    +  public double[] toDoubleArray() {
    +    int size = numElements();
    +    double[] values = new double[size];
    +    Platform.copyMemory(
    +      baseObject, baseOffset + headerInBytes, values, Platform.DOUBLE_ARRAY_OFFSET, size * 8);
    +    return values;
    +  }
    +
    +  private static UnsafeArrayData fromPrimitiveArray(
    +      Object arr, int offset, int length, int elementSize) {
    +    final long headerInBytes = calculateHeaderPortionInBytes(length);
    +    final long valueRegionInBytes = (long)elementSize * (long)length;
    +    final long totalSizeInLongs = (headerInBytes + valueRegionInBytes + 7) / 8;
    --- End diff --

    we should declare `totalSizeInLongs` as int
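To make the sizing concrete, here is the arithmetic for a three-element int array, using the header formula assumed earlier (a worked sketch, not PR code). Keeping the sum a long until the overflow guard two lines further down is what makes that check sound:

    val length = 3
    val elementSize = 4                                  // bytes per int
    val headerInBytes = 4 + ((length + 63) / 64) * 8     // 12
    val valueRegionInBytes = elementSize.toLong * length // 12
    val totalSizeInLongs = (headerInBytes + valueRegionInBytes + 7) / 8
    assert(totalSizeInLongs == 3)                        // 24-byte long[] backing array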
[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13680#discussion_r69849441

    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java ---
    @@ -341,63 +328,115 @@ public UnsafeArrayData copy() {
         return arrayCopy;
       }
     
    -  public static UnsafeArrayData fromPrimitiveArray(int[] arr) {
    -    if (arr.length > (Integer.MAX_VALUE - 4) / 8) {
    -      throw new UnsupportedOperationException("Cannot convert this array to unsafe format as " +
    -        "it's too big.");
    -    }
    +  @Override
    +  public boolean[] toBooleanArray() {
    +    int size = numElements();
    +    boolean[] values = new boolean[size];
    +    Platform.copyMemory(
    +      baseObject, baseOffset + headerInBytes, values, Platform.BOOLEAN_ARRAY_OFFSET, size);
    +    return values;
    +  }
     
    -    final int offsetRegionSize = 4 * arr.length;
    -    final int valueRegionSize = 4 * arr.length;
    -    final int totalSize = 4 + offsetRegionSize + valueRegionSize;
    -    final byte[] data = new byte[totalSize];
    +  @Override
    +  public byte[] toByteArray() {
    +    int size = numElements();
    +    byte[] values = new byte[size];
    +    Platform.copyMemory(
    +      baseObject, baseOffset + headerInBytes, values, Platform.BYTE_ARRAY_OFFSET, size);
    +    return values;
    +  }
     
    -    Platform.putInt(data, Platform.BYTE_ARRAY_OFFSET, arr.length);
    +  @Override
    +  public short[] toShortArray() {
    +    int size = numElements();
    +    short[] values = new short[size];
    +    Platform.copyMemory(
    +      baseObject, baseOffset + headerInBytes, values, Platform.SHORT_ARRAY_OFFSET, size * 2);
    +    return values;
    +  }
     
    -    int offsetPosition = Platform.BYTE_ARRAY_OFFSET + 4;
    -    int valueOffset = 4 + offsetRegionSize;
    -    for (int i = 0; i < arr.length; i++) {
    -      Platform.putInt(data, offsetPosition, valueOffset);
    -      offsetPosition += 4;
    -      valueOffset += 4;
    -    }
    +  @Override
    +  public int[] toIntArray() {
    +    int size = numElements();
    +    int[] values = new int[size];
    +    Platform.copyMemory(
    +      baseObject, baseOffset + headerInBytes, values, Platform.INT_ARRAY_OFFSET, size * 4);
    +    return values;
    +  }
     
    -    Platform.copyMemory(arr, Platform.INT_ARRAY_OFFSET, data,
    -      Platform.BYTE_ARRAY_OFFSET + 4 + offsetRegionSize, valueRegionSize);
    +  @Override
    +  public long[] toLongArray() {
    +    int size = numElements();
    +    long[] values = new long[size];
    +    Platform.copyMemory(
    +      baseObject, baseOffset + headerInBytes, values, Platform.LONG_ARRAY_OFFSET, size * 8);
    +    return values;
    +  }
     
    -    UnsafeArrayData result = new UnsafeArrayData();
    -    result.pointTo(data, Platform.BYTE_ARRAY_OFFSET, totalSize);
    -    return result;
    +  @Override
    +  public float[] toFloatArray() {
    +    int size = numElements();
    +    float[] values = new float[size];
    +    Platform.copyMemory(
    +      baseObject, baseOffset + headerInBytes, values, Platform.FLOAT_ARRAY_OFFSET, size * 4);
    +    return values;
       }
     
    -  public static UnsafeArrayData fromPrimitiveArray(double[] arr) {
    -    if (arr.length > (Integer.MAX_VALUE - 4) / 12) {
    +  @Override
    +  public double[] toDoubleArray() {
    +    int size = numElements();
    +    double[] values = new double[size];
    +    Platform.copyMemory(
    +      baseObject, baseOffset + headerInBytes, values, Platform.DOUBLE_ARRAY_OFFSET, size * 8);
    +    return values;
    +  }
    +
    +  private static UnsafeArrayData fromPrimitiveArray(
    +      Object arr, int offset, int length, int elementSize) {
    +    final long headerInBytes = calculateHeaderPortionInBytes(length);
    +    final long valueRegionInBytes = (long)elementSize * (long)length;
    +    final long totalSizeInLongs = (headerInBytes + valueRegionInBytes + 7) / 8;
    +    if (totalSizeInLongs * 8> Integer.MAX_VALUE) {
    --- End diff --

    nit: a space after `8`
[GitHub] spark pull request #14083: [SPARK-16406][SQL] Improve performance of Logical...
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14083#discussion_r69849376

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---
    @@ -165,111 +169,99 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging {
       def resolveQuoted(
           name: String,
           resolver: Resolver): Option[NamedExpression] = {
    -    resolve(UnresolvedAttribute.parseAttributeName(name), output, resolver)
    +    outputAttributeResolver.resolve(UnresolvedAttribute.parseAttributeName(name), resolver)
       }
     
       /**
    -   * Resolve the given `name` string against the given attribute, returning either 0 or 1 match.
    -   *
    -   * This assumes `name` has multiple parts, where the 1st part is a qualifier
    -   * (i.e. table name, alias, or subquery alias).
    -   * See the comment above `candidates` variable in resolve() for semantics the returned data.
    +   * Refreshes (or invalidates) any metadata/data cached in the plan recursively.
       */
    -  private def resolveAsTableColumn(
    -      nameParts: Seq[String],
    -      resolver: Resolver,
    -      attribute: Attribute): Option[(Attribute, List[String])] = {
    -    assert(nameParts.length > 1)
    -    if (attribute.qualifier.exists(resolver(_, nameParts.head))) {
    -      // At least one qualifier matches. See if remaining parts match.
    -      val remainingParts = nameParts.tail
    -      resolveAsColumn(remainingParts, resolver, attribute)
    -    } else {
    -      None
    -    }
    +  def refresh(): Unit = children.foreach(_.refresh())
    +}
    +
    +/**
    + * Helper class for (LogicalPlan) attribute resolution. This class indexes attributes by their
    + * case-in-sensitive name, and checks potential candidates using the given Resolver. Both qualified
    + * and direct resolution are supported.
    + */
    +private[catalyst] class AttributeResolver(attributes: Seq[Attribute]) extends Logging {
    +  private def unique[T](m: Map[T, Seq[Attribute]]): Map[T, Seq[Attribute]] = {
    +    m.mapValues(_.distinct).map(identity)
       }
     
    -  /**
    -   * Resolve the given `name` string against the given attribute, returning either 0 or 1 match.
    -   *
    -   * Different from resolveAsTableColumn, this assumes `name` does NOT start with a qualifier.
    -   * See the comment above `candidates` variable in resolve() for semantics the returned data.
    -   */
    -  private def resolveAsColumn(
    -      nameParts: Seq[String],
    -      resolver: Resolver,
    -      attribute: Attribute): Option[(Attribute, List[String])] = {
    -    if (!attribute.isGenerated && resolver(attribute.name, nameParts.head)) {
    -      Option((attribute.withName(nameParts.head), nameParts.tail.toList))
    -    } else {
    -      None
    +  /** Map to use for direct case insensitive attribute lookups. */
    +  private val direct: Map[String, Seq[Attribute]] = {
    --- End diff --

    You have a point: it is the secondary code path, so it is less likely to be used. I'll take a look at it on my next pass.
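The essence of the change under review: instead of scanning every output attribute on each `resolve` call, the plan builds a map keyed by lower-cased attribute name once, and each lookup becomes a hash probe plus a resolver check over a usually tiny candidate list. A minimal sketch of the idea, using a hypothetical `AttributeIndex` name (the PR's class is `AttributeResolver` and handles qualifiers as well):

    import org.apache.spark.sql.catalyst.expressions.Attribute

    class AttributeIndex(attributes: Seq[Attribute]) {
      // Built once per plan; case-insensitive by construction.
      private val direct: Map[String, Seq[Attribute]] =
        attributes.groupBy(_.name.toLowerCase)

      // Candidates still go through the session's Resolver afterwards,
      // so case-sensitive analysis stays correct.
      def candidates(name: String): Seq[Attribute] =
        direct.getOrElse(name.toLowerCase, Nil)
    }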
[GitHub] spark issue #14075: [SPARK-16401] [SQL] Data Source API: Enable Extending Re...
Github user JustinPihony commented on the issue: https://github.com/apache/spark/pull/14075 @rxin It does look like this might have been a regression introduced via [the initial creation of `DataSource`](https://github.com/apache/spark/blob/1e28840594b9d972c96d3922ca0bf0f76e313e82/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala) from [`ResolvedDataSource`](https://github.com/apache/spark/blob/branch-1.6/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ResolvedDataSource.scala). Although, @marmbrus could speak to it better as he was the one who wrote that code. Maybe there was a reason?
[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13680#discussion_r69849225

    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java ---
    @@ -237,62 +229,57 @@ public Decimal getDecimal(int ordinal, int precision, int scale) {
     
       @Override
       public UTF8String getUTF8String(int ordinal) {
    -    assertIndexIsValid(ordinal);
    -    final int offset = getElementOffset(ordinal);
    -    if (offset < 0) return null;
    -    final int size = getElementSize(offset, ordinal);
    +    if (isNullAt(ordinal)) return null;
    +    final int offset = getInt(ordinal);
    +    final int size = getSize(ordinal);
         return UTF8String.fromAddress(baseObject, baseOffset + offset, size);
       }
     
       @Override
       public byte[] getBinary(int ordinal) {
    -    assertIndexIsValid(ordinal);
    -    final int offset = getElementOffset(ordinal);
    -    if (offset < 0) return null;
    -    final int size = getElementSize(offset, ordinal);
    +    if (isNullAt(ordinal)) return null;
    +    final int offset = getInt(ordinal);
    +    final int size = getSize(ordinal);
         final byte[] bytes = new byte[size];
         Platform.copyMemory(baseObject, baseOffset + offset, bytes, Platform.BYTE_ARRAY_OFFSET, size);
         return bytes;
       }
     
       @Override
       public CalendarInterval getInterval(int ordinal) {
    -    assertIndexIsValid(ordinal);
    -    final int offset = getElementOffset(ordinal);
    -    if (offset < 0) return null;
    +    if (isNullAt(ordinal)) return null;
    +    final long offsetAndSize = getLong(ordinal);
    --- End diff --

    it's unused
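The flagged `getLong(ordinal)` looks like a leftover from the row-style representation, where a variable-length field packs its offset and size into a single long; in the new array format the offset is a plain 4-byte int and the size comes from `getSize`, so the packed read is dead code. For reference, the packing that leftover line would have decoded (a sketch of the UnsafeRow-style convention, not this PR's code):

    val offset = 16
    val size = 5
    val offsetAndSize: Long = (offset.toLong << 32) | size
    assert((offsetAndSize >> 32).toInt == offset) // upper 32 bits: offset
    assert(offsetAndSize.toInt == size)           // lower 32 bits: size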
[GitHub] spark pull request #14083: [SPARK-16406][SQL] Improve performance of Logical...
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14083#discussion_r69849157

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---
    @@ -165,111 +169,99 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging {
       def resolveQuoted(
           name: String,
           resolver: Resolver): Option[NamedExpression] = {
    -    resolve(UnresolvedAttribute.parseAttributeName(name), output, resolver)
    +    outputAttributeResolver.resolve(UnresolvedAttribute.parseAttributeName(name), resolver)
       }
     
       /**
    -   * Resolve the given `name` string against the given attribute, returning either 0 or 1 match.
    -   *
    -   * This assumes `name` has multiple parts, where the 1st part is a qualifier
    -   * (i.e. table name, alias, or subquery alias).
    -   * See the comment above `candidates` variable in resolve() for semantics the returned data.
    +   * Refreshes (or invalidates) any metadata/data cached in the plan recursively.
       */
    -  private def resolveAsTableColumn(
    -      nameParts: Seq[String],
    -      resolver: Resolver,
    -      attribute: Attribute): Option[(Attribute, List[String])] = {
    -    assert(nameParts.length > 1)
    -    if (attribute.qualifier.exists(resolver(_, nameParts.head))) {
    -      // At least one qualifier matches. See if remaining parts match.
    -      val remainingParts = nameParts.tail
    -      resolveAsColumn(remainingParts, resolver, attribute)
    -    } else {
    -      None
    -    }
    +  def refresh(): Unit = children.foreach(_.refresh())
    +}
    +
    +/**
    + * Helper class for (LogicalPlan) attribute resolution. This class indexes attributes by their
    + * case-in-sensitive name, and checks potential candidates using the given Resolver. Both qualified
    + * and direct resolution are supported.
    + */
    +private[catalyst] class AttributeResolver(attributes: Seq[Attribute]) extends Logging {
    +  private def unique[T](m: Map[T, Seq[Attribute]]): Map[T, Seq[Attribute]] = {
    +    m.mapValues(_.distinct).map(identity)
       }
     
    -  /**
    -   * Resolve the given `name` string against the given attribute, returning either 0 or 1 match.
    -   *
    -   * Different from resolveAsTableColumn, this assumes `name` does NOT start with a qualifier.
    -   * See the comment above `candidates` variable in resolve() for semantics the returned data.
    -   */
    -  private def resolveAsColumn(
    -      nameParts: Seq[String],
    -      resolver: Resolver,
    -      attribute: Attribute): Option[(Attribute, List[String])] = {
    -    if (!attribute.isGenerated && resolver(attribute.name, nameParts.head)) {
    -      Option((attribute.withName(nameParts.head), nameParts.tail.toList))
    -    } else {
    -      None
    +  /** Map to use for direct case insensitive attribute lookups. */
    +  private val direct: Map[String, Seq[Attribute]] = {
    --- End diff --

    nit: do we need to use lazy val?
[GitHub] spark pull request #14079: [SPARK-8425][CORE] New Blacklist Mechanism
Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14079#discussion_r69849058

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
    @@ -236,29 +246,41 @@ private[spark] class TaskSchedulerImpl(
        * given TaskSetManager have completed, so state associated with the TaskSetManager should be
        * cleaned up.
        */
    -  def taskSetFinished(manager: TaskSetManager): Unit = synchronized {
    +  def taskSetFinished(manager: TaskSetManager, success: Boolean): Unit = synchronized {
         taskSetsByStageIdAndAttempt.get(manager.taskSet.stageId).foreach { taskSetsForStage =>
           taskSetsForStage -= manager.taskSet.stageAttemptId
           if (taskSetsForStage.isEmpty) {
             taskSetsByStageIdAndAttempt -= manager.taskSet.stageId
           }
         }
         manager.parent.removeSchedulable(manager)
    -    logInfo("Removed TaskSet %s, whose tasks have all completed, from pool %s"
    -      .format(manager.taskSet.id, manager.parent.name))
    +    if (success) {
    +      blacklistTracker.taskSetSucceeded(manager.taskSet.stageId, this)
    +      logInfo(s"Removed TaskSet ${manager.taskSet.id}, whose tasks have all completed, from pool" +
    +        s" ${manager.parent.name}")
    +    } else {
    +      blacklistTracker.taskSetFailed(manager.taskSet.stageId)
    +      logInfo(s"Removed TaskSet ${manager.taskSet.id}, since it failed, from pool" +
    +        s" ${manager.parent.name}")
    --- End diff --

    Changing the log message is unrelated to blacklisting, but this message had always annoyed and confused me, so I thought it was worth updating since I needed `success` anyway.
[GitHub] spark pull request #14053: [SPARK-16374] [SQL] Remove Alias from MetastoreRe...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14053
[GitHub] spark issue #14053: [SPARK-16374] [SQL] Remove Alias from MetastoreRelation ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14053 thanks, merging to master!
[GitHub] spark issue #14004: [SPARK-16285][SQL] Implement sentences SQL functions
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14004 LGTM except for some style comments; thanks for working on it!
[GitHub] spark pull request #14004: [SPARK-16285][SQL] Implement sentences SQL functi...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14004#discussion_r69848762

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala ---
    @@ -347,4 +347,24 @@ class StringFunctionsSuite extends QueryTest with SharedSQLContext {
           df2.filter("b>0").selectExpr("format_number(a, b)"),
           Row("5.") :: Row("4.000") :: Row("4.000") :: Row("4.000") :: Row("3.00") :: Nil)
       }
    +
    +  test("string sentences function") {
    +    val df = Seq(("Hi there! The price was $1,234.56 But, not now.", "en", "US"))
    +      .toDF("str", "language", "country")
    +
    +    checkAnswer(
    +      df.selectExpr("sentences(str, language, country)"),
    +      Row(Seq(Seq("Hi", "there"), Seq("The", "price", "was"), Seq("But", "not", "now"))))
    +
    +    // Type coercion
    +    checkAnswer(
    +      df.selectExpr("sentences(null)", "sentences(10)", "sentences(3.14)"),
    +      Row(null, Seq(Seq("10")), Seq(Seq("3.14"))))
    +
    +    // Argument number exception
    +    val m = intercept[AnalysisException] {
    +      df.selectExpr("sentences()")
    +    }.getMessage
    +    assert(m.contains("Invalid number of arguments"))
    --- End diff --

    btw what's the full error message here? I thought it would be `function not found` or something...
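On the error-message question: the lookup does find `sentences` in the function registry; what fails is matching a constructor for zero arguments, and it is the registry's expression builder that raises the `AnalysisException`. A simplified sketch of that arity check (modeled on `FunctionRegistry.expression`, not the exact Spark source):

    import org.apache.spark.sql.AnalysisException
    import org.apache.spark.sql.catalyst.expressions.Expression

    def build(name: String, args: Seq[Expression]): Expression = {
      // Pick the constructor whose arity matches the call site.
      val ctors = classOf[Sentences].getConstructors
      ctors.find(_.getParameterTypes.length == args.length) match {
        case Some(ctor) => ctor.newInstance(args: _*).asInstanceOf[Expression]
        case None => throw new AnalysisException(
          s"Invalid number of arguments for function $name")
      }
    }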
[GitHub] spark pull request #14004: [SPARK-16285][SQL] Implement sentences SQL functi...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14004#discussion_r69848674

    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala ---
    @@ -725,4 +725,41 @@ class StringExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
         checkEvaluation(FindInSet(Literal("abf"), Literal("abc,b,ab,c,def")), 0)
         checkEvaluation(FindInSet(Literal("ab,"), Literal("abc,b,ab,c,def")), 0)
       }
    +
    +  test("Sentences") {
    +    val nullString = Literal.create(null, StringType)
    +    checkEvaluation(Sentences(nullString, nullString, nullString), null, EmptyRow)
    +    checkEvaluation(Sentences(nullString, nullString), null, EmptyRow)
    +    checkEvaluation(Sentences(nullString), null, EmptyRow)
    +    checkEvaluation(Sentences(Literal.create(null, NullType)), null, EmptyRow)
    +    checkEvaluation(Sentences("", nullString, nullString), Seq.empty, EmptyRow)
    +    checkEvaluation(Sentences("", nullString), Seq.empty, EmptyRow)
    +    checkEvaluation(Sentences(""), Seq.empty, EmptyRow)
    +
    +    val correct_answer = Seq(
    +      Seq("Hi", "there"),
    +      Seq("The", "price", "was"),
    +      Seq("But", "not", "now"))
    +
    +    // Hive compatible test-cases.
    +    checkEvaluation(
    +      Sentences("Hi there! The price was $1,234.56 But, not now."),
    +      correct_answer,
    +      EmptyRow)
    --- End diff --

    `EmptyRow` is the default value, we don't need to pass it.
[GitHub] spark pull request #13517: [SPARK-14839][SQL] Support for other types for `t...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13517
[GitHub] spark issue #13517: [SPARK-14839][SQL] Support for other types for `tablePro...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/13517 NP :)
[GitHub] spark pull request #14004: [SPARK-16285][SQL] Implement sentences SQL functi...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14004#discussion_r69848618

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ---
    @@ -198,6 +203,66 @@ case class StringSplit(str: Expression, pattern: Expression)
       override def prettyName: String = "split"
     }
     
    +/**
    + * Splits a string into arrays of sentences, where each sentence is an array of words.
    + * The 'lang' and 'country' arguments are optional, and if omitted, the default locale is used.
    + */
    +@ExpressionDescription(
    +  usage = "_FUNC_(str, lang, country) - Splits str into an array of array of words.",
    +  extended = "> SELECT _FUNC_('Hi there! Good morning.');\n  [['Hi','there'], ['Good','morning']]")
    +case class Sentences(
    +    str: Expression,
    +    language: Expression = Literal(""),
    +    country: Expression = Literal(""))
    +  extends Expression with ImplicitCastInputTypes with CodegenFallback {
    +
    +  def this(str: Expression) = this(str, Literal(""), Literal(""))
    +  def this(str: Expression, language: Expression) = this(str, language, Literal(""))
    +
    +  override def nullable: Boolean = true
    +  override def dataType: DataType =
    +    ArrayType(ArrayType(StringType, containsNull = false), containsNull = false)
    +  override def inputTypes: Seq[AbstractDataType] = Seq(StringType, StringType, StringType)
    +  override def children: Seq[Expression] = str :: language :: country :: Nil
    +
    +  override def eval(input: InternalRow): Any = {
    +    val string = str.eval(input)
    +    if (string == null) {
    +      null
    +    } else {
    +      val locale = try {
    +        new Locale(language.eval(input).asInstanceOf[UTF8String].toString,
    +          country.eval(input).asInstanceOf[UTF8String].toString)
    +      } catch {
    +        case _: NullPointerException | _: ClassCastException => Locale.getDefault
    --- End diff --

    I'd like to use `if` to check the null explicitly. And `ClassCastException` will never happen as we have type checking.

    BTW what will be the result of invalid language and country? e.g. `new Locale("abc", "xyz")`
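On the invalid-locale question: `java.util.Locale`'s constructor performs no validation, so `new Locale("abc", "xyz")` never throws; the `catch` above only fires on a null language or country. `BreakIterator` then quietly falls back to its default sentence rules for an unrecognized locale, so bad input degrades gracefully rather than erroring. A small sketch of that behavior (JDK behavior as I understand it, worth verifying):

    import java.text.BreakIterator
    import java.util.Locale

    val bogus = new Locale("abc", "xyz")  // constructs fine: no validation
    // An unrecognized locale falls back to the default sentence rules.
    val itr = BreakIterator.getSentenceInstance(bogus)
    itr.setText("Hi there! Good morning.")
    // Boundary iteration proceeds as it would with Locale.getDefault.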
[GitHub] spark issue #13517: [SPARK-14839][SQL] Support for other types for `tablePro...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13517 Oh... sorry... and thanks..
[GitHub] spark issue #13517: [SPARK-14839][SQL] Support for other types for `tablePro...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/13517 @HyukjinKwon still on holiday... LGTM - merging to master. Thanks!
[GitHub] spark issue #14083: [SPARK-16406][SQL] Improve performance of LogicalPlan.re...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14083 yea. :-)
[GitHub] spark issue #14083: [SPARK-16406][SQL] Improve performance of LogicalPlan.re...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/14083 @viirya you mean I forgot to add `time(sql(query))`?
[GitHub] spark issue #14077: [SPARK-16402] [SQL] JDBC Source: Implement save API of D...
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14077

    @JustinPihony Thanks for your review! Let me try to answer your concerns.
    - The `copy` function location is actually a bug. See another PR: https://github.com/apache/spark/pull/14075.
    - The trait `CreatableRelationProvider` was introduced for the `save` API.
[GitHub] spark pull request #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid un...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14036#discussion_r69848182

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala ---
    @@ -277,14 +268,52 @@ case class Divide(left: Expression, right: Expression)
               if (${eval1.isNull}) {
                 ${ev.isNull} = true;
               } else {
    -            ${ev.value} = $divide;
    +            ${ev.value} = $division;
               }
             }""")
         }
       }
     }
     
     @ExpressionDescription(
    +  usage = "a _FUNC_ b - Divides a by b.",
    --- End diff --

    we should mention this is fraction division, i.e. the parameters must be of fractional type and the result is also fractional.
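For the doc wording: `/` in Spark SQL is fraction division, so integral operands are coerced to a fractional type before dividing. A behavior sketch (exact coercion rules depend on the operand types):

    spark.sql("SELECT 3 / 2").show()   // 1.5: both ints coerced to double
    spark.sql("SELECT 7.0 / 2").show() // 3.5: decimal/double semantics
    // Truncating integral division is the separate operator this JIRA
    // (SPARK-16323) carves out, rather than an overload of `/`.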
[GitHub] spark issue #14083: [SPARK-16406][SQL] Improve performance of LogicalPlan.re...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14083 The code in the description seems incomplete?
[GitHub] spark issue #13517: [SPARK-14839][SQL] Support for other types for `tablePro...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13517 (@hvanhovell I just addressed your comments!)
[GitHub] spark pull request #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid un...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14036#discussion_r69848124

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala ---
    @@ -249,11 +244,7 @@ case class Divide(left: Expression, right: Expression)
           s"${eval2.value} == 0"
         }
         val javaType = ctx.javaType(dataType)
    -    val divide = if (dataType.isInstanceOf[DecimalType]) {
    --- End diff --

    Does it already cover both fraction and integral division?
[GitHub] spark pull request #13890: [SPARK-16189][SQL] Add ExistingRDD logical plan f...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13890#discussion_r69848022

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
    @@ -74,13 +74,71 @@ object RDDConversions {
         }
       }
     }
     
    +private[sql] object ExistingRDD {
    +
    +  def apply[T: Encoder](rdd: RDD[T])(session: SparkSession): LogicalPlan = {
    +    val exisitingRdd = ExistingRDD(CatalystSerde.generateObjAttr[T], rdd)(session)
    +    CatalystSerde.serialize[T](exisitingRdd)
    +  }
    +}
    +
     /** Logical plan node for scanning data from an RDD. */
    +private[sql] case class ExistingRDD[T](
    +    outputObjAttr: Attribute,
    +    rdd: RDD[T])(session: SparkSession)
    +  extends LeafNode with ObjectProducer with MultiInstanceRelation {
    +
    +  override protected final def otherCopyArgs: Seq[AnyRef] = session :: Nil
    +
    +  override def newInstance(): ExistingRDD.this.type =
    +    ExistingRDD(outputObjAttr.newInstance(), rdd)(session).asInstanceOf[this.type]
    +
    +  override def sameResult(plan: LogicalPlan): Boolean = {
    +    plan.canonicalized match {
    +      case ExistingRDD(_, otherRDD) => rdd.id == otherRDD.id
    +      case _ => false
    +    }
    +  }
    +
    +  override protected def stringArgs: Iterator[Any] = Iterator(output)
    +
    +  @transient override lazy val statistics: Statistics = Statistics(
    +    // TODO: Instead of returning a default value here, find a way to return a meaningful size
    +    // estimate for RDDs. See PR 1238 for more discussions.
    +    sizeInBytes = BigInt(session.sessionState.conf.defaultSizeInBytes)
    +  )
    +}
    +
    +/** Physical plan node for scanning data from an RDD. */
    +private[sql] case class ExistingRDDScanExec[T](
    --- End diff --

    how about ExternalRDDScan?
[GitHub] spark issue #13988: [SPARK-16101][SQL] Refactoring CSV data source to be con...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13988 (@rxin gentle ping..)
[GitHub] spark issue #14026: [SPARK-13569][STREAMING][KAFKA] pattern based topic subs...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14026 **[Test build #61892 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61892/consoleFull)** for PR 14026 at commit [`f287722`](https://github.com/apache/spark/commit/f2877226ebe8c70d44f35a10bab056402bc9ffa9).
[GitHub] spark pull request #8066: [SPARK-9778][SQL] remove unnecessary evaluation fr...
Github user cloud-fan closed the pull request at: https://github.com/apache/spark/pull/8066
[GitHub] spark issue #14072: [SPARK-16398][CORE] Make cancelJob and cancelStage APIs ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14072 **[Test build #3167 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3167/consoleFull)** for PR 14072 at commit [`e2d2cb1`](https://github.com/apache/spark/commit/e2d2cb12af4f30c7fc8f2ff822e49644cfba149d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #13876: [SPARK-16174][SQL] Improve `OptimizeIn` optimizer to rem...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13876 Merged build finished. Test PASSed.
[GitHub] spark issue #13876: [SPARK-16174][SQL] Improve `OptimizeIn` optimizer to rem...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13876 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61888/ Test PASSed.
[GitHub] spark issue #13876: [SPARK-16174][SQL] Improve `OptimizeIn` optimizer to rem...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13876 **[Test build #61888 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61888/consoleFull)** for PR 13876 at commit [`ccf972d`](https://github.com/apache/spark/commit/ccf972dc9c258651a5d9977a4e76d87805f4c1c6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #13701: [SPARK-15639][SQL] Try to push down filter at RowGroups ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/13701 I will update this soon.
[GitHub] spark pull request #13778: [SPARK-16062][SPARK-15989][SQL] Fix two bugs of P...
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13778#discussion_r69844600

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
    @@ -374,13 +407,15 @@ object MapObjects {
        * @param lambdaFunction A function that take the `loopVar` as input, and used as lambda function
        *                       to handle collection elements.
        * @param inputData An expression that when evaluated returns a collection object.
    +   * @param inputDataType The dataType of inputData.
    --- End diff --

    OK.