[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17865 Also cc @ueshin --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18268: [SPARK-21054] [SQL] Reset Command support reset specific...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18268 **[Test build #77918 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77918/testReport)** for PR 18268 at commit [`8745848`](https://github.com/apache/spark/commit/874584800aabd07dfd16c8c9632bb070cc7a8210).
[GitHub] spark issue #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATTED tabl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16422 **[Test build #77919 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77919/testReport)** for PR 16422 at commit [`0d3c7bf`](https://github.com/apache/spark/commit/0d3c7bf094a3e89f38b3abae0cf530e9c634594a).
[GitHub] spark pull request #18268: [SPARK-21054] [SQL] Reset Command support reset s...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18268#discussion_r121320104

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/SetCommand.scala ---
    @@ -149,13 +149,30 @@ object SetCommand {
     /**
      * This command is for resetting SQLConf to the default values. Command that runs
      * {{{
    + *   reset key;
      *   reset;
      * }}}
      */
    -case object ResetCommand extends RunnableCommand with Logging {
    +case class ResetCommand(key: Option[String]) extends RunnableCommand with Logging {
    +
    +  private val runFunc: (SparkSession => Unit) = key match {
    +
    +    case None =>
    +      val runFunc = (sparkSession: SparkSession) => {
    +        sparkSession.sessionState.conf.clear()
    +      }
    +      runFunc
    +
    +    // (In Hive, "RESET key" clear a specific property.)
    +    case Some(key) =>
    +      val runFunc = (sparkSession: SparkSession) => {
    +        sparkSession.conf.unset(key)
    +      }
    +      runFunc
    +  }

       override def run(sparkSession: SparkSession): Seq[Row] = {
    -    sparkSession.sessionState.conf.clear()
    +    runFunc(sparkSession)
    --- End diff --

    Nit: Just use a few lines to implement the logic here. No need to add the extra function.
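For readers following the review, here is a minimal sketch of what "a few lines" could look like: the match sits directly in `run`, with no intermediate function value. `DemoConf` and `DemoResetCommand` are hypothetical stand-ins written for this note, not Spark classes, and this is not necessarily the form the PR ended up with.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a session configuration; not Spark's SQLConf.
final class DemoConf {
  private val settings = mutable.Map.empty[String, String]
  def set(key: String, value: String): Unit = settings(key) = value
  def get(key: String): Option[String] = settings.get(key)
  def unset(key: String): Unit = { settings.remove(key); () }
  def clear(): Unit = settings.clear()
}

// Hypothetical stand-in for the command under review.
final case class DemoResetCommand(key: Option[String]) {
  def run(conf: DemoConf): Unit = key match {
    // "reset key;" removes a single setting.
    case Some(k) => conf.unset(k)
    // "reset;" removes every setting.
    case None => conf.clear()
  }
}
```

The comments on the two cases describe the semantics directly, without referring to another system's behavior.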
[GitHub] spark pull request #18268: [SPARK-21054] [SQL] Reset Command support reset s...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18268#discussion_r121319888

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/SetCommand.scala ---
    @@ -149,13 +149,30 @@ object SetCommand {
     /**
      * This command is for resetting SQLConf to the default values. Command that runs
      * {{{
    + *   reset key;
      *   reset;
      * }}}
      */
    -case object ResetCommand extends RunnableCommand with Logging {
    +case class ResetCommand(key: Option[String]) extends RunnableCommand with Logging {
    +
    +  private val runFunc: (SparkSession => Unit) = key match {
    +
    --- End diff --

    Nit: remove this space
[GitHub] spark pull request #18268: [SPARK-21054] [SQL] Reset Command support reset s...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18268#discussion_r121319856

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
    @@ -86,7 +86,12 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) {
        */
       override def visitResetConfiguration(
           ctx: ResetConfigurationContext): LogicalPlan = withOrigin(ctx) {
    -    ResetCommand
    +    val raw = remainder(ctx.RESET.getSymbol)
    +    if (raw.nonEmpty) {
    +      ResetCommand(Some(raw.trim))
    +    } else {
    +      ResetCommand(None)
    +    }
    --- End diff --

    You can use `map` to shorten it to a single line.
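One possible reading of the `map` suggestion, as a hedged sketch: wrap the raw text in an `Option`, then trim and filter it in a single expression. `DemoReset` and `buildReset` are hypothetical names for illustration, and `raw` stands in for the text that follows the RESET keyword.

```scala
// Hypothetical stand-in for the command built by the parser.
final case class DemoReset(key: Option[String])

// `raw` plays the role of the text that follows the RESET keyword.
// Option(raw) also turns a null into None.
def buildReset(raw: String): DemoReset =
  DemoReset(Option(raw).map(_.trim).filter(_.nonEmpty))
```

Note one behavioral difference from the if/else in the diff: this version trims before testing for emptiness, so whitespace-only input yields `None` rather than `Some("")`.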
[GitHub] spark issue #18268: [SPARK-21054] [SQL] Reset Command support reset specific...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18268 ok to test
[GitHub] spark pull request #18268: [SPARK-21054] [SQL] Reset Command support reset s...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18268#discussion_r121319194

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/SetCommand.scala ---
    @@ -149,13 +149,30 @@ object SetCommand {
     /**
      * This command is for resetting SQLConf to the default values. Command that runs
      * {{{
    + *   reset key;
      *   reset;
      * }}}
      */
    -case object ResetCommand extends RunnableCommand with Logging {
    +case class ResetCommand(key: Option[String]) extends RunnableCommand with Logging {
    +
    +  private val runFunc: (SparkSession => Unit) = key match {
    +
    +    case None =>
    +      val runFunc = (sparkSession: SparkSession) => {
    +        sparkSession.sessionState.conf.clear()
    +      }
    +      runFunc
    +
    +    // (In Hive, "RESET key" clear a specific property.)
    --- End diff --

    No need to mention Hive here. Just need to explain the semantics.
[GitHub] spark issue #18238: [SPARK-21016][core]Improve code fault tolerance for conv...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18238 **[Test build #77917 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77917/testReport)** for PR 18238 at commit [`04307c6`](https://github.com/apache/spark/commit/04307c611811d2cc207793488f004d74eb2c6b25).
[GitHub] spark issue #18238: [SPARK-21016][core]Improve code fault tolerance for conv...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18238 ok to test
[GitHub] spark issue #18238: [SPARK-21016][core]Improve code fault tolerance for conv...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18238 Although it is not a common user error, it does not hurt to add an extra `trim`.
[GitHub] spark issue #18260: [SPARK-21046][SQL] simplify the array offset and length ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18260 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77912/ Test FAILed.
[GitHub] spark issue #18260: [SPARK-21046][SQL] simplify the array offset and length ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18260 Merged build finished. Test FAILed.
[GitHub] spark issue #18260: [SPARK-21046][SQL] simplify the array offset and length ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18260 **[Test build #77912 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77912/testReport)** for PR 18260 at commit [`e6e60e0`](https://github.com/apache/spark/commit/e6e60e0905dbc8693d840bf2c5e901488a97).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #18258: [SPARK-21051][SQL] Add hash map metrics to aggreg...
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18258#discussion_r121317516

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
    @@ -313,7 +316,7 @@ case class HashAggregateExec(
           TaskContext.get().taskMemoryManager(),
           1024 * 16, // initial capacity
           TaskContext.get().taskMemoryManager().pageSizeBytes,
    -      false // disable tracking of performance metrics
    +      true // tracking of performance metrics
    --- End diff --

    Yeah, based on the benchmark, the performance degradation does not seem to be an issue. We can remove this parameter completely.
[GitHub] spark issue #18272: [DOCS] Fix error: ambiguous reference to overloaded defi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18272 Can one of the admins verify this patch?
[GitHub] spark pull request #18272: [DOCS] Fix error: ambiguous reference to overload...
GitHub user ZiyueHuang opened a pull request:

    https://github.com/apache/spark/pull/18272

    [DOCS] Fix error: ambiguous reference to overloaded definition

    ## What changes were proposed in this pull request?

    `df.groupBy.count()` should be `df.groupBy().count()`; otherwise the compiler reports an error: ambiguous reference to overloaded definition, both method groupBy in class Dataset of type (col1: String, cols: String*) and method groupBy in class Dataset of type (cols: org.apache.spark.sql.Column*)

    ## How was this patch tested?

    ```scala
    val df = spark.readStream.schema(...).json(...)
    val dfCounts = df.groupBy().count()
    ```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ZiyueHuang/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18272.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18272

commit 67dd3c75bce72bd320124b3b1a7585af40d18fb2
Author: Ziyue Huang
Date:   2017-06-12T06:10:09Z

    Fix error: ambiguous reference to overloaded definition, both method groupBy in class Dataset of type (col1: String, cols: String*) and (cols: org.apache.spark.sql.Column*)
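The ambiguity described in this pull request can be reproduced in miniature without Spark. The sketch below uses a hypothetical `DemoDataset` that copies only the two overload shapes quoted in the error message (with `Int` standing in for `Column`); the return strings are made up so the chosen overload is visible.

```scala
// Hypothetical stand-in that mirrors only the two overload shapes from the error message.
final class DemoDataset {
  // Needs at least one argument.
  def groupBy(col1: String, cols: String*): String = "by-name"
  // Accepts zero or more arguments.
  def groupBy(cols: Int*): String = "by-column"
}

val ds = new DemoDataset

// ds.groupBy   // without an argument list the compiler has nothing to choose an overload by

// With an explicit empty argument list, only the varargs-only overload is applicable.
val grouped = ds.groupBy()
```

The explicit `()` is what disambiguates: the first overload requires at least one argument, so it cannot match an empty argument list.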
[GitHub] spark pull request #18258: [SPARK-21051][SQL] Add hash map metrics to aggreg...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18258#discussion_r121317053

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
    @@ -313,7 +316,7 @@ case class HashAggregateExec(
           TaskContext.get().taskMemoryManager(),
           1024 * 16, // initial capacity
           TaskContext.get().taskMemoryManager().pageSizeBytes,
    -      false // disable tracking of performance metrics
    +      true // tracking of performance metrics
    --- End diff --

    Always turn it on? If we decide to always turn it on, why do we still keep this parameter?
[GitHub] spark issue #18265: [SPARK-21050][ML] Word2vec persistence overflow bug fix
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18265 **[Test build #77916 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77916/testReport)** for PR 18265 at commit [`6bcf66f`](https://github.com/apache/spark/commit/6bcf66f58c6333c1d0d965596ad59b49d8e9f28a).
[GitHub] spark issue #18265: [SPARK-21050][ML] Word2vec persistence overflow bug fix
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/18265 Yep, someone hit the bug!
[GitHub] spark pull request #18269: [SPARK-21056][SQL] Use at most one spark job to l...
Github user bbossy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18269#discussion_r121316195

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InMemoryFileIndex.scala ---
    @@ -248,60 +245,93 @@ object InMemoryFileIndex extends Logging {
        * @return all children of path that match the specified filter.
        */
       private def listLeafFiles(
    -      path: Path,
    +      paths: Seq[Path],
           hadoopConf: Configuration,
           filter: PathFilter,
    -      sessionOpt: Option[SparkSession]): Seq[FileStatus] = {
    -    logTrace(s"Listing $path")
    -    val fs = path.getFileSystem(hadoopConf)
    +      sessionOpt: Option[SparkSession]): Seq[(Path, Seq[FileStatus])] = {
    +    logTrace(s"Listing ${paths.mkString(", ")}")

         // [SPARK-17599] Prevent InMemoryFileIndex from failing if path doesn't exist
         // Note that statuses only include FileStatus for the files and dirs directly under path,
         // and does not include anything else recursively.
    -    val statuses = try fs.listStatus(path) catch {
    -      case _: FileNotFoundException =>
    -        logWarning(s"The directory $path was not found. Was it deleted very recently?")
    -        Array.empty[FileStatus]
    +    val statuses = paths.flatMap { path =>
    +      try {
    +        val fs = path.getFileSystem(hadoopConf)
    --- End diff --

    Thanks! Fixed.
[GitHub] spark pull request #18258: [SPARK-21051][SQL] Add hash map metrics to aggreg...
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18258#discussion_r121316076

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala ---
    @@ -103,13 +103,43 @@ object SQLMetrics {
       }

       /**
    +   * Create a metric to report the average information (including min, med, max) like
    +   * avg hashmap probe. Because `SQLMetric` stores long values, we take the ceil of the average
    +   * values before storing them. This metric is used to record an average value computed in the
    +   * end of a task. It should be set once. The initial values (zeros) of this metrics will be
    +   * excluded after.
    +   */
    +  def createAverageMetric(sc: SparkContext, name: String): SQLMetric = {
    +    // The final result of this metric in physical operator UI may looks like:
    +    // probe avg (min, med, max):
    +    // (1, 6, 2)
    --- End diff --

    oh. right. :)
[GitHub] spark pull request #18258: [SPARK-21051][SQL] Add hash map metrics to aggreg...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18258#discussion_r121315825

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala ---
    @@ -103,13 +103,43 @@ object SQLMetrics {
       }

       /**
    +   * Create a metric to report the average information (including min, med, max) like
    +   * avg hashmap probe. Because `SQLMetric` stores long values, we take the ceil of the average
    +   * values before storing them. This metric is used to record an average value computed in the
    +   * end of a task. It should be set once. The initial values (zeros) of this metrics will be
    +   * excluded after.
    +   */
    +  def createAverageMetric(sc: SparkContext, name: String): SQLMetric = {
    +    // The final result of this metric in physical operator UI may looks like:
    +    // probe avg (min, med, max):
    +    // (1, 6, 2)
    --- End diff --

    Is med the median? Why 6?
[GitHub] spark pull request #18265: [SPARK-21050][ML] Word2vec persistence overflow b...
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18265#discussion_r121315755

    --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala ---
    @@ -188,6 +188,15 @@ class Word2VecSuite extends SparkFunSuite with MLlibTestSparkContext with Defaul
         assert(math.abs(similarity(5) - similarityLarger(5) / similarity(5)) > 1E-5)
       }

    +  test("Word2Vec read/write numPartitions calculation") {
    --- End diff --

    Good point; I'll do that.
[GitHub] spark pull request #18265: [SPARK-21050][ML] Word2vec persistence overflow b...
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18265#discussion_r121315677

    --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala ---
    @@ -188,6 +188,15 @@ class Word2VecSuite extends SparkFunSuite with MLlibTestSparkContext with Defaul
         assert(math.abs(similarity(5) - similarityLarger(5) / similarity(5)) > 1E-5)
       }

    +  test("Word2Vec read/write numPartitions calculation") {
    +    val tinyModelNumPartitions = Word2VecModel.Word2VecModelWriter.calculateNumberOfPartitions(
    +      sc, numWords = 10, vectorSize = 5)
    +    assert(tinyModelNumPartitions === 1)
    +    val mediumModelNumPartitions = Word2VecModel.Word2VecModelWriter.calculateNumberOfPartitions(
    +      sc, numWords = 100, vectorSize = 5000)
    +    assert(mediumModelNumPartitions > 1)
    --- End diff --

    The "medium" one did cause an overflow.
[GitHub] spark pull request #18265: [SPARK-21050][ML] Word2vec persistence overflow b...
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18265#discussion_r121315648

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala ---
    @@ -355,9 +364,12 @@ object Word2VecModel extends MLReadable[Word2VecModel] {
           // Calculate the approximate size of the model.
           // Assuming an average word size of 15 bytes, the formula is:
           // (floatSize * vectorSize + 15) * numWords
    -      val numWords = instance.wordVectors.wordIndex.size
    -      val approximateSizeInBytes = (floatSize * instance.getVectorSize + averageWordSize) * numWords
    -      ((approximateSizeInBytes / bufferSizeInBytes) + 1).toInt
    +      val approximateSizeInBytes = (floatSize * vectorSize + averageWordSize) * numWords
    +      val numPartitions = (approximateSizeInBytes / bufferSizeInBytes) + 1
    +      require(numPartitions < 10e8, s"Word2VecModel calculated that it needs $numPartitions " +
    --- End diff --

    I'm pretty sure it is necessary. If we cap it at Int.MAX and the user hits that cap, then it means that we'll fail when trying to write the partitions.
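To make the overflow concern concrete, here is a standalone sketch of the size formula quoted in the diff comment, `(floatSize * vectorSize + 15) * numWords`, evaluated once in `Int` arithmetic and once in `Long` arithmetic. The function names, the 64 MiB buffer size, and the input values are assumptions chosen for illustration; this shows one way such an overflow can arise and is not a copy of the patch.

```scala
// Illustrative constants; the buffer size is an assumed 64 MiB.
val floatSize = 4
val averageWordSize = 15
val bufferSizeInBytes = 64L * 1024 * 1024

// Int arithmetic: the product silently wraps once it exceeds Int.MaxValue.
def approxSizeInt(vectorSize: Int, numWords: Int): Int =
  (floatSize * vectorSize + averageWordSize) * numWords

// Long arithmetic: widen before multiplying so the product is exact.
def approxSizeLong(vectorSize: Int, numWords: Int): Long =
  (floatSize.toLong * vectorSize + averageWordSize) * numWords

// Fail loudly instead of capping when the partition count cannot be represented.
def numPartitions(vectorSize: Int, numWords: Int): Int = {
  val n = approxSizeLong(vectorSize, numWords) / bufferSizeInBytes + 1
  require(n <= Int.MaxValue, s"needs $n partitions, which does not fit in an Int")
  n.toInt
}
```

With `vectorSize = 5000` and `numWords = 1000000` the exact size is 20,015,000,000 bytes, which is beyond `Int.MaxValue`, so the `Int` version returns a wrapped, negative value while the `Long` version stays exact. The `require` reflects the point made in the comment above: silently capping the count would only postpone the failure to write time.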
[GitHub] spark issue #18257: [SPARK-21041][SQL] SparkSession.range should be consiste...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18257 **[Test build #77915 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77915/testReport)** for PR 18257 at commit [`46f60f0`](https://github.com/apache/spark/commit/46f60f0fd9981b52de5b4c719ce51d0de9a97805).
[GitHub] spark issue #18236: [SPARK-21015] Check field name is not null and empty in ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18236 How about just updating the comment of this function to explain the behavior we have now?
[GitHub] spark issue #17758: [SPARK-20460][SQL] Make it more consistent to handle col...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/17758 ok, I'll also check again. Thanks!
[GitHub] spark issue #18266: [SPARK-20427][SQL] Read JDBC table use custom schema
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18266 **[Test build #77908 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77908/testReport)** for PR 18266 at commit [`0444c4d`](https://github.com/apache/spark/commit/0444c4d25d5943408a3ea11f84395dd38246e2f8).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #18266: [SPARK-20427][SQL] Read JDBC table use custom schema
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18266 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77908/ Test FAILed.
[GitHub] spark issue #17645: [SPARK-20348] [ML] Support squared hinge loss (L2 loss) ...
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17645 OK. I'll close it for now and try to merge it with https://github.com/apache/spark/pull/17862. Thanks for the comment from @yanboliang
[GitHub] spark issue #18266: [SPARK-20427][SQL] Read JDBC table use custom schema
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18266 Merged build finished. Test FAILed.
[GitHub] spark pull request #17645: [SPARK-20348] [ML] Support squared hinge loss (L2...
Github user hhbyyh closed the pull request at: https://github.com/apache/spark/pull/17645
[GitHub] spark pull request #18257: [SPARK-21041][SQL] SparkSession.range should be c...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/18257#discussion_r121313742 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameRangeSuite.scala --- @@ -191,6 +191,17 @@ class DataFrameRangeSuite extends QueryTest with SharedSQLContext with Eventuall checkAnswer(sql("SELECT * FROM range(3)"), Row(0) :: Row(1) :: Row(2) :: Nil) } } + + test("SPARK-21041 SparkSession.range()'s behavior is inconsistent with SparkContext.range()") { +val start = java.lang.Long.MAX_VALUE - 3 +val end = java.lang.Long.MIN_VALUE + 2 +Seq("false", "true").foreach { value => + withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> value) { +assert(spark.sparkContext.range(start, end, 1).collect.length == 0) +assert(spark.range(start, end, 1).collect.length == 0) --- End diff -- Sure.
[GitHub] spark pull request #18266: [SPARK-20427][SQL] Read JDBC table use custom sch...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18266#discussion_r121313654 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Metadata.scala --- @@ -273,6 +273,9 @@ class MetadataBuilder { /** Puts a [[Metadata]] array. */ def putMetadataArray(key: String, value: Array[Metadata]): this.type = put(key, value) + /** Puts a name. */ + def putName(name: String): this.type = put("name", name) --- End diff -- This interface change is not desired. See the PR https://github.com/apache/spark/pull/16209 You can further enhance our parser by supporting the data types that are not natively supported by Spark.
[GitHub] spark issue #17251: [SPARK-19910][SQL] `stack` should not reject NULL values...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17251 **[Test build #77914 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77914/testReport)** for PR 17251 at commit [`08596f5`](https://github.com/apache/spark/commit/08596f54e62b26c4411207912121d6a14bdb0133).
[GitHub] spark issue #17251: [SPARK-19910][SQL] `stack` should not reject NULL values...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17251 retest this please
[GitHub] spark issue #17758: [SPARK-20460][SQL] Make it more consistent to handle col...
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/17758 Hi @maropu , I just did some simple search, and found many other places also related to duplicate columns. e.g. `InsertIntoHadoopFsRelationCommand`, `PartitioningUtils.normalizePartitionSpec`, `SessionCatalog.alterTableSchema`. Can you do a more comprehensive search to find if there are other places missed? Let's make all of them consistent as best as we can.
[GitHub] spark pull request #17583: [SPARK-20271]Add FuncTransformer to simplify cust...
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17583#discussion_r121312651 --- Diff: mllib/src/main/scala/org/apache/spark/ml/FuncTransformer.scala --- @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ml + +import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream} + +import scala.reflect.runtime.universe.{typeOf, TypeTag} + +import org.apache.hadoop.fs.Path + +import org.apache.spark.annotation.{DeveloperApi, Since} +import org.apache.spark.ml.FuncTransformer.FuncTransformerWriter +import org.apache.spark.ml.util._ +import org.apache.spark.sql.Row +import org.apache.spark.sql.catalyst.parser.CatalystSqlParser +import org.apache.spark.sql.types.DataType + +/** + * :: DeveloperApi :: + * A wrapper to allow easily creation of simple data manipulation for DataFrame. + * Note that FuncTransformer supports serialization via scala ObjectOutputStream and may not + * guarantee save/load compatibility between different scala version. 
+ */ +@DeveloperApi +@Since("2.3.0") +class FuncTransformer [IN, OUT: TypeTag] @Since("2.3.0") ( +@Since("2.3.0") override val uid: String, +@Since("2.3.0") val func: IN => OUT, +@Since("2.3.0") val outputDataType: DataType + ) extends UnaryTransformer[IN, OUT, FuncTransformer[IN, OUT]] with DefaultParamsWritable { + + @Since("2.3.0") + def this(fx: IN => OUT, outputDataType: DataType) = +this(Identifiable.randomUID("FuncTransformer"), fx, outputDataType) + + @Since("2.3.0") + def this(fx: IN => OUT) = +this(Identifiable.randomUID("FuncTransformer"), fx, + CatalystSqlParser.parseDataType(typeOf[OUT].typeSymbol.name.decodedName.toString)) --- End diff -- Thanks Nick, updated with the exception message and we use the same type infer code as in createDataFrame.
[GitHub] spark pull request #18082: [SPARK-20665][SQL][FOLLOW-UP]Move test case to Ma...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18082
[GitHub] spark pull request #17583: [SPARK-20271]Add FuncTransformer to simplify cust...
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17583#discussion_r121312459 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/FuncTransformer.scala --- @@ -0,0 +1,150 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ml.feature + +import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream} + +import scala.reflect.runtime.universe.TypeTag + +import org.apache.hadoop.fs.Path + +import org.apache.spark.annotation.{DeveloperApi, Since} +import org.apache.spark.ml.UnaryTransformer +import org.apache.spark.ml.feature.FuncTransformer.FuncTransformerWriter +import org.apache.spark.ml.param.ParamMap +import org.apache.spark.ml.util._ +import org.apache.spark.sql.Row +import org.apache.spark.sql.catalyst.ScalaReflection +import org.apache.spark.sql.types.DataType + +/** + * :: DeveloperApi :: + * FuncTransformer allows easily creation of a custom feature transformer for DataFrame, such like + * conditional conversion(if...else...), type conversion, array indexing and many string ops. 
+ * Note that FuncTransformer supports serialization via scala ObjectOutputStream and may not + * guarantee save/load compatibility between different scala version. + */ +@DeveloperApi +@Since("2.3.0") +class FuncTransformer [IN: TypeTag, OUT: TypeTag] @Since("2.3.0") ( +@Since("2.3.0") override val uid: String, +@Since("2.3.0") val func: IN => OUT, +@Since("2.3.0") val outputDataType: DataType + ) extends UnaryTransformer[IN, OUT, FuncTransformer[IN, OUT]] with DefaultParamsWritable { + + /** + * Create a FuncTransformer with specific function and output data type. + * @param fx function which converts an input object to output object. + * @param outputDataType specific output data type + */ + @Since("2.3.0") + def this(fx: IN => OUT, outputDataType: DataType) = +this(Identifiable.randomUID("FuncTransformer"), fx, outputDataType) + + /** + * Create a FuncTransformer with specific function and automatically infer the output data type. + * If the output data type cannot be automatically inferred, an exception will be thrown. + * @param fx function which converts an input object to output object. 
+ */ + @Since("2.3.0") + def this(fx: IN => OUT) = this(Identifiable.randomUID("FuncTransformer"), fx, +try { + ScalaReflection.schemaFor[OUT].dataType +} catch { + case _: UnsupportedOperationException => throw new UnsupportedOperationException( +s"FuncTransformer outputDataType cannot be automatically inferred, please try" + + s" the constructor with specific outputDataType") +} + ) + + setDefault(inputCol -> "input", outputCol -> "output") + + @Since("2.3.0") + override def createTransformFunc: IN => OUT = func + + @Since("2.3.0") + override def write: MLWriter = new FuncTransformerWriter( +this.asInstanceOf[FuncTransformer[Nothing, Nothing]]) + + @Since("2.3.0") + override def copy(extra: ParamMap): FuncTransformer[IN, OUT] = { +copyValues(new FuncTransformer(uid, func, outputDataType), extra) + } + + override protected def validateInputType(inputType: DataType): Unit = { +try { + val funcINType = ScalaReflection.schemaFor[IN].dataType + require(inputType.equals(funcINType), +s"$uid only accept input type $funcINType but got $inputType.") +} catch { + case _: UnsupportedOperationException => +// cannot infer the output data type, log warning but do not block transform +logWarning(s"FuncTransformer input Type cannot be automatically inferred," + + s"Type check omitted for $uid") +} + } +} + +/** + * :: DeveloperApi :: + * Companion object for FuncTransformer with save a
[GitHub] spark pull request #18257: [SPARK-21041][SQL] SparkSession.range should be c...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18257#discussion_r121312457 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameRangeSuite.scala --- @@ -191,6 +191,17 @@ class DataFrameRangeSuite extends QueryTest with SharedSQLContext with Eventuall checkAnswer(sql("SELECT * FROM range(3)"), Row(0) :: Row(1) :: Row(2) :: Nil) } } + + test("SPARK-21041 SparkSession.range()'s behavior is inconsistent with SparkContext.range()") { +val start = java.lang.Long.MAX_VALUE - 3 +val end = java.lang.Long.MIN_VALUE + 2 +Seq("false", "true").foreach { value => + withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> value) { +assert(spark.sparkContext.range(start, end, 1).collect.length == 0) +assert(spark.range(start, end, 1).collect.length == 0) --- End diff -- Shall we also test the case `start == end`?
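The bounds in the test above are chosen so that `end - start` does not fit in a 64-bit long. The following is a minimal sketch in plain Java, not Spark's implementation, of why a naive element count goes wrong for those bounds and how an overflow-safe count handles both that case and `start == end`. The class and method names (`RangeCountSketch`, `naiveCount`, `safeCount`) are hypothetical and exist only for illustration.

```java
import java.math.BigInteger;

// Hypothetical illustration only; none of these names come from Spark.
final class RangeCountSketch {

    // Naive count: (end - start) / step. The subtraction wraps around
    // silently when the true difference does not fit in a long.
    static long naiveCount(long start, long end, long step) {
        return (end - start) / step;
    }

    // Overflow-safe count of the elements start, start + step, ... that lie
    // before end (exclusive), computed with arbitrary-precision arithmetic.
    static BigInteger safeCount(long start, long end, long step) {
        if (step == 0) {
            throw new IllegalArgumentException("step must be non-zero");
        }
        BigInteger diff = BigInteger.valueOf(end).subtract(BigInteger.valueOf(start));
        BigInteger s = BigInteger.valueOf(step);
        // Empty when start == end, or when step points away from end.
        if (diff.signum() == 0 || diff.signum() != s.signum()) {
            return BigInteger.ZERO;
        }
        // diff and step share a sign here, so the quotient is non-negative;
        // round up when the division leaves a remainder.
        BigInteger[] qr = diff.divideAndRemainder(s);
        return qr[1].signum() == 0 ? qr[0] : qr[0].add(BigInteger.ONE);
    }
}
```

With `start = Long.MAX_VALUE - 3` and `end = Long.MIN_VALUE + 2`, the naive subtraction wraps to a small positive number, so the naive count is non-zero even though the range is empty. The safe variant returns zero for that input and for any input where `start == end`.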
[GitHub] spark issue #18230: [SPARK-19688] [STREAMING] Not to read `spark.yarn.creden...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18230 Merged build finished. Test PASSed.
[GitHub] spark issue #18230: [SPARK-19688] [STREAMING] Not to read `spark.yarn.creden...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18230 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77910/ Test PASSed.
[GitHub] spark issue #18082: [SPARK-20665][SQL][FOLLOW-UP]Move test case to MathExpre...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18082 Thanks! Merging to master.
[GitHub] spark issue #18230: [SPARK-19688] [STREAMING] Not to read `spark.yarn.creden...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18230 **[Test build #77910 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77910/testReport)** for PR 18230 at commit [`2c50858`](https://github.com/apache/spark/commit/2c50858be9fd422d32d5648b651a4b26ba3f8728). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18260: [SPARK-21046][SQL] simplify the array offset and length ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18260 LGTM except for two comments.
[GitHub] spark issue #18270: [SPARK-21055][SQL] replace grouping__id with grouping_id...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18270 **[Test build #77913 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77913/testReport)** for PR 18270 at commit [`f532d9f`](https://github.com/apache/spark/commit/f532d9ff2b6bd8722ce215ed1c372bd991193a0f).
[GitHub] spark pull request #18260: [SPARK-21046][SQL] simplify the array offset and ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18260#discussion_r121311872 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java --- @@ -518,19 +519,13 @@ private void throwUnsupportedException(int requiredCapacity, Throwable cause) { public abstract double getDouble(int rowId); /** - * Puts a byte array that already exists in this column. - */ - public abstract void putArray(int rowId, int offset, int length); - - /** - * Returns the length of the array at rowid. + * After writing array elements to the child column vector, call this method to set the offset and + * size of the written array. */ - public abstract int getArrayLength(int rowId); - - /** - * Returns the offset of the array at rowid. - */ - public abstract int getArrayOffset(int rowId); + public void putArrayOffsetAndSize(int rowId, int offset, int size) { +long offsetAndSize = (offset << 32) | size; --- End diff -- `offset` should be converted to `long` before shifting?
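The question above can be checked in isolation. In Java, shifting an `int` uses only the low five bits of the shift distance, so `offset << 32` on an `int` leaves `offset` unchanged, and the subsequent `| size` mixes the two values in the low 32 bits. The following is a minimal standalone sketch of the difference, assuming the intended layout is offset in the high 32 bits and size in the low 32 bits. The class and method names are hypothetical and are not taken from `ColumnVector`.

```java
// Hypothetical illustration only; not the ColumnVector code under review.
final class OffsetAndSizeSketch {

    // Mirrors the expression in the diff: the shift happens on an int,
    // so a distance of 32 is treated as 0 and offset is not moved.
    static long packNarrow(int offset, int size) {
        return (offset << 32) | size;
    }

    // Widens offset to long before shifting, and masks size so that a
    // negative int would not sign-extend into the high half.
    static long packWide(int offset, int size) {
        return ((long) offset << 32) | (size & 0xFFFFFFFFL);
    }

    // Recovers the high 32 bits.
    static int offsetOf(long packed) {
        return (int) (packed >>> 32);
    }

    // Recovers the low 32 bits.
    static int sizeOf(long packed) {
        return (int) packed;
    }
}
```

For `offset = 5` and `size = 3`, `packNarrow` yields `5 | 3`, from which neither value can be recovered, while `packWide` round-trips through `offsetOf` and `sizeOf`.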
[GitHub] spark issue #18257: [SPARK-21041][SQL] SparkSession.range should be consiste...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18257 Merged build finished. Test PASSed.
[GitHub] spark issue #18257: [SPARK-21041][SQL] SparkSession.range should be consiste...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18257 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77906/ Test PASSed.
[GitHub] spark issue #18257: [SPARK-21041][SQL] SparkSession.range should be consiste...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18257 **[Test build #77906 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77906/testReport)** for PR 18257 at commit [`89dd7ad`](https://github.com/apache/spark/commit/89dd7ada850c1fb02fba32bc955f2de3a7ae3679). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18260: [SPARK-21046][SQL] simplify the array offset and ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18260#discussion_r121311563 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java --- @@ -43,14 +43,12 @@ private byte[] byteData; private short[] shortData; private int[] intData; + // This is not only used to store data for int column vector, but also can store offsets and --- End diff -- int column vector -> long column vector.
[GitHub] spark issue #18271: [MINOR][DOCS] Improve Running R Tests docs
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18271 Merged build finished. Test PASSed.
[GitHub] spark issue #18271: [MINOR][DOCS] Improve Running R Tests docs
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18271 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77911/ Test PASSed.
[GitHub] spark issue #18271: [MINOR][DOCS] Improve Running R Tests docs
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18271 **[Test build #77911 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77911/testReport)** for PR 18271 at commit [`d715ae8`](https://github.com/apache/spark/commit/d715ae89fba24bb56a2d2ca7fd0e0c1d438851af). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18260: [SPARK-21046][SQL] simplify the array offset and length ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18260 @viirya @kiszk good catch! fixed by using long to store offset and size.
[GitHub] spark issue #18260: [SPARK-21046][SQL] simplify the array offset and length ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18260 **[Test build #77912 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77912/testReport)** for PR 18260 at commit [`e6e60e0`](https://github.com/apache/spark/commit/e6e60e0905dbc8693d840bf2c5e901488a97).
[GitHub] spark issue #18271: [MINOR][DOCS] Improve Running R Tests docs
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18271 **[Test build #77911 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77911/testReport)** for PR 18271 at commit [`d715ae8`](https://github.com/apache/spark/commit/d715ae89fba24bb56a2d2ca7fd0e0c1d438851af).
[GitHub] spark issue #18128: [SPARK-20906][SparkR]:Constrained Logistic Regression fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18128 Merged build finished. Test PASSed.
[GitHub] spark issue #18128: [SPARK-20906][SparkR]:Constrained Logistic Regression fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18128 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77909/ Test PASSed.
[GitHub] spark issue #18128: [SPARK-20906][SparkR]:Constrained Logistic Regression fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18128 **[Test build #77909 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77909/testReport)** for PR 18128 at commit [`c3190b5`](https://github.com/apache/spark/commit/c3190b5b4701afeceec17bbaa7c4ef6f0239b2c8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18260: [SPARK-21046][SQL] simplify the array offset and length ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18260 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77905/ Test PASSed.
[GitHub] spark issue #18260: [SPARK-21046][SQL] simplify the array offset and length ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18260 Merged build finished. Test PASSed.
[GitHub] spark issue #18260: [SPARK-21046][SQL] simplify the array offset and length ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18260 **[Test build #77905 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77905/testReport)** for PR 18260 at commit [`1dae660`](https://github.com/apache/spark/commit/1dae6604c0613a0b9e2a2a0dbc53a709cc232d09). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18260: [SPARK-21046][SQL] simplify the array offset and length ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18260 Merged build finished. Test FAILed.
[GitHub] spark issue #18260: [SPARK-21046][SQL] simplify the array offset and length ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18260 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77900/ Test FAILed.
[GitHub] spark issue #18260: [SPARK-21046][SQL] simplify the array offset and length ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18260 **[Test build #77900 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77900/testReport)** for PR 18260 at commit [`a61ba71`](https://github.com/apache/spark/commit/a61ba71ec6bf8245fcce423958d83cbc86b27adc). * This patch **fails from timeout after a configured wait of \`250m\`**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18082: [SPARK-20665][SQL][FOLLOW-UP]Move test case to MathExpre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18082 Merged build finished. Test PASSed.
[GitHub] spark issue #18082: [SPARK-20665][SQL][FOLLOW-UP]Move test case to MathExpre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18082 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77904/ Test PASSed.
[GitHub] spark issue #18082: [SPARK-20665][SQL][FOLLOW-UP]Move test case to MathExpre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18082 **[Test build #77904 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77904/testReport)** for PR 18082 at commit [`4e20839`](https://github.com/apache/spark/commit/4e20839de8b67645252338006dc411bbc0c31173). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18260: [SPARK-21046][SQL] simplify the array offset and ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18260#discussion_r121307879 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java --- @@ -42,15 +42,13 @@ // Array for each type. Only 1 is populated for any type. private byte[] byteData; private short[] shortData; + // This is not only used to store data for int column vector, but also can store offsets and + // lengths for array column vector. private int[] intData; --- End diff -- @kiszk Do you mean we store each offset/length pair together as a single element in `longData`?
[GitHub] spark issue #18230: [SPARK-19688] [STREAMING] Not to read `spark.yarn.creden...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18230 **[Test build #77910 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77910/testReport)** for PR 18230 at commit [`2c50858`](https://github.com/apache/spark/commit/2c50858be9fd422d32d5648b651a4b26ba3f8728).
[GitHub] spark pull request #18260: [SPARK-21046][SQL] simplify the array offset and ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18260#discussion_r121307570 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java --- @@ -366,55 +364,22 @@ public double getDouble(int rowId) { } } - // - // APIs dealing with Arrays - // - - @Override - public int getArrayLength(int rowId) { -return arrayLengths[rowId]; - } - @Override - public int getArrayOffset(int rowId) { -return arrayOffsets[rowId]; - } - - @Override - public void putArray(int rowId, int offset, int length) { -arrayOffsets[rowId] = offset; -arrayLengths[rowId] = length; - } - @Override public void loadBytes(ColumnVector.Array array) { array.byteArray = byteData; array.byteArrayOffset = array.offset; } - // - // APIs dealing with Byte Arrays - // - - @Override - public int putByteArray(int rowId, byte[] value, int offset, int length) { -int result = arrayData().appendBytes(length, value, offset); -arrayOffsets[rowId] = result; -arrayLengths[rowId] = length; -return result; - } - // Spilt this function out since it is the slow path. @Override protected void reserveInternal(int newCapacity) { if (this.resultArray != null || DecimalType.isByteArrayDecimalType(type)) { - int[] newLengths = new int[newCapacity]; - int[] newOffsets = new int[newCapacity]; - if (this.arrayLengths != null) { -System.arraycopy(this.arrayLengths, 0, newLengths, 0, capacity); -System.arraycopy(this.arrayOffsets, 0, newOffsets, 0, capacity); + // need 2 ints as offset and length for each array. + if (intData == null || intData.length < newCapacity * 2) { +int[] newData = new int[newCapacity * 2]; --- End diff -- `newCapacity` here can be at most `MAX_CAPACITY`. When `newCapacity` is more than `MAX_CAPACITY / 2`, it seems this allocation would cause a problem?
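The concern about `newCapacity * 2` can be illustrated with a small, self-contained Java sketch. It is not code from the pull request: the class name, the helper methods, and the value chosen for `MAX_CAPACITY` (taken here to be `Integer.MAX_VALUE`) are assumptions made only for illustration. It shows that doubling an `int` above half of the maximum wraps to a negative number, which an array allocation would then reject.

```java
// Hypothetical sketch; names and the MAX_CAPACITY value are assumptions, not Spark code.
final class CapacityOverflowSketch {
    // Assumed limit for illustration only.
    static final int MAX_CAPACITY = Integer.MAX_VALUE;

    // Mirrors the `newCapacity * 2` expression in the diff: wraps silently on overflow.
    static int naiveIntsNeeded(int newCapacity) {
        return newCapacity * 2;
    }

    // Checked alternative: throws ArithmeticException instead of wrapping.
    static int checkedIntsNeeded(int newCapacity) {
        return Math.multiplyExact(newCapacity, 2);
    }

    public static void main(String[] args) {
        int big = MAX_CAPACITY / 2 + 1;
        // The wrapped value is negative, so `new int[...]` with it would throw
        // NegativeArraySizeException.
        System.out.println("naive size: " + naiveIntsNeeded(big));
        try {
            checkedIntsNeeded(big);
        } catch (ArithmeticException e) {
            System.out.println("checked size: overflow detected");
        }
    }
}
```

A checked multiplication is only one possible guard; capping the number of arrays at half the limit before reserving would be another.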
[GitHub] spark issue #18230: [SPARK-19688] [STREAMING] Not to read `spark.yarn.creden...
Github user saturday-shi commented on the issue: https://github.com/apache/spark/pull/18230 No, I don't mean to insist on my opinion. I'm just curious to know the reason for the change (as it looks like another point fix).
[GitHub] spark issue #18260: [SPARK-21046][SQL] simplify the array offset and length ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18260 Merged build finished. Test PASSed.
[GitHub] spark issue #18260: [SPARK-21046][SQL] simplify the array offset and length ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18260 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77903/ Test PASSed.
[GitHub] spark pull request #18260: [SPARK-21046][SQL] simplify the array offset and ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18260#discussion_r121306993 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java --- @@ -42,15 +42,13 @@ // Array for each type. Only 1 is populated for any type. private byte[] byteData; private short[] shortData; + // This is not only used to store data for int column vector, but also can store offsets and + // lengths for array column vector. private int[] intData; --- End diff -- Oh. I see. We only check the limit of `MAX_CAPACITY` before actually going into `reserveInternal`.
[GitHub] spark pull request #18260: [SPARK-21046][SQL] simplify the array offset and ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/18260#discussion_r121306979 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java --- @@ -42,15 +42,13 @@ // Array for each type. Only 1 is populated for any type. private byte[] byteData; private short[] shortData; + // This is not only used to store data for int column vector, but also can store offsets and + // lengths for array column vector. private int[] intData; --- End diff -- Good catch. Is it possible to use `longData`, with each element holding a pair of 32-bit offset and length, so that we keep an array length of `MAX_CAPACITY`?
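The idea of keeping a 32-bit offset and a 32-bit length in one `long` element can be sketched as follows. This is a minimal illustration, not the approach adopted in the pull request: the class and method names are hypothetical, and placing the offset in the high half is an arbitrary choice made for the example.

```java
// Hypothetical sketch of packing an (offset, length) pair into one long.
final class OffsetLengthPacking {
    // Offset goes into the high 32 bits, length into the low 32 bits (assumed layout).
    static long pack(int offset, int length) {
        // Mask the length so its sign bit does not smear into the high half.
        return ((long) offset << 32) | (length & 0xFFFFFFFFL);
    }

    // Recover the offset from the high 32 bits.
    static int offset(long packed) {
        return (int) (packed >>> 32);
    }

    // Recover the length from the low 32 bits.
    static int length(long packed) {
        return (int) packed;
    }

    public static void main(String[] args) {
        long[] longData = new long[4];   // one element per array, so capacity is not halved
        longData[2] = pack(128, 16);
        System.out.println(offset(longData[2]) + " " + length(longData[2]));
    }
}
```

With this layout one `long[]` of `n` elements describes `n` arrays, at the cost of a shift and a mask on each access.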
[GitHub] spark issue #18260: [SPARK-21046][SQL] simplify the array offset and length ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18260 **[Test build #77903 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77903/testReport)** for PR 18260 at commit [`368c346`](https://github.com/apache/spark/commit/368c3462667bdb6822be01ecc95dfcdb04ce747a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18260: [SPARK-21046][SQL] simplify the array offset and ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18260#discussion_r121306584 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java --- @@ -42,15 +42,13 @@ // Array for each type. Only 1 is populated for any type. private byte[] byteData; private short[] shortData; + // This is not only used to store data for int column vector, but also can store offsets and + // lengths for array column vector. private int[] intData; --- End diff -- One question I have: the capacity of `ColumnVector` is bounded by `MAX_CAPACITY`. Previously we stored offset and length separately, so we could have at most `MAX_CAPACITY` arrays. Now we store offset and length together in data/intData, which is bounded by `MAX_CAPACITY`. Doesn't that mean we can have at most `MAX_CAPACITY / 2` arrays?
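To make the `MAX_CAPACITY / 2` point concrete, here is a small Java sketch of offsets and lengths sharing one `int[]`. The interleaved layout (offset at `2 * rowId`, length at `2 * rowId + 1`), the class name, and the `reserve`/`maxArrays` helpers are assumptions for illustration; the thread only states that two ints are needed per array, not how they are laid out.

```java
import java.util.Arrays;

// Hypothetical sketch; the interleaved layout is an assumption, not confirmed by the thread.
final class InterleavedArrayIndex {
    private int[] data = new int[0];

    // Two ints per array, so the backing store must be twice the number of arrays.
    void reserve(int numArrays) {
        int needed = Math.multiplyExact(numArrays, 2); // fails loudly instead of wrapping
        if (data.length < needed) {
            data = Arrays.copyOf(data, needed);
        }
    }

    // Store the pair for one row in adjacent slots.
    void putArray(int rowId, int offset, int length) {
        data[2 * rowId] = offset;
        data[2 * rowId + 1] = length;
    }

    int getArrayOffset(int rowId) {
        return data[2 * rowId];
    }

    int getArrayLength(int rowId) {
        return data[2 * rowId + 1];
    }

    // Only half of the backing slots correspond to distinct arrays.
    int maxArrays() {
        return data.length / 2;
    }
}
```

Whatever the concrete layout, a single `int[]` whose length is capped at some limit can describe at most half that many arrays, which is the trade-off raised in the comment.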
[GitHub] spark issue #18128: [SPARK-20906][SparkR]:Constrained Logistic Regression fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18128 **[Test build #77909 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77909/testReport)** for PR 18128 at commit [`c3190b5`](https://github.com/apache/spark/commit/c3190b5b4701afeceec17bbaa7c4ef6f0239b2c8).
[GitHub] spark issue #18266: [SPARK-20427][SQL] Read JDBC table use custom schema
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18266 **[Test build #77908 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77908/testReport)** for PR 18266 at commit [`0444c4d`](https://github.com/apache/spark/commit/0444c4d25d5943408a3ea11f84395dd38246e2f8).
[GitHub] spark issue #18260: [SPARK-21046][SQL] simplify the array offset and length ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18260 LGTM
[GitHub] spark pull request #18271: [MINOR][DOCS] Improve Running R Tests docs
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18271#discussion_r121305976 --- Diff: docs/building-spark.md --- @@ -219,7 +219,7 @@ The run-tests script also can be limited to a specific Python version or a speci ## Running R Tests To run the SparkR tests you will need to install the R package `testthat` -(run `install.packages(testthat)` from R shell). You can run just the SparkR tests using +(run `install.packages("testthat")` from R shell). You can run just the SparkR tests using --- End diff -- Would you mind updating this content to be consistent with https://github.com/apache/spark/blob/7e0cd1d9b168286386f15e9b55988733476ae2bb/R/README.md#examples-unit-tests if that makes sense to you? That section also appears to specify the mirror from which to download the package.
[GitHub] spark issue #18266: [SPARK-20427][SQL] Read JDBC table use custom schema
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/18266 Jenkins, retest this please
[GitHub] spark issue #18271: [MINOR][DOCS] Improve docs to Running R Tests
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18271 Merged build finished. Test PASSed.
[GitHub] spark issue #18271: [MINOR][DOCS] Improve docs to Running R Tests
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18271 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77907/ Test PASSed.
[GitHub] spark issue #18271: [MINOR][DOCS] Improve docs to Running R Tests
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18271 **[Test build #77907 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77907/testReport)** for PR 18271 at commit [`5643ab9`](https://github.com/apache/spark/commit/5643ab9e652b3d20c335a6d4e7545da0f115d774). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18260: [SPARK-21046][SQL] simplify the array offset and length ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18260 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77902/ Test PASSed.
[GitHub] spark issue #18260: [SPARK-21046][SQL] simplify the array offset and length ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18260 Merged build finished. Test PASSed.
[GitHub] spark issue #18260: [SPARK-21046][SQL] simplify the array offset and length ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18260 **[Test build #77902 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77902/testReport)** for PR 18260 at commit [`2b78043`](https://github.com/apache/spark/commit/2b780432173cc3e2027117e446d66899a097fe67). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18266: [SPARK-20427][SQL] Read JDBC table use custom schema
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/18266 retest please.
[GitHub] spark issue #18199: [SPARK-20979][SS]Add RateSource to generate values for t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18199 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77901/ Test PASSed.
[GitHub] spark issue #18199: [SPARK-20979][SS]Add RateSource to generate values for t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18199 Merged build finished. Test PASSed.
[GitHub] spark issue #18199: [SPARK-20979][SS]Add RateSource to generate values for t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18199 **[Test build #77901 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77901/testReport)** for PR 18199 at commit [`1d8454d`](https://github.com/apache/spark/commit/1d8454db916853e783792411704f7a49314ee9eb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18271: [MINOR][DOCS] Improve docs to Running R Tests
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18271 **[Test build #77907 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77907/testReport)** for PR 18271 at commit [`5643ab9`](https://github.com/apache/spark/commit/5643ab9e652b3d20c335a6d4e7545da0f115d774).
[GitHub] spark pull request #18271: [MINOR][DOCS] Improve docs to Running R Tests
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/18271 [MINOR][DOCS] Improve docs to Running R Tests ## What changes were proposed in this pull request? `install.packages(testthat)` should be `install.packages("testthat")`, otherwise: ``` > install.packages(testthat) Error in install.packages(testthat) : object 'testthat' not found ``` ## How was this patch tested? ``` > install.packages("testthat") Installing package into ‘/usr/lib64/R/library’ (as ‘lib’ is unspecified) trying URL 'https://mirror.lzu.edu.cn/CRAN/src/contrib/testthat_1.0.2.tar.gz' ``` You can merge this pull request into a Git repository by running: $ git pull https://github.com/wangyum/spark building-spark Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18271.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18271 commit 5643ab9e652b3d20c335a6d4e7545da0f115d774 Author: Yuming Wang Date: 2017-06-12T02:54:11Z Improve docs
[GitHub] spark pull request #18260: [SPARK-21046][SQL] simplify the array offset and ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18260#discussion_r121303857 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java --- @@ -438,10 +401,8 @@ public void loadBytes(ColumnVector.Array array) { protected void reserveInternal(int newCapacity) { int oldCapacity = (this.data == 0L) ? 0 : capacity; if (this.resultArray != null) { - this.lengthData = - Platform.reallocateMemory(lengthData, oldCapacity * 4, newCapacity * 4); - this.offsetData = - Platform.reallocateMemory(offsetData, oldCapacity * 4, newCapacity * 4); + // need 2 ints as offset and length for each array. --- End diff -- Oh. This is quite ambiguous. Nvm. `for each array` is good.