[GitHub] spark pull request #16603: [SPARK-19244][Core] Sort MemoryConsumers accordin...
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16603#discussion_r96581689

--- Diff: core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java ---
@@ -144,8 +164,24 @@ public long acquireExecutionMemory(long required, MemoryConsumer consumer) {
       // spilling, avoid to have too many spilled files.
       if (got < required) {
         // Call spill() on other consumers to release memory
+        // Sort the consumers according their memory usage. So we avoid spilling the same consumer
+        // which is just spilled in last few times and re-spilling on it will produce many small
+        // spill files.
+        List<MemoryConsumer> sortedList = new ArrayList<>(consumers.size());
         for (MemoryConsumer c: consumers) {
           if (c != consumer && c.getUsed() > 0 && c.getMode() == mode) {
+            sortedList.add(c);
+          }
+        }
+        Collections.sort(sortedList, new ConsumerComparator());
+        for (int listIndex = 0; listIndex < sortedList.size(); listIndex++) {
+          MemoryConsumer c = sortedList.get(listIndex);
+          // Try to only spill on the consumer which has the required size of memory.
+          // As the consumers are sorted in descending order, if the next consumer doesn't have
+          // the required memory, then we need to spill the current consumer at least.
+          boolean doSpill = (listIndex + 1) == sortedList.size() ||
+            sortedList.get(listIndex + 1).getUsed() < (required - got);
+          if (doSpill) {
--- End diff --

I like the fact that this implementation does not need to incur the cost of remove in a TreeMap. Unfortunately, I don't think it is sufficient: the implementation assumes that spill() will always release the full getUsed() amount - from the rest of the code in the method, this does not look like a valid assumption to make. This can result in spilling a large number of smaller blocks, and potentially the requesting consumer itself. For example: required = 500MB, consumers = 1.5GB 1GB 500MB 2MB 1MB ..
If spilling the 500MB consumer resulted in (say) releasing only 490MB, we might end up spilling a large number of blocks and also (potentially) end up spilling the requesting consumer itself - and we can end up returning less than requested even though enough memory exists to satisfy the request. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
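The selection heuristic under discussion can be modeled as a small pure function. This is a simplified sketch, not the actual TaskMemoryManager code: consumers are plain byte counts, and it bakes in exactly the assumption mridulm questions, namely that spilling a consumer releases everything getUsed() reports.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SpillSelection {
    // Model each consumer by its current memory usage in bytes. Walk the list in
    // descending order and spill a consumer only when the next (smaller) one could
    // not cover the remaining deficit on its own -- the heuristic from the diff.
    static List<Long> chooseSpills(List<Long> consumers, long required) {
        List<Long> sorted = new ArrayList<>();
        for (long used : consumers) {
            if (used > 0) sorted.add(used);
        }
        sorted.sort(Collections.reverseOrder());
        List<Long> chosen = new ArrayList<>();
        long got = 0;
        for (int i = 0; i < sorted.size() && got < required; i++) {
            boolean doSpill = i + 1 == sorted.size()
                || sorted.get(i + 1) < (required - got);
            if (doSpill) {
                chosen.add(sorted.get(i));
                got += sorted.get(i); // assumes spill() releases all of getUsed()
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        // mridulm's example: required = 500MB, consumers = 1.5GB 1GB 500MB 2MB 1MB
        // Under the ideal assumption, only the 500MB consumer is picked; if the
        // spill under-delivers in reality, the loop would have to keep going.
        System.out.println(chooseSpills(List.of(1500L, 1000L, 500L, 2L, 1L), 500L));
    }
}
```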
[GitHub] spark pull request #16613: [SPARK-19024][SQL] Implement new approach to writ...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/16613#discussion_r96580856

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
@@ -275,21 +286,80 @@ case class AlterViewAsCommand(
       throw new AnalysisException(s"${viewMeta.identifier} is not a view.")
     }

-    val viewSQL: String = new SQLBuilder(analyzedPlan).toSQL
-    // Validate the view SQL - make sure we can parse it and analyze it.
-    // If we cannot analyze the generated query, there is probably a bug in SQL generation.
-    try {
-      session.sql(viewSQL).queryExecution.assertAnalyzed()
-    } catch {
-      case NonFatal(e) =>
-        throw new RuntimeException(s"Failed to analyze the canonicalized SQL: $viewSQL", e)
-    }
+    val newProperties = generateViewProperties(viewMeta.properties, session, analyzedPlan)

     val updatedViewMeta = viewMeta.copy(
       schema = analyzedPlan.schema,
+      properties = newProperties,
       viewOriginalText = Some(originalText),
-      viewText = Some(viewSQL))
+      viewText = Some(originalText))

     session.sessionState.catalog.alterTable(updatedViewMeta)
   }
 }
+
+object ViewHelper {
+
+  import CatalogTable._
+
+  /**
+   * Generate the view default database in `properties`.
+   */
+  def generateViewDefaultDatabase(databaseName: String): Map[String, String] = {
+    Map(VIEW_DEFAULT_DATABASE -> databaseName)
+  }
+
+  /**
+   * Generate the view query output column names in `properties`.
+   */
+  def generateQueryColumnNames(columns: Seq[String]): Map[String, String] = {
+    val props = new mutable.HashMap[String, String]
+    if (columns.nonEmpty) {
+      props.put(VIEW_QUERY_OUTPUT_NUM_COLUMNS, columns.length.toString)
+      columns.zipWithIndex.foreach { case (colName, index) =>
+        props.put(s"$VIEW_QUERY_OUTPUT_COLUMN_NAME_PREFIX$index", colName)
+      }
+    }
+    props.toMap
+  }
+
+  /**
+   * Remove the view query output column names in `properties`.
+   */
+  def removeQueryColumnNames(properties: Map[String, String]): Map[String, String] = {
+    // We can't use `filterKeys` here, as the map returned by `filterKeys` is not serializable,
+    // while `CatalogTable` should be serializable.
+    properties.filterNot { case (key, _) =>
+      key.startsWith(VIEW_QUERY_OUTPUT_PREFIX)
+    }
+  }
+
+  /**
+   * Generate the view properties in CatalogTable, including:
+   * 1. view default database that is used to provide the default database name on view resolution.
+   * 2. the output column names of the query that creates a view, this is used to map the output of
+   *    the view child to the view output during view resolution.
+   *
+   * @param properties the `properties` in CatalogTable.
+   * @param session the spark session.
+   * @param analyzedPlan the analyzed logical plan that represents the child of a view.
+   * @return new view properties including view default database and query column names properties.
+   */
+  def generateViewProperties(
--- End diff --

Yea, will update that.
[GitHub] spark issue #16606: [SPARK-19246][SQL]CataLogTable's partitionSchema order a...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16606 **[Test build #71580 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71580/testReport)** for PR 16606 at commit [`5e60f14`](https://github.com/apache/spark/commit/5e60f1417f6b85e2f4fbab86d6b506d0cc2a553b).
[GitHub] spark pull request #16613: [SPARK-19024][SQL] Implement new approach to writ...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/16613#discussion_r96580761

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
@@ -207,29 +210,35 @@ case class CreateViewCommand(
   }

   /**
-   * Returns a [[CatalogTable]] that can be used to save in the catalog. This comment canonicalize
-   * SQL based on the analyzed plan, and also creates the proper schema for the view.
+   * Returns a [[CatalogTable]] that can be used to save in the catalog. Generate the view-specific
+   * properties(e.g. view default database, view query output column names) and store them as
+   * properties in the CatalogTable, and also creates the proper schema for the view.
+   *
+   * @param session the spark session.
+   * @param aliasedPlan if `userSpecifiedColumns` is defined, the aliased plan outputs the user
+   *                    specified columns, else it is the same as the `analyzedPlan`.
+   * @param analyzedPlan the analyzed logical plan that represents the child of a view.
--- End diff --

We generate the `queryColumnNames` from the `analyzedPlan`, and we generate the view schema from the `aliasedPlan`; they are not the same when `userSpecifiedColumns` is defined.
[GitHub] spark pull request #16613: [SPARK-19024][SQL] Implement new approach to writ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16613#discussion_r96580350

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
@@ -275,21 +286,80 @@ case class AlterViewAsCommand(
       throw new AnalysisException(s"${viewMeta.identifier} is not a view.")
     }

-    val viewSQL: String = new SQLBuilder(analyzedPlan).toSQL
-    // Validate the view SQL - make sure we can parse it and analyze it.
-    // If we cannot analyze the generated query, there is probably a bug in SQL generation.
-    try {
-      session.sql(viewSQL).queryExecution.assertAnalyzed()
-    } catch {
-      case NonFatal(e) =>
-        throw new RuntimeException(s"Failed to analyze the canonicalized SQL: $viewSQL", e)
-    }
+    val newProperties = generateViewProperties(viewMeta.properties, session, analyzedPlan)

     val updatedViewMeta = viewMeta.copy(
       schema = analyzedPlan.schema,
+      properties = newProperties,
       viewOriginalText = Some(originalText),
-      viewText = Some(viewSQL))
+      viewText = Some(originalText))

     session.sessionState.catalog.alterTable(updatedViewMeta)
   }
 }
+
+object ViewHelper {
+
+  import CatalogTable._
+
+  /**
+   * Generate the view default database in `properties`.
+   */
+  def generateViewDefaultDatabase(databaseName: String): Map[String, String] = {
+    Map(VIEW_DEFAULT_DATABASE -> databaseName)
+  }
+
+  /**
+   * Generate the view query output column names in `properties`.
+   */
+  def generateQueryColumnNames(columns: Seq[String]): Map[String, String] = {
+    val props = new mutable.HashMap[String, String]
+    if (columns.nonEmpty) {
+      props.put(VIEW_QUERY_OUTPUT_NUM_COLUMNS, columns.length.toString)
+      columns.zipWithIndex.foreach { case (colName, index) =>
+        props.put(s"$VIEW_QUERY_OUTPUT_COLUMN_NAME_PREFIX$index", colName)
+      }
+    }
+    props.toMap
+  }
+
+  /**
+   * Remove the view query output column names in `properties`.
+   */
+  def removeQueryColumnNames(properties: Map[String, String]): Map[String, String] = {
+    // We can't use `filterKeys` here, as the map returned by `filterKeys` is not serializable,
+    // while `CatalogTable` should be serializable.
+    properties.filterNot { case (key, _) =>
+      key.startsWith(VIEW_QUERY_OUTPUT_PREFIX)
+    }
+  }
+
+  /**
+   * Generate the view properties in CatalogTable, including:
+   * 1. view default database that is used to provide the default database name on view resolution.
+   * 2. the output column names of the query that creates a view, this is used to map the output of
+   *    the view child to the view output during view resolution.
+   *
+   * @param properties the `properties` in CatalogTable.
+   * @param session the spark session.
+   * @param analyzedPlan the analyzed logical plan that represents the child of a view.
+   * @return new view properties including view default database and query column names properties.
+   */
+  def generateViewProperties(
--- End diff --

looks like all other methods in this class can be `private`?
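The property bookkeeping in this diff boils down to writing numbered entries under a shared key prefix and later stripping every key with that prefix. A minimal Java sketch of the same idea follows; the key strings here are local stand-ins mirroring the CatalogTable constants in the diff, not Spark's actual API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ViewProps {
    // Hypothetical stand-ins for the CatalogTable constants referenced in the diff.
    static final String PREFIX = "view.query.out.";
    static final String NUM_COLS = PREFIX + "numCols";

    // Mirrors generateQueryColumnNames: store the column count plus one
    // numbered entry per output column.
    static Map<String, String> generateQueryColumnNames(List<String> columns) {
        Map<String, String> props = new LinkedHashMap<>();
        if (!columns.isEmpty()) {
            props.put(NUM_COLS, Integer.toString(columns.size()));
            for (int i = 0; i < columns.size(); i++) {
                props.put(PREFIX + "col." + i, columns.get(i));
            }
        }
        return props;
    }

    // Mirrors removeQueryColumnNames: drop every key under the prefix,
    // leaving unrelated table properties untouched.
    static Map<String, String> removeQueryColumnNames(Map<String, String> properties) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : properties.entrySet()) {
            if (!e.getKey().startsWith(PREFIX)) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> props = generateQueryColumnNames(List.of("a", "b"));
        props.put("owner", "someone");
        System.out.println(removeQueryColumnNames(props)); // only "owner" survives
    }
}
```

Note that the Scala version in the diff builds a fresh map with `filterNot` rather than using `filterKeys`, because `filterKeys` returns a lazy view that is not serializable; the eager copy above sidesteps the same trap.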
[GitHub] spark pull request #16613: [SPARK-19024][SQL] Implement new approach to writ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16613#discussion_r96580275

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
@@ -207,29 +210,35 @@ case class CreateViewCommand(
   }

   /**
-   * Returns a [[CatalogTable]] that can be used to save in the catalog. This comment canonicalize
-   * SQL based on the analyzed plan, and also creates the proper schema for the view.
+   * Returns a [[CatalogTable]] that can be used to save in the catalog. Generate the view-specific
+   * properties(e.g. view default database, view query output column names) and store them as
+   * properties in the CatalogTable, and also creates the proper schema for the view.
+   *
+   * @param session the spark session.
+   * @param aliasedPlan if `userSpecifiedColumns` is defined, the aliased plan outputs the user
+   *                    specified columns, else it is the same as the `analyzedPlan`.
+   * @param analyzedPlan the analyzed logical plan that represents the child of a view.
--- End diff --

why do we need both `aliasedPlan` and `analyzedPlan`?
[GitHub] spark issue #16613: [SPARK-19024][SQL] Implement new approach to write a per...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16613 **[Test build #71579 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71579/testReport)** for PR 16613 at commit [`2d49ef2`](https://github.com/apache/spark/commit/2d49ef26936448dd70768562c4ef429542f56e4e).
[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16621 After merging https://github.com/apache/spark/pull/16517, this PR has a few conflicts.
[GitHub] spark pull request #16517: [SPARK-18243][SQL] Port Hive writing to use FileF...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16517
[GitHub] spark issue #16517: [SPARK-18243][SQL] Port Hive writing to use FileFormat i...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16517 Thanks! Merged to master.
[GitHub] spark issue #16624: [WIP] Add two test cases for `SET -v`.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16624 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71571/ Test FAILed.
[GitHub] spark issue #16624: [WIP] Add two test cases for `SET -v`.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16624 **[Test build #71571 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71571/testReport)** for PR 16624 at commit [`87de8da`](https://github.com/apache/spark/commit/87de8da846a7b4d368c1475ba3fc3d83cc865220).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16624: [WIP] Add two test cases for `SET -v`.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16624 Merged build finished. Test FAILed.
[GitHub] spark issue #11867: [SPARK-14049] [CORE] Add functionality in spark history ...
Github user paragpc commented on the issue: https://github.com/apache/spark/pull/11867

I am not sure why the build is failing with the following error:

    stderr: fatal: unable to access 'https://github.com/apache/spark.git/': Failed connect to github.com:443; Operation now in progress
        at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1640)
        at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandWithCredentials(CliGitAPIImpl.java:1388)
        at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.access$300(CliGitAPIImpl.java:62)
        at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$1.execute(CliGitAPIImpl.java:313)
        at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:152)
        at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:145)
        at hudson.remoting.UserRequest.perform(UserRequest.java:120)
        at hudson.remoting.UserRequest.perform(UserRequest.java:48)
        at hudson.remoting.Request$2.run(Request.java:326)
        at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
        at ..remote call to amp-jenkins-worker-04(Native Method)
        at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1416)
        at hudson.remoting.UserResponse.retrieve(UserRequest.java:220)
        at hudson.remoting.Channel.call(Channel.java:781)
        at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.execute(RemoteGitImpl.java:145)
        at sun.reflect.GeneratedMethodAccessor287.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.invoke(RemoteGitImpl.java:131)
        at com.sun.proxy.$Proxy58.execute(Unknown Source)
        at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:761)
        ... 11 more

Error does not seem related to my changes, can anyone help? cc @vanzin, @zsxwing, @squito
[GitHub] spark issue #11867: [SPARK-14049] [CORE] Add functionality in spark history ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11867 Merged build finished. Test FAILed.
[GitHub] spark issue #11867: [SPARK-14049] [CORE] Add functionality in spark history ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11867 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71578/ Test FAILed.
[GitHub] spark issue #16517: [SPARK-18243][SQL] Port Hive writing to use FileFormat i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16517 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71569/ Test PASSed.
[GitHub] spark issue #16517: [SPARK-18243][SQL] Port Hive writing to use FileFormat i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16517 Merged build finished. Test PASSed.
[GitHub] spark issue #16517: [SPARK-18243][SQL] Port Hive writing to use FileFormat i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16517 **[Test build #71569 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71569/testReport)** for PR 16517 at commit [`150efa2`](https://github.com/apache/spark/commit/150efa2266f298205b272e9347032b2a85ab665c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class HiveFileFormat(fileSinkConf: FileSinkDesc) extends FileFormat with Logging `
[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16630 The following code illustrates the idea of this PR.
```
val datasetWithWeight = Seq(
  (1.0, 1.0, 0.0, 5.0),
  (0.5, 2.0, 1.0, 2.0),
  (1.0, 3.0, 2.0, 1.0),
  (0.0, 4.0, 3.0, 3.0)
).toDF("y", "w", "x1", "x2")

val formula = new RFormula()
  .setFormula("y ~ x1 + x2")
  .setFeaturesCol("features")
  .setLabelCol("label")

val output = formula.fit(datasetWithWeight).transform(datasetWithWeight)
val glr = new GeneralizedLinearRegression()
val model = glr.fit(output)
model.summary.summaryTable.show
```
This prints out:
```
+---------+--------------------+-------------------+-------------------+-------------------+
|  Feature|            Estimate|           StdError|             TValue|             PValue|
+---------+--------------------+-------------------+-------------------+-------------------+
|Intercept|  1.4523809523809539| 0.9245946589975053| 1.5708299180050451| 0.3609009059280113|
|       x1|            -0.33387|0.28171808490950573|-1.1832159566199243|0.44669962096188565|
|       x2|-0.11904761904761924|       0.2129588548|-0.5590169943749482| 0.6754896416955616|
+---------+--------------------+-------------------+-------------------+-------------------+
```
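The TValue column in a summary table like this is just the estimate divided by its standard error (the p-value then follows from the reference distribution). A small sanity-check sketch of that relationship, using the intercept row's printed values; this is plain illustrative Java, not Spark code, and assumes the table was computed as estimate / stdError:

```java
public class GlmSummaryCheck {
    // t-statistic for a coefficient: estimate divided by its standard error.
    static double tValue(double estimate, double stdError) {
        return estimate / stdError;
    }

    public static void main(String[] args) {
        // Intercept row from the printed summary table above.
        double estimate = 1.4523809523809539;
        double stdError = 0.9245946589975053;
        // Should match the printed TValue column to within rounding.
        System.out.println(tValue(estimate, stdError));
    }
}
```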
[GitHub] spark issue #11867: [SPARK-14049] [CORE] Add functionality in spark history ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11867 Merged build finished. Test FAILed.
[GitHub] spark issue #11867: [SPARK-14049] [CORE] Add functionality in spark history ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11867 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71577/ Test FAILed.
[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16630 Can one of the admins verify this patch?
[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16605 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71570/ Test PASSed.
[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16605 Merged build finished. Test PASSed.
[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16605 **[Test build #71570 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71570/testReport)** for PR 16605 at commit [`c5d8070`](https://github.com/apache/spark/commit/c5d80701cc5429841534c980030f983e9e941e46).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...
GitHub user actuaryzhang opened a pull request: https://github.com/apache/spark/pull/16630 [SPARK-19270][ML] Add summary table to GLM summary

## What changes were proposed in this pull request?

Add an R-like summary table to the GLM summary, which includes feature name (if it exists), parameter estimate, standard error, t-stat and p-value. This allows Scala users to easily gather these commonly used inference results. @srowen @yanboliang

## How was this patch tested?

New tests: one for testing the feature names, and one for testing the summary table.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/actuaryzhang/spark glmTable

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16630.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16630
[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16621#discussion_r96577649

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ---
@@ -215,37 +215,43 @@ case class DataSourceAnalysis(conf: CatalystConf) extends Rule[LogicalPlan] {
 /**
- * Replaces [[SimpleCatalogRelation]] with data source table if its table property contains data
- * source information.
+ * Replaces [[SimpleCatalogRelation]] with data source table if its table provider is not hive.
  */
 class FindDataSourceTable(sparkSession: SparkSession) extends Rule[LogicalPlan] {
-  private def readDataSourceTable(
-      sparkSession: SparkSession,
-      simpleCatalogRelation: SimpleCatalogRelation): LogicalPlan = {
-    val table = simpleCatalogRelation.catalogTable
-    val pathOption = table.storage.locationUri.map("path" -> _)
-    val dataSource =
-      DataSource(
-        sparkSession,
-        userSpecifiedSchema = Some(table.schema),
-        partitionColumns = table.partitionColumnNames,
-        bucketSpec = table.bucketSpec,
-        className = table.provider.get,
-        options = table.storage.properties ++ pathOption)
-
-    LogicalRelation(
-      dataSource.resolveRelation(),
-      expectedOutputAttributes = Some(simpleCatalogRelation.output),
-      catalogTable = Some(table))
+  private def readDataSourceTable(table: CatalogTable): LogicalPlan = {
+    val qualifiedTableName = QualifiedTableName(table.database, table.identifier.table)
+    val cache = sparkSession.sessionState.catalog.tableRelationCache
+    val withHiveSupport =
+      sparkSession.sparkContext.conf.get(StaticSQLConf.CATALOG_IMPLEMENTATION) == "hive"
+
+    cache.get(qualifiedTableName, new Callable[LogicalPlan]() {
+      override def call(): LogicalPlan = {
+        val pathOption = table.storage.locationUri.map("path" -> _)
+        val dataSource =
+          DataSource(
+            sparkSession,
+            // In older version(prior to 2.1) of Spark, the table schema can be empty and should be
+            // inferred at runtime. We should still support it.
+            userSpecifiedSchema = if (table.schema.isEmpty) None else Some(table.schema),
+            partitionColumns = table.partitionColumnNames,
+            bucketSpec = table.bucketSpec,
+            className = table.provider.get,
+            options = table.storage.properties ++ pathOption,
+            // TODO: improve `InMemoryCatalog` and remove this limitation.
+            catalogTable = if (withHiveSupport) Some(table) else None)
+
+        LogicalRelation(dataSource.resolveRelation(), catalogTable = Some(table))
--- End diff --

cc @wzhfy
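The `cache.get(qualifiedTableName, new Callable[LogicalPlan]() {...})` call in the diff above is a compute-if-absent lookup: the relation is resolved once per `(database, table)` key and reused until invalidated. A minimal Python sketch of that pattern, with hypothetical names (this is an illustration of the caching idiom, not Spark's actual cache implementation):

```python
import threading

class RelationCache:
    """Compute-if-absent cache keyed by (database, table), mirroring the
    Guava-style cache.get(key, loader) call in the patch."""

    def __init__(self):
        self._entries = {}
        self._lock = threading.Lock()

    def get(self, qualified_name, loader):
        # Return the cached plan, building it with `loader` only on a miss.
        with self._lock:
            if qualified_name not in self._entries:
                self._entries[qualified_name] = loader()
            return self._entries[qualified_name]

    def invalidate(self, qualified_name):
        # Dropped on e.g. a REFRESH TABLE so the next lookup re-resolves.
        with self._lock:
            self._entries.pop(qualified_name, None)

cache = RelationCache()
plan = cache.get(("default", "tbl"), lambda: "resolved-relation")
# Second lookup hits the cache; the loader is not invoked again.
again = cache.get(("default", "tbl"), lambda: "should-not-run")
```

One consequence visible later in this thread: a stale entry (e.g. after `ALTER TABLE ... SET LOCATION`) must be explicitly invalidated, which is why `spark.catalog.refreshTable("tbl")` gets added to the DDL test.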
[GitHub] spark issue #16585: [SPARK-19223][SQL][PySpark] Fix InputFileBlockHolder for...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16585 LGTM, pending jenkins
[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16621#discussion_r96577543

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/MetastoreDataSourcesSuite.scala ---
@@ -1322,4 +1322,26 @@ class MetastoreDataSourcesSuite extends QueryTest with SQLTestUtils with TestHiv
     sparkSession.sparkContext.conf.set(DEBUG_MODE, previousValue)
   }
 }
+
+  test("SPARK-18464: support old table which doesn't store schema in table properties") {
--- End diff --

This test was removed in https://github.com/apache/spark/pull/16003, but I find it's still useful and not covered by other tests, so I added it back.
[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16621#discussion_r96577471

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
@@ -1626,17 +1626,6 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
     assert(d.size == d.distinct.size)
   }
-  test("SPARK-17625: data source table in InMemoryCatalog should guarantee output consistency") {
--- End diff --

We don't need this test anymore, see https://github.com/apache/spark/pull/16621/files#r96577427
[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16621#discussion_r96577427

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ---
@@ -215,37 +215,43 @@ case class DataSourceAnalysis(conf: CatalystConf) extends Rule[LogicalPlan] {
 /**
- * Replaces [[SimpleCatalogRelation]] with data source table if its table property contains data
- * source information.
+ * Replaces [[SimpleCatalogRelation]] with data source table if its table provider is not hive.
  */
 class FindDataSourceTable(sparkSession: SparkSession) extends Rule[LogicalPlan] {
-  private def readDataSourceTable(
-      sparkSession: SparkSession,
-      simpleCatalogRelation: SimpleCatalogRelation): LogicalPlan = {
-    val table = simpleCatalogRelation.catalogTable
-    val pathOption = table.storage.locationUri.map("path" -> _)
-    val dataSource =
-      DataSource(
-        sparkSession,
-        userSpecifiedSchema = Some(table.schema),
-        partitionColumns = table.partitionColumnNames,
-        bucketSpec = table.bucketSpec,
-        className = table.provider.get,
-        options = table.storage.properties ++ pathOption)
-
-    LogicalRelation(
-      dataSource.resolveRelation(),
-      expectedOutputAttributes = Some(simpleCatalogRelation.output),
-      catalogTable = Some(table))
+  private def readDataSourceTable(table: CatalogTable): LogicalPlan = {
+    val qualifiedTableName = QualifiedTableName(table.database, table.identifier.table)
+    val cache = sparkSession.sessionState.catalog.tableRelationCache
+    val withHiveSupport =
+      sparkSession.sparkContext.conf.get(StaticSQLConf.CATALOG_IMPLEMENTATION) == "hive"
+
+    cache.get(qualifiedTableName, new Callable[LogicalPlan]() {
+      override def call(): LogicalPlan = {
+        val pathOption = table.storage.locationUri.map("path" -> _)
+        val dataSource =
+          DataSource(
+            sparkSession,
+            // In older version(prior to 2.1) of Spark, the table schema can be empty and should be
+            // inferred at runtime. We should still support it.
+            userSpecifiedSchema = if (table.schema.isEmpty) None else Some(table.schema),
+            partitionColumns = table.partitionColumnNames,
+            bucketSpec = table.bucketSpec,
+            className = table.provider.get,
+            options = table.storage.properties ++ pathOption,
+            // TODO: improve `InMemoryCatalog` and remove this limitation.
+            catalogTable = if (withHiveSupport) Some(table) else None)
+
+        LogicalRelation(dataSource.resolveRelation(), catalogTable = Some(table))
--- End diff --

Note that, previously we would set `expectedOutputAttributes` here, which was added by https://github.com/apache/spark/pull/15182. However, this doesn't work when the table schema needs to be inferred at runtime, and it turns out we don't need to do it at all. `AnalyzeColumnCommand` now gets attributes from the [resolved table relation plan](https://github.com/apache/spark/pull/16621/files#diff-027d6bd7c8cf4f64f99acc058389d859R44), so it's fine for the rule `FindDataSourceTable` to change outputs during analysis.
[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16605 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71568/ Test PASSed.
[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16605 Merged build finished. Test PASSed.
[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16605

**[Test build #71568 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71568/testReport)** for PR 16605 at commit [`22fb9d1`](https://github.com/apache/spark/commit/22fb9d14abcf7b2590c07739c2ce9641abb64ea5).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16621#discussion_r96576872

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
@@ -586,12 +594,12 @@ class SessionCatalog(
           desc = metadata,
           output = metadata.schema.toAttributes,
           child = parser.parsePlan(viewText))
-        SubqueryAlias(relationAlias, child, Option(name))
+        SubqueryAlias(relationAlias, child, Some(name.copy(table = table, database = Some(db))))
       } else {
         SubqueryAlias(relationAlias, SimpleCatalogRelation(metadata), None)
       }
     } else {
-      SubqueryAlias(relationAlias, tempTables(table), Option(name))
+      SubqueryAlias(relationAlias, tempTables(table), None)
--- End diff --

The existing way is to set `None`, see https://github.com/apache/spark/pull/16621/files#diff-ca4533edbf148c89cc0c564ab6b0aeaaL75 This shows the evil of duplicated code: we had inconsistent behaviors with and without hive support. I think we should only set the table identifier for a persisted view, @hvanhovell is that true?
[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16621 **[Test build #71576 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71576/testReport)** for PR 16621 at commit [`d636389`](https://github.com/apache/spark/commit/d636389947af3041832c63582f9073b92421d7f0).
[GitHub] spark issue #16517: [SPARK-18243][SQL] Port Hive writing to use FileFormat i...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16517 No concern after the latest changes. LGTM pending Jenkins
[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16621 No more comments. It looks pretty good! Let us see whether all the test cases can pass.
[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16621#discussion_r96575899

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -1799,6 +1799,7 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
         .getTableMetadata(TableIdentifier("tbl")).storage.locationUri.get
       sql(s"ALTER TABLE tbl SET LOCATION '${dir.getCanonicalPath}'")
+      spark.catalog.refreshTable("tbl")
--- End diff --

+1 :)
[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16621#discussion_r96575520

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
@@ -586,12 +594,12 @@ class SessionCatalog(
           desc = metadata,
           output = metadata.schema.toAttributes,
           child = parser.parsePlan(viewText))
-        SubqueryAlias(relationAlias, child, Option(name))
+        SubqueryAlias(relationAlias, child, Some(name.copy(table = table, database = Some(db))))
       } else {
         SubqueryAlias(relationAlias, SimpleCatalogRelation(metadata), None)
       }
     } else {
-      SubqueryAlias(relationAlias, tempTables(table), Option(name))
+      SubqueryAlias(relationAlias, tempTables(table), None)
--- End diff --

Should we keep the existing way? This was introduced for the EXPLAIN command of view. See the PR: https://github.com/apache/spark/pull/14657
[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16621 **[Test build #71574 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71574/testReport)** for PR 16621 at commit [`bbccdae`](https://github.com/apache/spark/commit/bbccdae6640a5efe047dba2df569384a78bd986e).
[GitHub] spark issue #16585: [SPARK-19223][SQL][PySpark] Fix InputFileBlockHolder for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16585 **[Test build #71575 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71575/testReport)** for PR 16585 at commit [`2b61d47`](https://github.com/apache/spark/commit/2b61d472a74766d4a1c2af4cf2278a87b7b12698).
[GitHub] spark pull request #16585: [SPARK-19223][SQL][PySpark] Fix InputFileBlockHol...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16585#discussion_r96574565

--- Diff: python/pyspark/sql/tests.py ---
@@ -435,6 +435,31 @@ def test_udf_with_input_file_name(self):
         row = self.spark.read.json(filePath).select(sourceFile(input_file_name())).first()
         self.assertTrue(row[0].find("people1.json") != -1)
+    def test_udf_with_input_file_name_for_hadooprdd(self):
+        from pyspark.sql.functions import udf, input_file_name
+        from pyspark.sql.types import StringType
+
+        def filename(path):
+            return path
+
+        self.spark.udf.register('sameText', filename)
--- End diff --

oh. wrongly copied.
[GitHub] spark pull request #16585: [SPARK-19223][SQL][PySpark] Fix InputFileBlockHol...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16585#discussion_r96574569

--- Diff: python/pyspark/sql/tests.py ---
@@ -435,6 +435,31 @@ def test_udf_with_input_file_name(self):
         row = self.spark.read.json(filePath).select(sourceFile(input_file_name())).first()
         self.assertTrue(row[0].find("people1.json") != -1)
+    def test_udf_with_input_file_name_for_hadooprdd(self):
+        from pyspark.sql.functions import udf, input_file_name
+        from pyspark.sql.types import StringType
+
+        def filename(path):
+            return path
+
+        self.spark.udf.register('sameText', filename)
+        sameText = udf(filename, StringType())
+
+        rdd = self.sc.textFile('python/test_support/sql/people.json')
+        df = self.spark.read.json(rdd).select(input_file_name().alias('file'))
+        row = df.select(sameText(df['file'])).first()
+        self.assertTrue(row[0].find("people.json") != -1)
+
+        rdd2 = self.sc.newAPIHadoopFile(
+            'python/test_support/sql/people.json',
+            'org.apache.hadoop.mapreduce.lib.input.TextInputFormat',
+            'org.apache.hadoop.io.LongWritable',
+            'org.apache.hadoop.io.Text')
+
+        df2 = self.spark.read.json(rdd2).select(input_file_name().alias('file'))
+        row = df2.select(sameText(df2['file'])).first()
--- End diff --

sure.
[GitHub] spark issue #12064: [SPARK-14272][ML] Evaluate GaussianMixtureModel with Log...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12064 **[Test build #71573 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71573/testReport)** for PR 12064 at commit [`eebae43`](https://github.com/apache/spark/commit/eebae43c84a1179260648f7f5cdbb63a60fcc40d).
[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16621 **[Test build #71572 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71572/testReport)** for PR 16621 at commit [`2883c8b`](https://github.com/apache/spark/commit/2883c8bc6c22bfde24711060b5110b896f4c8b4a).
[GitHub] spark pull request #16603: [SPARK-19244][Core] Sort MemoryConsumers accordin...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16603#discussion_r96574267

--- Diff: core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java ---
@@ -144,23 +170,31 @@ public long acquireExecutionMemory(long required, MemoryConsumer consumer) {
       // spilling, avoid to have too many spilled files.
       if (got < required) {
         // Call spill() on other consumers to release memory
+        // Sort the consumers according their memory usage. So we avoid spilling the same consumer
+        // which is just spilled in last few times and re-spilling on it will produce many small
+        // spill files.
+        List<MemoryConsumer> sortedList = new ArrayList<>();
         for (MemoryConsumer c: consumers) {
           if (c != consumer && c.getUsed() > 0 && c.getMode() == mode) {
-            try {
-              long released = c.spill(required - got, consumer);
-              if (released > 0) {
-                logger.debug("Task {} released {} from {} for {}", taskAttemptId,
-                  Utils.bytesToString(released), c, consumer);
-                got += memoryManager.acquireExecutionMemory(required - got, taskAttemptId, mode);
-                if (got >= required) {
-                  break;
-                }
+            sortedList.add(c);
           }
         }
+        Collections.sort(sortedList, new ConsumerComparator());
+        for (MemoryConsumer c: sortedList) {
+          try {
+            long released = c.spill(required - got, consumer);
+            if (released > 0) {
+              logger.debug("Task {} released {} from {} for {}", taskAttemptId,
+                Utils.bytesToString(released), c, consumer);
+              got += memoryManager.acquireExecutionMemory(required - got, taskAttemptId, mode);
+              if (got >= required) {
+                break;
              }
-            } catch (IOException e) {
-              logger.error("error while calling spill() on " + c, e);
-              throw new OutOfMemoryError("error while calling spill() on " + c + " : "
-                + e.getMessage());
            }
+          } catch (IOException e) {
+            logger.error("error while calling spill() on " + c, e);
+            throw new OutOfMemoryError("error while calling spill() on " + c + " : "
+              + e.getMessage());
          }
--- End diff --

Actually, the newest update already satisfies the example you show.
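The selection rule under discussion in this PR (see the `doSpill` condition quoted in the earlier review): sort candidate consumers by usage in descending order and spill a consumer only when the next one in the list could not cover the remaining need. A minimal Python simulation of that rule, with amounts in MB; it assumes a spill releases the consumer's full usage, which is exactly the assumption the reviewer questions:

```python
def pick_consumers_to_spill(usages, required):
    """Simulate the PR's rule: with consumers sorted descending by usage,
    spill consumer i only if it is last or the next consumer cannot cover
    what is still needed. Assumes spill releases the full usage."""
    sorted_usages = sorted(usages, reverse=True)
    got, spilled = 0, []
    for i, used in enumerate(sorted_usages):
        if got >= required:
            break
        is_last = (i + 1) == len(sorted_usages)
        if is_last or sorted_usages[i + 1] < (required - got):
            spilled.append(used)
            got += used
    return spilled

# The thread's example: consumers of 1500, 1000, 500, 2, 1 MB and a
# 500 MB request. The rule walks past the larger consumers and spills
# only the 500 MB one.
print(pick_consumers_to_spill([1500, 1000, 500, 2, 1], 500))  # [500]
```

If the 500 MB spill released only 490 MB in practice, the real code would keep walking down the list of small consumers, which is the failure mode described in the review.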
[GitHub] spark pull request #16576: [SPARK-19215] Add necessary check for `RDD.checkp...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/16576#discussion_r96573963

--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -1539,6 +1539,9 @@ abstract class RDD[T: ClassTag](
     // NOTE: we use a global lock here due to complexities downstream with ensuring
     // children RDD partitions point to the correct parent partitions. In the future
     // we should revisit this consideration.
+    if (doCheckpointCalled) {
+      logWarning(s"Because job has been executed on RDD ${id}, checkpoint won't work")
--- End diff --

reping @zsxwing Would you mind taking a look? This is a simple PR, but it will help Spark developers avoid misusing the API...
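The mistake the check above guards against: calling `checkpoint()` after an action has already run on the RDD, at which point the checkpoint can no longer take effect. A toy Python model of the guard (class and attribute names are illustrative, not Spark's internals):

```python
import warnings

class FakeRDD:
    """Toy model of the flag checked in the PR: once an action has run
    (do_checkpoint_called set), a later checkpoint() request is too late,
    so warn instead of failing silently."""

    def __init__(self, rdd_id):
        self.id = rdd_id
        self.do_checkpoint_called = False
        self.checkpoint_requested = False

    def checkpoint(self):
        if self.do_checkpoint_called:
            warnings.warn(
                "Because a job has already been executed on RDD "
                f"{self.id}, checkpoint won't work")
        self.checkpoint_requested = True

    def count(self):
        # An action: after it runs, doCheckpoint() has already been called.
        self.do_checkpoint_called = True
        return 0

rdd = FakeRDD(1)
rdd.count()       # action runs first
rdd.checkpoint()  # too late: triggers the warning
```

The correct usage is the reverse order: request the checkpoint before the first action so the next job can materialize it.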
[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16621 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71567/ Test FAILed.
[GitHub] spark pull request #16603: [SPARK-19244][Core] Sort MemoryConsumers accordin...
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16603#discussion_r96572902

--- Diff: core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java ---
@@ -144,23 +170,31 @@ public long acquireExecutionMemory(long required, MemoryConsumer consumer) {
       // spilling, avoid to have too many spilled files.
       if (got < required) {
         // Call spill() on other consumers to release memory
+        // Sort the consumers according their memory usage. So we avoid spilling the same consumer
+        // which is just spilled in last few times and re-spilling on it will produce many small
+        // spill files.
+        List<MemoryConsumer> sortedList = new ArrayList<>();
         for (MemoryConsumer c: consumers) {
           if (c != consumer && c.getUsed() > 0 && c.getMode() == mode) {
-            try {
-              long released = c.spill(required - got, consumer);
-              if (released > 0) {
-                logger.debug("Task {} released {} from {} for {}", taskAttemptId,
-                  Utils.bytesToString(released), c, consumer);
-                got += memoryManager.acquireExecutionMemory(required - got, taskAttemptId, mode);
-                if (got >= required) {
-                  break;
-                }
+            sortedList.add(c);
           }
         }
+        Collections.sort(sortedList, new ConsumerComparator());
+        for (MemoryConsumer c: sortedList) {
+          try {
+            long released = c.spill(required - got, consumer);
+            if (released > 0) {
+              logger.debug("Task {} released {} from {} for {}", taskAttemptId,
+                Utils.bytesToString(released), c, consumer);
+              got += memoryManager.acquireExecutionMemory(required - got, taskAttemptId, mode);
+              if (got >= required) {
+                break;
              }
-            } catch (IOException e) {
-              logger.error("error while calling spill() on " + c, e);
-              throw new OutOfMemoryError("error while calling spill() on " + c + " : "
-                + e.getMessage());
            }
+          } catch (IOException e) {
+            logger.error("error while calling spill() on " + c, e);
+            throw new OutOfMemoryError("error while calling spill() on " + c + " : "
+              + e.getMessage());
          }
--- End diff --

Use ceiling, not floor.

Ensure that the requirements are satisfied: what I wrote was on the fly to convey the idea and not meant to be used literally - and apparently there were some errors. I have edited the examples so that there is no further confusion. The basic idea is simple: instead of picking the largest or a random consumer, pick one which is sufficient to meet the memory requirements. If none exists, then evict the largest and retry until done.
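The reviewer's alternative strategy in the last paragraph can be sketched directly: pick the smallest consumer whose usage alone covers the remaining need; if none exists, evict the largest and retry until done. A minimal Python sketch, with amounts in MB; like the PR's rule, it simplifies by assuming each spill releases the consumer's full usage:

```python
def best_fit_spill(usages, required):
    """Spill the smallest consumer able to satisfy the remaining need;
    if none suffices, spill the largest and retry until satisfied or
    no consumers remain. Assumes spill releases the full usage."""
    remaining = sorted(usages)  # ascending
    got, spilled = 0, []
    while got < required and remaining:
        need = required - got
        # Smallest consumer that alone covers the need (best fit).
        idx = next((i for i, u in enumerate(remaining) if u >= need), None)
        if idx is None:
            idx = len(remaining) - 1  # none suffices: take the largest
        used = remaining.pop(idx)
        spilled.append(used)
        got += used
    return spilled

# The thread's example: a 500 MB request against consumers of
# 1500, 1000, 500, 2, 1 MB. The 500 MB consumer is the tightest fit,
# so only it is spilled.
print(best_fit_spill([1500, 1000, 500, 2, 1], 500))  # [500]
```

On this example the best-fit rule and the PR's rule agree; they diverge when a spill releases less than `getUsed` and a retry is needed, which is the scenario the reviewer raises.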
[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16621 Sure. No problem.
[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16621 Merged build finished. Test FAILed.
[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16621 can we do it later? We are going to merge `CatalogRelation` implementations and unify the table relation representations soon.
[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16621 **[Test build #71567 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71567/testReport)** for PR 16621 at commit [`919aaa2`](https://github.com/apache/spark/commit/919aaa2fbdf21fb4760a855c538e4bc9efa25d4b). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class QualifiedTableName(database: String, name: String)` * `class FindHiveSerdeTable(session: SparkSession) extends Rule[LogicalPlan] `
[GitHub] spark pull request #16585: [SPARK-19223][SQL][PySpark] Fix InputFileBlockHol...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16585#discussion_r96572553 --- Diff: python/pyspark/sql/tests.py --- @@ -435,6 +435,31 @@ def test_udf_with_input_file_name(self): row = self.spark.read.json(filePath).select(sourceFile(input_file_name())).first() self.assertTrue(row[0].find("people1.json") != -1) +def test_udf_with_input_file_name_for_hadooprdd(self): +from pyspark.sql.functions import udf, input_file_name +from pyspark.sql.types import StringType + +def filename(path): +return path + +self.spark.udf.register('sameText', filename) +sameText = udf(filename, StringType()) + +rdd = self.sc.textFile('python/test_support/sql/people.json') +df = self.spark.read.json(rdd).select(input_file_name().alias('file')) +row = df.select(sameText(df['file'])).first() +self.assertTrue(row[0].find("people.json") != -1) + +rdd2 = self.sc.newAPIHadoopFile( +'python/test_support/sql/people.json', +'org.apache.hadoop.mapreduce.lib.input.TextInputFormat', +'org.apache.hadoop.io.LongWritable', +'org.apache.hadoop.io.Text') + +df2 = self.spark.read.json(rdd2).select(input_file_name().alias('file')) +row = df2.select(sameText(df2['file'])).first() --- End diff -- nit: `row2`?
[GitHub] spark pull request #16585: [SPARK-19223][SQL][PySpark] Fix InputFileBlockHol...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16585#discussion_r96572514 --- Diff: python/pyspark/sql/tests.py --- @@ -435,6 +435,31 @@ def test_udf_with_input_file_name(self): row = self.spark.read.json(filePath).select(sourceFile(input_file_name())).first() self.assertTrue(row[0].find("people1.json") != -1) +def test_udf_with_input_file_name_for_hadooprdd(self): +from pyspark.sql.functions import udf, input_file_name +from pyspark.sql.types import StringType + +def filename(path): +return path + +self.spark.udf.register('sameText', filename) --- End diff -- where do we call this registered function?
[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16621 Could we rename `SimpleCatalogRelation` to `UnresolvedCatalogRelation`? The current name looks very confusing to me.
[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16621#discussion_r96572359 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala --- @@ -215,37 +215,44 @@ case class DataSourceAnalysis(conf: CatalystConf) extends Rule[LogicalPlan] { /** - * Replaces [[SimpleCatalogRelation]] with data source table if its table property contains data - * source information. + * Replaces [[SimpleCatalogRelation]] with data source table if its table provider is not hive. */ class FindDataSourceTable(sparkSession: SparkSession) extends Rule[LogicalPlan] { - private def readDataSourceTable( - sparkSession: SparkSession, - simpleCatalogRelation: SimpleCatalogRelation): LogicalPlan = { -val table = simpleCatalogRelation.catalogTable -val pathOption = table.storage.locationUri.map("path" -> _) -val dataSource = - DataSource( -sparkSession, -userSpecifiedSchema = Some(table.schema), -partitionColumns = table.partitionColumnNames, -bucketSpec = table.bucketSpec, -className = table.provider.get, -options = table.storage.properties ++ pathOption) - -LogicalRelation( - dataSource.resolveRelation(), - expectedOutputAttributes = Some(simpleCatalogRelation.output), - catalogTable = Some(table)) + private def readDataSourceTable(relation: SimpleCatalogRelation): LogicalPlan = { +val table = relation.catalogTable +val cache = sparkSession.sessionState.catalog.tableRelationCache +val withHiveSupport = + sparkSession.sparkContext.conf.get(StaticSQLConf.CATALOG_IMPLEMENTATION) == "hive" + +cache.get(table.qualifiedIdentifier, new Callable[LogicalPlan]() { + override def call(): LogicalPlan = { +val pathOption = table.storage.locationUri.map("path" -> _) +val dataSource = + DataSource( +sparkSession, +userSpecifiedSchema = Some(table.schema), --- End diff -- good catch!
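The `cache.get(table.qualifiedIdentifier, new Callable[LogicalPlan]() {...})` pattern in the diff above is a get-or-load lookup: the callable resolves the relation only on a cache miss, and subsequent lookups reuse the cached plan. A minimal Python analogue of that behavior (hypothetical `RelationCache` class for illustration, not Spark's actual Guava-backed cache):

```python
class RelationCache:
    """Toy get-or-load cache mirroring Guava's Cache.get(key, loader)."""
    def __init__(self):
        self._entries = {}
        self.loads = 0  # counts how often the loader actually ran

    def get(self, key, loader):
        # Run the (expensive) loader only on a cache miss.
        if key not in self._entries:
            self._entries[key] = loader()
            self.loads += 1
        return self._entries[key]

    def invalidate(self, key):
        self._entries.pop(key, None)

cache = RelationCache()
key = ("default", "t")  # stand-in for the qualified table identifier
plan1 = cache.get(key, lambda: "resolved-relation-for-t")
plan2 = cache.get(key, lambda: "resolved-relation-for-t")
print(plan1 is plan2, cache.loads)  # second lookup is served from the cache
```

The design point is that relation resolution (schema inference, file listing) is expensive, so it should happen once per table until the entry is explicitly invalidated.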
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71564/ Test PASSed.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Merged build finished. Test PASSed.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13599 **[Test build #71564 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71564/testReport)** for PR 13599 at commit [`ea9e0c4`](https://github.com/apache/spark/commit/ea9e0c4e80ea568c066156e76cc8abefb911fb59). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16624: [WIP] Add two test cases for `SET -v`.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16624 **[Test build #71571 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71571/testReport)** for PR 16624 at commit [`87de8da`](https://github.com/apache/spark/commit/87de8da846a7b4d368c1475ba3fc3d83cc865220).
[GitHub] spark issue #12064: [SPARK-14272][ML] Evaluate GaussianMixtureModel with Log...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12064 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71566/ Test FAILed.
[GitHub] spark issue #12064: [SPARK-14272][ML] Evaluate GaussianMixtureModel with Log...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12064 Merged build finished. Test FAILed.
[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16621#discussion_r96571473 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -650,14 +659,21 @@ class SessionCatalog( * Refresh the cache entry for a metastore table, if any. */ def refreshTable(name: TableIdentifier): Unit = synchronized { +val dbName = formatDatabaseName(name.database.getOrElse(currentDb)) +val tableName = formatTableName(name.table) + // Go through temporary tables and invalidate them. -// If the database is defined, this is definitely not a temp table. +// If the database is defined, this may be a global temporary view. // If the database is not defined, there is a good chance this is a temp table. if (name.database.isEmpty) { - tempTables.get(formatTableName(name.table)).foreach(_.refresh()) -} else if (formatDatabaseName(name.database.get) == globalTempViewManager.database) { - globalTempViewManager.get(formatTableName(name.table)).foreach(_.refresh()) + tempTables.get(tableName).foreach(_.refresh()) +} else if (dbName == globalTempViewManager.database) { + globalTempViewManager.get(tableName).foreach(_.refresh()) } + +// Also invalidate the table relation cache. --- End diff -- After an offline discussion, I am fine to remove it. Thanks!
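The branching in the `refreshTable` diff above resolves a name against three namespaces in order: local temp views when no database is given, the global temp view database when the database matches it, and the catalog otherwise. A rough Python model of that dispatch (hypothetical `refresh_target` helper, simplified from the real SessionCatalog):

```python
GLOBAL_TEMP_DB = "global_temp"  # Spark's default global temp view database name

def refresh_target(db, table, temp_views, global_temp_views):
    """Return which namespace a refreshTable(db, table) call would refresh,
    mirroring (in simplified form) the branching in the diff above."""
    if db is None:
        # No database given: likely a local temporary view.
        return "temp" if table in temp_views else "catalog"
    if db == GLOBAL_TEMP_DB:
        # The database may name the global temp view manager.
        return "global_temp" if table in global_temp_views else "catalog"
    # A concrete database: this is a metastore table.
    return "catalog"

print(refresh_target(None, "v", {"v"}, set()))           # temp
print(refresh_target("global_temp", "g", set(), {"g"}))  # global_temp
print(refresh_target("db1", "t", {"t"}, set()))          # catalog
```

Note the real method also invalidates the table relation cache regardless of which branch is taken; this sketch only shows the name-resolution order.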
[GitHub] spark issue #12064: [SPARK-14272][ML] Evaluate GaussianMixtureModel with Log...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12064 **[Test build #71566 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71566/testReport)** for PR 12064 at commit [`fd85c5d`](https://github.com/apache/spark/commit/fd85c5d221a0cc52c8b5f4662182d487e34db63b). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16547: [SPARK-19168][Structured Streaming] StateStore should be...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16547 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71565/ Test PASSed.
[GitHub] spark issue #16547: [SPARK-19168][Structured Streaming] StateStore should be...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16547 Merged build finished. Test PASSed.
[GitHub] spark issue #16585: [SPARK-19223][SQL][PySpark] Fix InputFileBlockHolder for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16585 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71561/ Test PASSed.
[GitHub] spark issue #16547: [SPARK-19168][Structured Streaming] StateStore should be...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16547 **[Test build #71565 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71565/testReport)** for PR 16547 at commit [`0f9e54d`](https://github.com/apache/spark/commit/0f9e54d9efe4c9d7f446cb2f4dc46741cef776f7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16585: [SPARK-19223][SQL][PySpark] Fix InputFileBlockHolder for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16585 Merged build finished. Test PASSed.
[GitHub] spark issue #16585: [SPARK-19223][SQL][PySpark] Fix InputFileBlockHolder for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16585 **[Test build #71561 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71561/testReport)** for PR 16585 at commit [`2ce65cb`](https://github.com/apache/spark/commit/2ce65cb8336b32d8309f189e2c63a576c5a60ee5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16609: [SPARK-8480] [CORE] [PYSPARK] [SPARKR] Add setName for D...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16609 @emlyn I'm not sure this should be associated with RDD, since we are working with DataFrame here? As for the existing `name` methods in RDD.R - they are not public APIs
[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16621#discussion_r96570432 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala --- @@ -215,37 +215,44 @@ case class DataSourceAnalysis(conf: CatalystConf) extends Rule[LogicalPlan] { /** - * Replaces [[SimpleCatalogRelation]] with data source table if its table property contains data - * source information. + * Replaces [[SimpleCatalogRelation]] with data source table if its table provider is not hive. */ class FindDataSourceTable(sparkSession: SparkSession) extends Rule[LogicalPlan] { - private def readDataSourceTable( - sparkSession: SparkSession, - simpleCatalogRelation: SimpleCatalogRelation): LogicalPlan = { -val table = simpleCatalogRelation.catalogTable -val pathOption = table.storage.locationUri.map("path" -> _) -val dataSource = - DataSource( -sparkSession, -userSpecifiedSchema = Some(table.schema), -partitionColumns = table.partitionColumnNames, -bucketSpec = table.bucketSpec, -className = table.provider.get, -options = table.storage.properties ++ pathOption) - -LogicalRelation( - dataSource.resolveRelation(), - expectedOutputAttributes = Some(simpleCatalogRelation.output), - catalogTable = Some(table)) + private def readDataSourceTable(relation: SimpleCatalogRelation): LogicalPlan = { +val table = relation.catalogTable +val cache = sparkSession.sessionState.catalog.tableRelationCache +val withHiveSupport = + sparkSession.sparkContext.conf.get(StaticSQLConf.CATALOG_IMPLEMENTATION) == "hive" + +cache.get(table.qualifiedIdentifier, new Callable[LogicalPlan]() { + override def call(): LogicalPlan = { +val pathOption = table.storage.locationUri.map("path" -> _) +val dataSource = + DataSource( +sparkSession, +userSpecifiedSchema = Some(table.schema), --- End diff -- ``` // In older version(prior to 2.1) of Spark, the table schema can be empty and should be // inferred at runtime. We should still support it. 
``` Is it still valid?
[GitHub] spark pull request #16606: [SPARK-19246][SQL]CataLogTable's partitionSchema ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16606#discussion_r96569991 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/PartitionedWriteSuite.scala --- @@ -92,6 +92,30 @@ class PartitionedWriteSuite extends QueryTest with SharedSQLContext { } } + + test("saveAsTable with inconsistent columns order" + --- End diff -- does this test improve the test coverage?
[GitHub] spark pull request #16606: [SPARK-19246][SQL]CataLogTable's partitionSchema ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16606#discussion_r96569809 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala --- @@ -1374,4 +1377,47 @@ class HiveDDLSuite assert(e2.message.contains("Hive data source can only be used with tables")) } } + + test("table partition schema should be ordered") { +withTable("t", "t1") { + val path = Utils.createTempDir(namePrefix = "t") + val path1 = Utils.createTempDir(namePrefix = "t1") + try { +spark.sql(s""" + |create table t (id long, P1 int, P2 int) + |using parquet + |options (path "$path") + |partitioned by (P1, P2)""".stripMargin) --- End diff -- this test can pass without your changes right? I think we can just keep the below one.
[GitHub] spark issue #16623: [SPARK-19066][SPARKR][Backport-2.1]:LDA doesn't set opti...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16623 merged to 2.1. thanks!
[GitHub] spark pull request #16606: [SPARK-19246][SQL]CataLogTable's partitionSchema ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16606#discussion_r96569387 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala --- @@ -1374,4 +1377,47 @@ class HiveDDLSuite assert(e2.message.contains("Hive data source can only be used with tables")) } } + + test("table partition schema should be ordered") { +withTable("t", "t1") { + val path = Utils.createTempDir(namePrefix = "t") + val path1 = Utils.createTempDir(namePrefix = "t1") + try { +spark.sql(s""" --- End diff -- nit: code style, please follow existing code: https://github.com/apache/spark/pull/16606/files#diff-b7094baa12601424a5d19cb930e3402fR1255
[GitHub] spark pull request #16606: [SPARK-19246][SQL]CataLogTable's partitionSchema ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16606#discussion_r96569222 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala --- @@ -1374,4 +1377,47 @@ class HiveDDLSuite assert(e2.message.contains("Hive data source can only be used with tables")) } } + + test("table partition schema should be ordered") { +withTable("t", "t1") { + val path = Utils.createTempDir(namePrefix = "t") --- End diff -- use `withTempDir`
[GitHub] spark pull request #16606: [SPARK-19246][SQL]CataLogTable's partitionSchema ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16606#discussion_r96569205 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala --- @@ -1374,4 +1377,47 @@ class HiveDDLSuite assert(e2.message.contains("Hive data source can only be used with tables")) } } + + test("table partition schema should be ordered") { --- End diff -- table partition schema should respect the order of partition columns
[GitHub] spark issue #16623: [SPARK-19066][SPARKR][Backport-2.1]:LDA doesn't set opti...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16623 LGTM
[GitHub] spark pull request #16606: [SPARK-19246][SQL]CataLogTable's partitionSchema ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16606#discussion_r96569171 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala --- @@ -138,6 +138,7 @@ case class CreateDataSourceTableAsSelectCommand( val tableIdentWithDB = table.identifier.copy(database = Some(db)) val tableName = tableIdentWithDB.unquotedString +var tableWithSchema = table.copy(schema = query.output.toStructType) --- End diff -- shall we set the schema in `AnalyzeCreateTable`?
[GitHub] spark pull request #16589: [SPARK-19231][SPARKR] add error handling for down...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16589#discussion_r96569089 --- Diff: R/pkg/R/install.R --- @@ -201,14 +221,20 @@ directDownloadTar <- function(mirrorUrl, version, hadoopVersion, packageName, pa msg <- sprintf(fmt, version, ifelse(hadoopVersion == "without", "Free build", hadoopVersion), packageRemotePath) message(msg) - downloadUrl(packageRemotePath, packageLocalPath, paste0("Fetch failed from ", mirrorUrl)) + downloadUrl(packageRemotePath, packageLocalPath) --- End diff -- yea I agree. I guess I'm trying to bubble up error messages to the top level, but deciding which exception to throw is making this non-trivial (never thought I'd say that!)
[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16503 @vanzin Sorry for the stupid mistake I made. I've changed it. Please take another look.
[GitHub] spark pull request #16589: [SPARK-19231][SPARKR] add error handling for down...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/16589#discussion_r96567761 --- Diff: R/pkg/R/install.R --- @@ -201,14 +221,20 @@ directDownloadTar <- function(mirrorUrl, version, hadoopVersion, packageName, pa msg <- sprintf(fmt, version, ifelse(hadoopVersion == "without", "Free build", hadoopVersion), packageRemotePath) message(msg) - downloadUrl(packageRemotePath, packageLocalPath, paste0("Fetch failed from ", mirrorUrl)) + downloadUrl(packageRemotePath, packageLocalPath) --- End diff -- I didn't relate this to the update in L176 - I think this is fine. In general I think this file has gotten a little unwieldy with error messages coming from different functions. I wonder if there is a better way to refactor things to set up some expectations on where errors are thrown etc.
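One way to set such expectations is the pattern this thread is circling: low-level helpers signal failure by raising, with no user-facing messaging of their own, and only the top-level entry point attaches context before the error reaches the user. A minimal sketch in Python (names like `DownloadError` and `install_spark` are hypothetical stand-ins for the R code's `downloadUrl`/`install.spark`, and the failure is simulated):

```python
class DownloadError(Exception):
    """Raised when fetching the Spark tarball fails (hypothetical name)."""

def download_url(url, dest):
    # Low-level helper: its only contract is to raise on failure.
    # Here we simulate an unreachable mirror.
    raise DownloadError(f"could not fetch {url}")

def install_spark(mirror_url, version):
    # Top-level entry point: the single place that wraps errors with
    # user-facing context (which mirror, which version).
    package_path = f"{mirror_url}/spark-{version}.tgz"
    try:
        download_url(package_path, "/tmp/spark.tgz")
    except DownloadError as e:
        raise DownloadError(f"Fetch failed from {mirror_url}: {e}") from e

try:
    install_spark("https://example.org/mirror", "2.1.0")
except DownloadError as e:
    print(e)
```

With this split, every error surfaces through one choke point, so the "which exception to throw" question is answered once instead of in each helper.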
[GitHub] spark pull request #16589: [SPARK-19231][SPARKR] add error handling for down...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/16589#discussion_r96289817 --- Diff: R/pkg/R/install.R --- @@ -54,7 +54,7 @@ #' } #' @param overwrite If \code{TRUE}, download and overwrite the existing tar file in localDir #' and force re-install Spark (in case the local directory or file is corrupted) -#' @return \code{install.spark} returns the local directory where Spark is found or installed +#' @return the (invisible) local directory where Spark is found or installed --- End diff -- Got it. Thanks
[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16503 Merged build finished. Test PASSed.
[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16503 Merged build finished. Test PASSed.
[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16503 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71560/ Test PASSed.
[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16503 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71558/ Test PASSed.
[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16503 **[Test build #71558 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71558/testReport)** for PR 16503 at commit [`52af8c5`](https://github.com/apache/spark/commit/52af8c5359a48e31f665f282c0a50aaacb19ae4d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16503 **[Test build #71560 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71560/testReport)** for PR 16503 at commit [`69b412a`](https://github.com/apache/spark/commit/69b412ac9fd6d6ebd27049cdbaf7a2c5ef75455b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14204: [SPARK-16520] [WEBUI] Link executors to corresponding wo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14204 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71557/ Test PASSed.
[GitHub] spark pull request #14204: [SPARK-16520] [WEBUI] Link executors to correspon...
Github user nblintao commented on a diff in the pull request: https://github.com/apache/spark/pull/14204#discussion_r96568190 --- Diff: core/src/main/resources/org/apache/spark/ui/static/executorspage.js --- @@ -408,12 +420,17 @@ $(document).ready(function () { data: 'id', render: function (data, type) { return type === 'display' ? ("Thread Dump" ) : data; } -} +}, +{data: 'worker', render: formatWorkersCells} ], "columnDefs": [ { "targets": [ 16 ], "visible": getThreadDumpEnabled() +}, +{ +"targets": [ 17 ], +"visible": workersExist(response) --- End diff -- Fixed. Thanks!
[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15125 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71556/ Test PASSed.
[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15125 Merged build finished. Test PASSed.
[GitHub] spark issue #14204: [SPARK-16520] [WEBUI] Link executors to corresponding wo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14204 Merged build finished. Test PASSed.
[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15125 **[Test build #71556 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71556/testReport)** for PR 15125 at commit [`e786838`](https://github.com/apache/spark/commit/e786838af3912953d61787210213c269b4a5cdba). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14204: [SPARK-16520] [WEBUI] Link executors to corresponding wo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14204 **[Test build #71557 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71557/testReport)** for PR 14204 at commit [`d23643c`](https://github.com/apache/spark/commit/d23643ce79efe98e33e42c23548478d672f5de81). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.