[GitHub] spark issue #18430: [SPARK-21223] Change fileToAppInfo in FsHistoryProvider ...
Github user zenglinxi0615 commented on the issue: https://github.com/apache/spark/pull/18430

test please

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18430: [SPARK-21223]:Thread-safety issue in FsHistoryProvider
Github user zenglinxi0615 commented on the issue: https://github.com/apache/spark/pull/18430

Sorry, that was a typo; I meant the related JIRA: SPARK-21078.
[GitHub] spark issue #18430: [SPARK-21223]:Thread-safety issue in FsHistoryProvider
Github user zenglinxi0615 commented on the issue: https://github.com/apache/spark/pull/18430

@srowen Thanks again for your suggestions! Should I also address the problem of SPARK-13988 in this PR?
[GitHub] spark issue #18430: [SPARK-21223]:Thread-safety issue in FsHistoryProvider
Github user zenglinxi0615 commented on the issue: https://github.com/apache/spark/pull/18430

@jerryshao Actually, this threading issue causes an infinite loop when we restart the history server and replay the event logs of Spark apps. You can see the jstack log in the attachments of SPARK-21223.
[GitHub] spark issue #18430: [SPARK-21223]:Thread-safety issue in FsHistoryProvider
Github user zenglinxi0615 commented on the issue: https://github.com/apache/spark/pull/18430

@jerryshao Do you mean that after fileToAppInfo.get(entry.getPath()) returns a value, other threads may add or change the entry for entry.getPath(), causing an inconsistency?
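The inconsistency being discussed can be sketched without Spark at all: even when every individual map operation is thread-safe, a get followed by a later decision is a check-then-act sequence that another thread can interleave with. The names below are illustrative stand-ins for Spark's fields, not its actual code:

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.collection.JavaConverters._

object CheckThenActSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical stand-in for fileToAppInfo: log path -> file size.
    val fileToAppInfo = new ConcurrentHashMap[String, Long]()
    val path = "hdfs://logs/app-1"
    fileToAppInfo.put(path, 100L)

    // Thread A reads the previous size...
    val prevFileSize = fileToAppInfo.asScala.get(path).getOrElse(0L)

    // ...meanwhile thread B updates the same entry.
    fileToAppInfo.put(path, 200L)

    // Thread A's later decision is now based on a stale value: the map says
    // 200, but prevFileSize is still 100. Each operation was thread-safe;
    // the get-then-use sequence as a whole was not.
    assert(prevFileSize == 100L)
    assert(fileToAppInfo.get(path) == 200L)
    println(s"prev=$prevFileSize, current=${fileToAppInfo.get(path)}")
  }
}
```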
[GitHub] spark pull request #18430: [SPARK-21223]:Thread-safety issue in FsHistoryPro...
Github user zenglinxi0615 commented on a diff in the pull request: https://github.com/apache/spark/pull/18430#discussion_r124271386

--- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
@@ -321,7 +322,7 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
       // scan for modified applications, replay and merge them
       val logInfos: Seq[FileStatus] = statusList
         .filter { entry =>
-          val prevFileSize = fileToAppInfo.get(entry.getPath()).map{_.fileSize}.getOrElse(0L)
+          val prevFileSize = fileToAppInfo.asScala.get(entry.getPath()).map{_.fileSize}.getOrElse(0L)
--- End diff --

@srowen Thanks for your suggestions; I have made some modifications. Could you take a look when you have time?
[GitHub] spark pull request #18430: [SPARK-21223]:Thread-safety issue in FsHistoryPro...
GitHub user zenglinxi0615 opened a pull request: https://github.com/apache/spark/pull/18430

[SPARK-21223] Thread-safety issue in FsHistoryProvider

## What changes were proposed in this pull request?

Fix the thread-safety issue in FsHistoryProvider. Currently, the Spark HistoryServer uses a HashMap named fileToAppInfo in class FsHistoryProvider to store the mapping from event log path to attempt info. When a thread pool is used to replay the log files in the list and merge the list of old applications with new ones, multiple threads may update fileToAppInfo at the same time, which can cause thread-safety issues.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zenglinxi0615/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18430.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #18430

commit d2b3c960012403fcc9be6fbd33f74f395d879f9d
Author: 曾林西
Date: 2017-06-27T07:29:44Z

    [SPARK-21223] Thread-safety issue in FsHistoryProvider
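The failure mode described in the PR can be reproduced without Spark: concurrent writers on an unsynchronized map can lose entries, and on older java.util.HashMap implementations a racing resize could even produce the infinite loop seen in the jstack traces. Below is a minimal sketch of the fix direction the PR takes (a ConcurrentHashMap shared by a replay-style thread pool); the names are illustrative, not Spark's actual code:

```scala
import java.util.concurrent.{ConcurrentHashMap, Executors, TimeUnit}
import scala.collection.JavaConverters._

object ConcurrentReplaySketch {
  // Hypothetical stand-in for FsHistoryProvider.fileToAppInfo.
  val fileToAppInfo = new ConcurrentHashMap[String, Long]()

  def main(args: Array[String]): Unit = {
    val pool = Executors.newFixedThreadPool(8)
    // Simulate the replay pool: many threads registering log files at once.
    (1 to 1000).foreach { i =>
      pool.submit(new Runnable {
        def run(): Unit = fileToAppInfo.put(s"hdfs://logs/app-$i", i.toLong)
      })
    }
    pool.shutdown()
    pool.awaitTermination(30, TimeUnit.SECONDS)

    // With an unsynchronized HashMap this count could come out wrong (or a
    // racing resize could corrupt the table); ConcurrentHashMap makes each
    // put atomic, so no entry is lost.
    assert(fileToAppInfo.size == 1000)
    println(fileToAppInfo.asScala.size)
  }
}
```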
[GitHub] spark pull request #14085: [SPARK-16408][SQL] SparkSQL Added file get Except...
Github user zenglinxi0615 commented on a diff in the pull request: https://github.com/apache/spark/pull/14085#discussion_r122620464

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/commands.scala ---
@@ -113,8 +113,9 @@ case class AddFile(path: String) extends RunnableCommand {
   override def run(sqlContext: SQLContext): Seq[Row] = {
     val hiveContext = sqlContext.asInstanceOf[HiveContext]
+    val recursive = sqlContext.sparkContext.getConf.getBoolean("spark.input.dir.recursive", false)
--- End diff --

I was wondering if we could simply call sparkSession.sparkContext.addFile(path, true) in the AddFileCommand function, since recursively adding a directory is a common need in ETL.
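Conceptually, passing recursive = true to addFile means walking the directory and registering every regular file beneath it. A stand-alone sketch of that walk using java.nio (an illustration of the behavior under discussion, not Spark's implementation):

```scala
import java.nio.file.{Files, Path}
import scala.collection.JavaConverters._

object RecursiveAddSketch {
  // Collect every regular file under `root`, the way a recursive add would.
  def listFilesRecursively(root: Path): Seq[Path] = {
    val stream = Files.walk(root)
    try stream.iterator().asScala.filter(p => Files.isRegularFile(p)).toList
    finally stream.close()
  }

  def main(args: Array[String]): Unit = {
    // Build a tiny directory tree: dir/a.txt and dir/sub/b.txt.
    val dir = Files.createTempDirectory("add-file-demo")
    val sub = Files.createDirectory(dir.resolve("sub"))
    Files.createFile(dir.resolve("a.txt"))
    Files.createFile(sub.resolve("b.txt"))

    val files = listFilesRecursively(dir)
    // Both a.txt and sub/b.txt are picked up by the recursive walk.
    assert(files.size == 2)
    println(files.map(_.getFileName))
  }
}
```

A non-recursive add of the same directory would fail outright, which is exactly the "is a directory and recursive is not turned on" exception in the PR title.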
[GitHub] spark pull request #17550: [SPARK-20240][SQL] SparkSQL support limitations o...
Github user zenglinxi0615 closed the pull request at: https://github.com/apache/spark/pull/17550
[GitHub] spark issue #17550: [SPARK-20240][SQL] SparkSQL support limitations of max d...
Github user zenglinxi0615 commented on the issue: https://github.com/apache/spark/pull/17550

OK, going to close this PR and open a new one against the master branch.
[GitHub] spark pull request #17550: [SPARK-20240][SQL] SparkSQL support limitations o...
GitHub user zenglinxi0615 opened a pull request: https://github.com/apache/spark/pull/17550

[SPARK-20240][SQL] SparkSQL support limitations of max dynamic partitions when inserting into a Hive table

## What changes were proposed in this pull request?

Support limiting the number of dynamic partitions when inserting into a Hive table, via hive.exec.max.dynamic.partitions.pernode.

## How was this patch tested?

Manual test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zenglinxi0615/spark SPARK-20240

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17550.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17550

commit adc91b958ec7aeeab0eaf1663de15ebbcb83da0d
Author: 曾林西
Date: 2017-04-06T11:00:47Z

    [SPARK-20240][SQL] SparkSQL support limitations of max dynamic partitions when inserting into a Hive table
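The limit in question mirrors the semantics of Hive's hive.exec.max.dynamic.partitions.pernode: count the distinct dynamic-partition values a writer is about to create and fail fast once the configured maximum is exceeded. A self-contained sketch of that check (names are illustrative, not the PR's actual code):

```scala
object DynamicPartitionLimitSketch {
  // Mimics hive.exec.max.dynamic.partitions.pernode semantics: refuse to
  // create more distinct dynamic partitions than the configured maximum.
  def checkDynamicPartitionLimit(partitionValues: Seq[String], maxPerNode: Int): Unit = {
    val distinct = partitionValues.distinct.size
    if (distinct > maxPerNode) {
      throw new IllegalStateException(
        s"Created $distinct dynamic partitions, exceeding the limit of $maxPerNode")
    }
  }

  def main(args: Array[String]): Unit = {
    // e.g. partition column values produced by an INSERT ... PARTITION (dt) query.
    val dtValues = Seq("2017-04-01", "2017-04-02", "2017-04-03")

    checkDynamicPartitionLimit(dtValues, maxPerNode = 100) // within the limit: no-op

    // Over the limit: the insert is rejected before any partition is created.
    val rejected =
      try { checkDynamicPartitionLimit(dtValues, maxPerNode = 2); false }
      catch { case _: IllegalStateException => true }
    assert(rejected)
    println("limit enforced")
  }
}
```

Failing fast like this protects the metastore and NameNode from a runaway query that would otherwise create an unbounded number of partitions.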
[GitHub] spark issue #14686: [SPARK-16253][SQL] make spark sql compatible with hive s...
Github user zenglinxi0615 commented on the issue: https://github.com/apache/spark/pull/14686

Sorry for the long silence. Yes, you are right: when you can change the SQL from using '/temp/test.py' to using 'python /temp/test.py', there is no need to change the Spark source code. However, this patch targets the case where many existing Hive SQL scripts already use '/temp/test.py'; modifying all of them would cost too much time, so we want Spark SQL to be compatible with Hive SQL that invokes a Python script transform directly, like 'xxx.py'.
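For context, the two spellings at issue look roughly like this in HiveQL (column names and the src table here are placeholders; only the script paths come from the discussion):

```sql
-- The existing Hive style this patch wants Spark SQL to accept:
-- the script is invoked directly.
SELECT TRANSFORM (col1, col2)
USING '/temp/test.py'
AS (out1, out2)
FROM src;

-- The workaround that already works without the patch:
-- the interpreter is named explicitly.
SELECT TRANSFORM (col1, col2)
USING 'python /temp/test.py'
AS (out1, out2)
FROM src;
```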
[GitHub] spark issue #14686: [SPARK-16253][SQL] make spark sql compatible with hive s...
Github user zenglinxi0615 commented on the issue: https://github.com/apache/spark/pull/14686

Have you tried it on spark 1.6.2?
[GitHub] spark pull request #14686: [SPARK-16253][SQL] make spark sql compatible with...
GitHub user zenglinxi0615 opened a pull request: https://github.com/apache/spark/pull/14686

[SPARK-16253][SQL] make spark sql compatible with hive sql that uses a python script transform

## What changes were proposed in this pull request?

Make Spark SQL compatible with Hive SQL that uses a Python script transform invoked directly, like 'xxx.py'.

## How was this patch tested?

Manual tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zenglinxi0615/spark v1.6.2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14686.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14686

commit 29df40dab9963e0dbce4119bdd872a86ff670af9
Author: 曾林西
Date: 2016-06-28T12:37:05Z

    [SPARK-16253][SQL] make spark sql compatible with hive sql that using python script transform like using 'xxx.py'
[GitHub] spark pull request #14085: [SPARK-16408][SQL] SparkSQL Added file get Except...
Github user zenglinxi0615 commented on a diff in the pull request: https://github.com/apache/spark/pull/14085#discussion_r69865365

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/commands.scala ---
@@ -113,8 +113,9 @@ case class AddFile(path: String) extends RunnableCommand {
   override def run(sqlContext: SQLContext): Seq[Row] = {
     val hiveContext = sqlContext.asInstanceOf[HiveContext]
+    val recursive = sqlContext.sparkContext.getConf.getBoolean("spark.input.dir.recursive", false)
--- End diff --

And by the way, I have tried:

    val recursive = hiveContext.getConf("spark.input.dir.recursive", "false")

but this only works in spark-sql after executing set spark.input.dir.recursive=true before add file; we can't set the value with --conf spark.input.dir.recursive=true. This makes it difficult for us to move some Hive SQL directly to Spark SQL.
[GitHub] spark pull request #14085: [SPARK-16408][SQL] SparkSQL Added file get Except...
Github user zenglinxi0615 commented on a diff in the pull request: https://github.com/apache/spark/pull/14085#discussion_r69864435

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/commands.scala ---
@@ -113,8 +113,9 @@ case class AddFile(path: String) extends RunnableCommand {
   override def run(sqlContext: SQLContext): Seq[Row] = {
     val hiveContext = sqlContext.asInstanceOf[HiveContext]
+    val recursive = sqlContext.sparkContext.getConf.getBoolean("spark.input.dir.recursive", false)
--- End diff --

I'm pretty sure it's supported by the SQL dialect in Spark SQL. As for "the name of this property is too generic, and I don't think it is something that is set globally": do you think we should use another name? And should the default value be true?
[GitHub] spark pull request #14085: [SPARK-16408][SQL] SparkSQL Added file get Except...
GitHub user zenglinxi0615 opened a pull request: https://github.com/apache/spark/pull/14085

[SPARK-16408][SQL] SparkSQL Added file get Exception: is a directory

## What changes were proposed in this pull request?

This PR adds a parameter (spark.input.dir.recursive) to control the value of recursive in SparkContext#addFile, so we can support the "add file hdfs://dir/path" command in Spark SQL.

## How was this patch tested?

Manual tests: set the conf --conf spark.input.dir.recursive=true, and run spark-sql -e "add file hdfs://dir/path".

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zenglinxi0615/spark SPARK-16408

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14085.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14085

commit d2e05c155e4e52dfda177a21615de7743a2c5917
Author: 曾林西
Date: 2016-07-07T06:20:19Z

    [SPARK-16408][SQL] SparkSQL Added file get Exception: is a directory and recursive is not turned on