[GitHub] [incubator-livy] mgaido91 commented on issue #238: [LIVY-689] Deliver stage process message to the end user using thriftserver

2019-12-02 Thread GitBox
mgaido91 commented on issue #238: [LIVY-689] Deliver stage process message to 
the end user using thriftserver
URL: https://github.com/apache/incubator-livy/pull/238#issuecomment-561041878
 
 
   @huianyi 
   
   > Like I previous said, I can support this in a new PR.
   
   That sound great, I think you can go ahead, I am happy to review it.
   
   > ProcessBar is great in real-time interaction environment, but it can not 
work well if we do scheduling jobs, which we need print the job's status in 
seperate lines.
   
   I don't see the point here. Moreover I think for monitoring such job other 
method would be more appropriate, like querying the REST API for the status of 
the job.
   
   My position is to go ahead with the progress bar, but I am not very 
confident about this feature. Otherwise also other people think that this is 
useful, I'd go only with the progress bar feature.
   
   Thanks.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-livy] huianyi edited a comment on issue #238: [LIVY-689] Deliver stage process message to the end user using thriftserver

2019-12-02 Thread GitBox
huianyi edited a comment on issue #238: [LIVY-689] Deliver stage process 
message to the end user using thriftserver
URL: https://github.com/apache/incubator-livy/pull/238#issuecomment-560985424
 
 
   @mgaido91 The previous link you pointed actually is another feature we can 
implement(process bar), you can check it here, 
[HiveStatement.java#L381](https://github.com/apache/hive/blob/209096a71b25a3d75dca1930f825c8c10b99d2b9/jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java#L381)
 , 
[BeelineInPlaceUpdateStream.java#L39](https://github.com/apache/hive/blob/209096a71b25a3d75dca1930f825c8c10b99d2b9/beeline/src/java/org/apache/hive/beeline/logs/BeelineInPlaceUpdateStream.java#L39)
   
   Like I previous said, I can support this in a new PR.
   
   > > > Beeline from Hive 2.2 has in-place progress bar 
hive.server2.in.place.progress
   > > > 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.server2.in.place.progress
   > > > (I think it works only with tez - would be nice if this would work 
with Livy too )
   > > 
   > > 
   > > Would be better to support this if it is not a big effort.
   > 
   > After reading hive related code, this needs write some new classes to 
implement this feature, which might cause this PR large, if you don not mind, 
I'd love to open a new PR to support this.
   
   ProcessBar is great in real-time interaction environment, but it can not 
work well if we do scheduling jobs, which we need print the job's status in 
seperate lines.
   
   > In this way, we need to store nothing on Livy server memory.
   
   By default setting, we can keep each session keep not more than 100 rows, 
actually under normal circumstances, client will get these message immediately, 
this might not put much memory pressure on Livy server side. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-livy] jerryshao commented on a change in pull request #242: [LIVY-336][SERVER] Livy should not spawn one thread per job to track the job on Yarn

2019-12-02 Thread GitBox
jerryshao commented on a change in pull request #242: [LIVY-336][SERVER] Livy 
should not spawn one thread per job to track the job on Yarn
URL: https://github.com/apache/incubator-livy/pull/242#discussion_r352975680
 
 

 ##
 File path: server/src/main/scala/org/apache/livy/utils/SparkYarnApp.scala
 ##
 @@ -56,20 +69,55 @@ object SparkYarnApp extends Logging {
 c
   }
 
-  private def getYarnTagToAppIdTimeout(livyConf: LivyConf): FiniteDuration =
-livyConf.getTimeAsMs(LivyConf.YARN_APP_LOOKUP_TIMEOUT) milliseconds
-
-  private def getYarnPollInterval(livyConf: LivyConf): FiniteDuration =
-livyConf.getTimeAsMs(LivyConf.YARN_POLL_INTERVAL) milliseconds
-
   private[utils] val appType = Set("SPARK").asJava
 
   private[utils] val leakedAppTags = new 
java.util.concurrent.ConcurrentHashMap[String, Long]()
 
+  private val monitorAppThreadMap = new 
java.util.concurrent.ConcurrentHashMap[Thread, Long]()
+
+  private val appQueue = new ConcurrentLinkedQueue[SparkYarnApp]()
+
   private var sessionLeakageCheckTimeout: Long = _
 
   private var sessionLeakageCheckInterval: Long = _
 
+  private var yarnAppMonitorThreadInterval: Long = _
+
+  private var yarnAppMonitorThreadBlockCheckInterval: Long = _
+
+  private var yarnAppMonitorThreadBlockTimeout: Long = _
+
+  private var yarnAppMonitorMaxFailedTimes: Long = _
+
+  private var yarnTagToAppIdMaxFailedTimes: Long = _
+
+  private val checkMonitorAppTimeoutThread = new Thread() {
+override def run(): Unit = {
+  while (true) {
+try {
+  val iter = monitorAppThreadMap.entrySet().iterator()
+  val now = System.currentTimeMillis()
+
+  while (iter.hasNext) {
+val entry = iter.next()
+val thread = entry.getKey
+val updatedTime = entry.getValue
+
+if (now - updatedTime - yarnAppMonitorThreadInterval >
+  yarnAppMonitorThreadBlockTimeout) {
+  thread.interrupt()
+}
+  }
+
+  Thread.sleep(yarnAppMonitorThreadBlockCheckInterval)
+} catch {
+  case e: InterruptedException =>
+error(s"checkMonitorAppTimeoutThread Exception whiling monitor", e)
 
 Review comment:
   How do you exit the thread if you catch the `InterruptedException` in the 
infinite loop?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-livy] huianyi edited a comment on issue #238: [LIVY-689] Deliver stage process message to the end user using thriftserver

2019-12-02 Thread GitBox
huianyi edited a comment on issue #238: [LIVY-689] Deliver stage process 
message to the end user using thriftserver
URL: https://github.com/apache/incubator-livy/pull/238#issuecomment-560985424
 
 
   @mgaido91 The previous link you pointed actually is another feature we can 
implement(process bar), you can check it here, 
[HiveStatement.java#L381](https://github.com/apache/hive/blob/209096a71b25a3d75dca1930f825c8c10b99d2b9/jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java#L381)
 , 
[BeelineInPlaceUpdateStream.java#L39](https://github.com/apache/hive/blob/209096a71b25a3d75dca1930f825c8c10b99d2b9/beeline/src/java/org/apache/hive/beeline/logs/BeelineInPlaceUpdateStream.java#L39)
   
   Like I previous said, I can support this in a new PR.
   
   > > > Beeline from Hive 2.2 has in-place progress bar 
hive.server2.in.place.progress
   > > > 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.server2.in.place.progress
   > > > (I think it works only with tez - would be nice if this would work 
with Livy too )
   > > 
   > > 
   > > Would be better to support this if it is not a big effort.
   > 
   > After reading hive related code, this needs write some new classes to 
implement this feature, which might cause this PR large, if you don not mind, 
I'd love to open a new PR to support this.
   
   ProcessBar is great in real-time interaction environment, but it can not 
work well if we do scheduling jobs, which we need print the job's status by 
individual lines.
   
   > In this way, we need to store nothing on Livy server memory.
   
   By default setting, we can keep each session keep not more than 100 rows, 
actually under normal circumstances, client will get these message immediately, 
this might not put much memory pressure on Livy server side. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-livy] jerryshao commented on a change in pull request #242: [LIVY-336][SERVER] Livy should not spawn one thread per job to track the job on Yarn

2019-12-02 Thread GitBox
jerryshao commented on a change in pull request #242: [LIVY-336][SERVER] Livy 
should not spawn one thread per job to track the job on Yarn
URL: https://github.com/apache/incubator-livy/pull/242#discussion_r352973535
 
 

 ##
 File path: server/src/main/scala/org/apache/livy/LivyConf.scala
 ##
 @@ -216,11 +213,27 @@ object LivyConf {
   // RSC related jars separated with comma.
   val RSC_JARS = Entry("livy.rsc.jars", null)
 
-  // How long to check livy session leakage
+  // How long to check livy session leakage.
   val YARN_APP_LEAKAGE_CHECK_TIMEOUT = 
Entry("livy.server.yarn.app-leakage.check-timeout", "600s")
-  // how often to check livy session leakage
+  // How often to check livy session leakage.
   val YARN_APP_LEAKAGE_CHECK_INTERVAL = 
Entry("livy.server.yarn.app-leakage.check-interval", "60s")
 
+  // The size of thread pool to monitor all yarn apps.
+  val YARN_APP_MONITOR_THREAD_POOL_SIZE =
+Entry("livy.server.yarn.app-monitor.thread-pool.size", 4)
+  // How often thread monitor the YARN app.
+  val YARN_APP_MONITOR_THREAD_INTERVAL = 
Entry("livy.server.yarn.app-monitor.thread-interval", "5s")
+  // How often to check monitor thread block
+  val YARN_APP_MONITOR_THREAD_BLOCK_CHECK_INTERVAL =
+Entry("livy.server.yarn.app-monitor.thread-block.check-interval", "10s")
+  // If some thread cost more than this config to monitor one app,
+  // stop the monitor of the app which considered blocked.
+  val YARN_APP_MONITOR_THREAD_BLOCK_TIMEOUT =
+Entry("livy.server.yarn.app-monitor.thread-block-timeout", "60s")
+  // If Livy can't monitor the yarn app successfully within this max times, 
consider the app failed.
+  val YARN_APP_MONITOR_MAX_FAILED_TIMES =
+Entry("livy.server.yarn.app-monitor.max-failed.times", 12)
 
 Review comment:
   Can you please simplify the configurations, there're some many related 
configurations, which makes user hard to configure.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-livy] jerryshao commented on a change in pull request #242: [LIVY-336][SERVER] Livy should not spawn one thread per job to track the job on Yarn

2019-12-02 Thread GitBox
jerryshao commented on a change in pull request #242: [LIVY-336][SERVER] Livy 
should not spawn one thread per job to track the job on Yarn
URL: https://github.com/apache/incubator-livy/pull/242#discussion_r352973651
 
 

 ##
 File path: server/src/main/scala/org/apache/livy/LivyConf.scala
 ##
 @@ -202,11 +202,8 @@ object LivyConf {
   // Livy will cache the max no of logs specified. 0 means don't cache the 
logs.
   val SPARK_LOGS_SIZE = Entry("livy.cache-log.size", 200)
 
-  // If Livy can't find the yarn app within this time, consider it lost.
-  val YARN_APP_LOOKUP_TIMEOUT = Entry("livy.server.yarn.app-lookup-timeout", 
"120s")
-
-  // How often Livy polls YARN to refresh YARN app state.
-  val YARN_POLL_INTERVAL = Entry("livy.server.yarn.poll-interval", "5s")
 
 Review comment:
   The removal of these two configurations will break the backward 
compatibility.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-livy] huianyi edited a comment on issue #238: [LIVY-689] Deliver stage process message to the end user using thriftserver

2019-12-02 Thread GitBox
huianyi edited a comment on issue #238: [LIVY-689] Deliver stage process 
message to the end user using thriftserver
URL: https://github.com/apache/incubator-livy/pull/238#issuecomment-560985424
 
 
   @mgaido91 The previous link you pointed actually is another feature we can 
implement(process bar), you can check it here, 
[HiveStatement.java#L381](https://github.com/apache/hive/blob/209096a71b25a3d75dca1930f825c8c10b99d2b9/jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java#L381)
 , 
[BeelineInPlaceUpdateStream.java#L39](https://github.com/apache/hive/blob/209096a71b25a3d75dca1930f825c8c10b99d2b9/beeline/src/java/org/apache/hive/beeline/logs/BeelineInPlaceUpdateStream.java#L39)
   
   Like I previous said, I can support this in a new PR.
   
   > > > Beeline from Hive 2.2 has in-place progress bar 
hive.server2.in.place.progress
   > > > 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.server2.in.place.progress
   > > > (I think it works only with tez - would be nice if this would work 
with Livy too )
   > > 
   > > 
   > > Would be better to support this if it is not a big effort.
   > 
   > After reading hive related code, this needs write some new classes to 
implement this feature, which might cause this PR large, if you don not mind, 
I'd love to open a new PR to support this.
   
   ProcessBar is great in real-time interaction environment, but it can not 
work well if we do off-line scheduling jobs, which we need print the job's 
status by individual lines.
   
   > In this way, we need to store nothing on Livy server memory.
   
   By default setting, we can keep each session keep not more than 100 rows, 
actually under normal circumstances, client will get these message immediately, 
this might not put much memory pressure on Livy server side. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-livy] huianyi commented on issue #238: [LIVY-689] Deliver stage process message to the end user using thriftserver

2019-12-02 Thread GitBox
huianyi commented on issue #238: [LIVY-689] Deliver stage process message to 
the end user using thriftserver
URL: https://github.com/apache/incubator-livy/pull/238#issuecomment-560985424
 
 
   @mgaido91 The previous link you pointed actually is another feature we can 
implement(process bar), you can check it here, 
[HiveStatement.java#L381](https://github.com/apache/hive/blob/209096a71b25a3d75dca1930f825c8c10b99d2b9/jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java#L381)
 , 
[BeelineInPlaceUpdateStream.java#L39](https://github.com/apache/hive/blob/209096a71b25a3d75dca1930f825c8c10b99d2b9/beeline/src/java/org/apache/hive/beeline/logs/BeelineInPlaceUpdateStream.java#L39)
   
   Like I previous said, I can support this in a new PR.
   
   > > > Beeline from Hive 2.2 has in-place progress bar 
hive.server2.in.place.progress
   > > > 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.server2.in.place.progress
   > > > (I think it works only with tez - would be nice if this would work 
with Livy too )
   > > 
   > > 
   > > Would be better to support this if it is not a big effort.
   > 
   > After reading hive related code, this needs write some new classes to 
implement this feature, which might cause this PR large, if you don not mind, 
I'd love to open a new PR to support this.
   
   ProcessBar is great in real-time interaction environment, but it can not 
work well if we do off-line scheduling jobs, we should print the job's status 
by individual lines in this circumstances.
   
   > In this way, we need to store nothing on Livy server memory.
   
   By default setting, we can keep each session keep not more than 100 rows, 
actually under normal circumstances, client will get these message immediately, 
this might not put much memory pressure on Livy server side. 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-livy] jerryshao commented on a change in pull request #165: [LIVY-581] Fix edge-case where livy overrides user-provided spark properties instead of appending

2019-12-02 Thread GitBox
jerryshao commented on a change in pull request #165: [LIVY-581] Fix edge-case 
where livy overrides user-provided spark properties instead of appending
URL: https://github.com/apache/incubator-livy/pull/165#discussion_r352971103
 
 

 ##
 File path: 
server/src/main/scala/org/apache/livy/server/interactive/InteractiveSession.scala
 ##
 @@ -332,6 +332,25 @@ object InteractiveSession extends Logging {
   
LivySparkUtils.formatSparkVersion(livyConf.get(LivyConf.LIVY_SPARK_VERSION))
 val scalaVersion = livyConf.get(LivyConf.LIVY_SPARK_SCALA_VERSION)
 
+// Detect which config we should use and merge jars and files to that 
config.
+// Please notice that if we set both spark.X config values and the 
spark.yarn.X config values
+// the spark.yarn.X config will be ignored by Spark
+val yarnFiles = conf.get(LivyConf.SPARK_YARN_DIST_FILES)
+  .map(_.split(",")).getOrElse(Array.empty[String])
+val sparkFiles = conf.get(LivyConf.SPARK_FILES)
+  .map(_.split(",")).getOrElse(Array.empty[String])
+val mergeJarsConf = if (conf.get(LivyConf.SPARK_JARS) != None ||
 
 Review comment:
   You can change to `isDefined` or `isEmpty` other than comparing `None`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-livy] jerryshao commented on a change in pull request #165: [LIVY-581] Fix edge-case where livy overrides user-provided spark properties instead of appending

2019-12-02 Thread GitBox
jerryshao commented on a change in pull request #165: [LIVY-581] Fix edge-case 
where livy overrides user-provided spark properties instead of appending
URL: https://github.com/apache/incubator-livy/pull/165#discussion_r352970867
 
 

 ##
 File path: 
server/src/main/scala/org/apache/livy/server/interactive/InteractiveSession.scala
 ##
 @@ -285,17 +285,17 @@ object InteractiveSession extends Logging {
   }
 }
 
-def mergeHiveSiteAndHiveDeps(sparkMajorVersion: Int): Unit = {
-  val sparkFiles = 
conf.get("spark.files").map(_.split(",")).getOrElse(Array.empty[String])
-  hiveSiteFile(sparkFiles, livyConf) match {
+def mergeHiveSiteAndHiveDeps(sparkMajorVersion: Int, files: Array[String],
+ mergeJarsConf: String, mergeFilesConf: 
String): Unit = {
 
 Review comment:
   Please change to "one parameter per line", thanks.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services