[jira] [Commented] (SPARK-23644) SHS with proxy doesn't show applications
[ https://issues.apache.org/jira/browse/SPARK-23644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16401538#comment-16401538 ]

Saisai Shao commented on SPARK-23644:
-------------------------------------

Fixed in https://github.com/apache/spark/pull/20794.

> SHS with proxy doesn't show applications
> ----------------------------------------
>
>                 Key: SPARK-23644
>                 URL: https://issues.apache.org/jira/browse/SPARK-23644
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, Web UI
>    Affects Versions: 2.3.0
>            Reporter: Marco Gaido
>            Assignee: Marco Gaido
>            Priority: Minor
>             Fix For: 2.4.0
>
> The History Server supports being consumed via a proxy using the
> {{spark.ui.proxyBase}} property. Although it works fine if you access the
> proxy through a URL which ends with "/", it doesn't show any application if
> the URL doesn't end with "/". E.g., if you access the SHS using
> {{https://yourproxy.whatever:1234/path/to/historyserver/}} it works fine, but
> if you access it using
> {{https://yourproxy.whatever:1234/path/to/historyserver}} no application is
> shown.
> The cause is that the REST call which fetches the list of applications uses
> a relative path. So in the second case, instead of performing a GET to
> {{https://yourproxy.whatever:1234/path/to/historyserver/api/v1/applications}},
> it performs a call to
> {{https://yourproxy.whatever:1234/path/to/api/v1/applications}}.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
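[Editor's note] The trailing-slash behavior described above is standard relative-URL resolution (RFC 3986): without a trailing slash, the last path segment of the base URL is dropped before the relative reference is applied. A minimal sketch with Python's standard library, using the example URLs from the issue:

```python
from urllib.parse import urljoin

# Relative resolution keeps the last path segment only when the base
# URL ends with "/"; otherwise that segment is replaced. This is why
# the SHS REST call lands on the wrong path without a trailing slash.
with_slash = urljoin(
    "https://yourproxy.whatever:1234/path/to/historyserver/",
    "api/v1/applications")
without_slash = urljoin(
    "https://yourproxy.whatever:1234/path/to/historyserver",
    "api/v1/applications")

print(with_slash)
# https://yourproxy.whatever:1234/path/to/historyserver/api/v1/applications
print(without_slash)
# https://yourproxy.whatever:1234/path/to/api/v1/applications
```

The fix in the linked PR amounts to making the request path independent of whether the browser's base URL carries the trailing slash.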
[jira] [Resolved] (SPARK-23644) SHS with proxy doesn't show applications
[ https://issues.apache.org/jira/browse/SPARK-23644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saisai Shao resolved SPARK-23644.
---------------------------------
       Resolution: Fixed
         Assignee: Marco Gaido
    Fix Version/s: 2.4.0
[jira] [Comment Edited] (SPARK-23650) Slow SparkR udf (dapply)
[ https://issues.apache.org/jira/browse/SPARK-23650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16401525#comment-16401525 ]

Felix Cheung edited comment on SPARK-23650 at 3/16/18 7:16 AM:
---------------------------------------------------------------

Do you mean this?

RRunner: Times: boot = 0.010 s, init = 0.005 s, broadcast = 1.894 s, read-input = 0.001 s, compute = 0.062 s, write-output = 0.002 s, total = 1.974 s

Under the cover it is working with the same R process. I see

SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1006

each time: it is creating a new broadcast, which then needs to be transferred. IMO there are a few things to look into:
# it should detect whether the broadcast is the same (not sure if it does that)
# if it is attributed to the same broadcast, then in use.daemon mode it perhaps doesn't have to transfer it again (but it would need to keep track of the stage executed before, the broadcast that was sent before, etc.)
# data transfer can be faster (SPARK-18924)

However, as of now RRunner simply picks up the broadcast that is passed to it and sends it.

was (Author: felixcheung):
Do you mean this?

RRunner: Times: boot = 0.010 s, init = 0.005 s, broadcast = 1.894 s, read-input = 0.001 s, compute = 0.062 s, write-output = 0.002 s, total = 1.974 s

Under the cover it is working with the same R process. I see

SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1006

each time: it is creating a new broadcast, which then needs to be transferred. IMO there are a few things to look into:
# it should detect whether the broadcast is the same (not sure if it does that)
# if it is attributed to the same broadcast, then in use.daemon mode it perhaps doesn't have to transfer it again
# data transfer can be faster (SPARK-18924)

However, as of now RRunner simply picks up the broadcast that is passed to it and sends it.
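[Editor's note] The first two suggestions in the comment above amount to memoizing broadcast transfers per worker. A minimal sketch of the idea, hedged: this is illustrative pseudocode in Python, not Spark's actual RRunner implementation; the names `BroadcastCache`, `send`, and `send_to_worker` are made up for the example:

```python
# Hypothetical sketch: only ship a broadcast to the worker process if
# that broadcast id has not been shipped before. With use.daemon mode
# (same R process across micro-batches), this would avoid re-sending
# an unchanged model on every batch.
class BroadcastCache:
    def __init__(self):
        self._sent_ids = set()   # broadcast ids the worker already holds
        self.transfers = 0       # how many payloads were actually sent

    def send(self, broadcast_id, payload, send_to_worker):
        if broadcast_id in self._sent_ids:
            return  # worker already has this broadcast; skip the transfer
        send_to_worker(payload)
        self._sent_ids.add(broadcast_id)
        self.transfers += 1

cache = BroadcastCache()
sent = []
for batch in range(5):               # five micro-batches, same model each time
    cache.send(3, b"model-bytes", sent.append)

print(cache.transfers)  # 1 -- the model is shipped once, not once per batch
```

As the comment notes, the real implementation would also have to track which stage and broadcast were sent previously, which this sketch glosses over.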
> Slow SparkR udf (dapply)
> ------------------------
>
>                 Key: SPARK-23650
>                 URL: https://issues.apache.org/jira/browse/SPARK-23650
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Shell, SparkR, Structured Streaming
>    Affects Versions: 2.2.0
>            Reporter: Deepansh
>            Priority: Major
>         Attachments: sparkR_log2.txt, sparkRlag.txt
>
> For example, I am getting streams from Kafka and I want to apply a model
> built in R to those streams. For this, I am using dapply.
> My code is:
> {code}
> iris_model <- readRDS("./iris_model.rds")
> randomBr <- SparkR:::broadcast(sc, iris_model)
> kafka <- read.stream("kafka", subscribe = "source",
>                      kafka.bootstrap.servers = "localhost:9092",
>                      topic = "source")
> lines <- select(kafka, cast(kafka$value, "string"))
> schema <- schema(lines)
> df1 <- dapply(lines, function(x) {
>   i_model <- SparkR:::value(randomBr)   # read the broadcast model
>   for (row in 1:nrow(x)) {
>     y <- fromJSON(as.character(x[row, "value"]))
>     y$predict <- predict(i_model, y)
>     y <- toJSON(y)
>     x[row, "value"] <- y
>   }
>   x
> }, schema)
> {code}
> Every time Kafka streams are fetched, the dapply method creates a new
> runner thread and ships the variables again, which causes a huge lag
> (~2 s for shipping the model) every time. I even tried without broadcast
> variables, but it takes the same time to ship them. Can some other
> techniques be applied to improve its performance?
[jira] [Commented] (SPARK-23650) Slow SparkR udf (dapply)
[ https://issues.apache.org/jira/browse/SPARK-23650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16401525#comment-16401525 ]

Felix Cheung commented on SPARK-23650:
--------------------------------------

Do you mean this?

RRunner: Times: boot = 0.010 s, init = 0.005 s, broadcast = 1.894 s, read-input = 0.001 s, compute = 0.062 s, write-output = 0.002 s, total = 1.974 s

Under the cover it is working with the same R process. I see

SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1006

each time: it is creating a new broadcast, which then needs to be transferred. IMO there are a few things to look into:
# it should detect whether the broadcast is the same (not sure if it does that)
# if it is attributed to the same broadcast, then in use.daemon mode it perhaps doesn't have to transfer it again
# data transfer can be faster (SPARK-18924)

However, as of now RRunner simply picks up the broadcast that is passed to it and sends it.
[jira] [Commented] (SPARK-23694) The staging directory should be under hive.exec.stagingdir if we set hive.exec.stagingdir but not under the table directory
[ https://issues.apache.org/jira/browse/SPARK-23694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16401490#comment-16401490 ]

Apache Spark commented on SPARK-23694:
--------------------------------------

User 'dyf6372' has created a pull request for this issue:
https://github.com/apache/spark/pull/20843

> The staging directory should be under hive.exec.stagingdir if we set
> hive.exec.stagingdir but not under the table directory
> --------------------------------------------------------------------
>
>                 Key: SPARK-23694
>                 URL: https://issues.apache.org/jira/browse/SPARK-23694
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Yifeng Dong
>            Priority: Major
>
> When we set hive.exec.stagingdir to a path that is not under the table
> directory, for example /tmp/hive-staging, I think the staging directory
> should be under /tmp/hive-staging, not under /tmp/ like /tmp/hive-staging_xxx.
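[Editor's note] A small sketch of the path behavior the issue describes, assuming the reported layout; the suffix string below is made up for illustration and is not the exact format Spark/Hive generates:

```python
import posixpath

# With hive.exec.stagingdir = "/tmp/hive-staging", the issue reports that
# the staging directory is currently created as a *sibling* of the
# configured path ("/tmp/hive-staging_<suffix>"), while the proposal is
# to create it *inside* the configured path.
staging_conf = "/tmp/hive-staging"
suffix = "hive_2018-03-16_00-00-00_000"        # illustrative suffix only

current = staging_conf + "_" + suffix           # appended to the configured path
proposed = posixpath.join(staging_conf, suffix) # nested under it

print(current)   # /tmp/hive-staging_hive_2018-03-16_00-00-00_000
print(proposed)  # /tmp/hive-staging/hive_2018-03-16_00-00-00_000
print(posixpath.dirname(current))   # /tmp
print(posixpath.dirname(proposed))  # /tmp/hive-staging
```

The practical difference is cleanup and permissions: with the nested layout, all staging directories live under the one configured path instead of being scattered through its parent.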
[jira] [Updated] (SPARK-23694) The staging directory should be under hive.exec.stagingdir if we set hive.exec.stagingdir but not under the table directory
[ https://issues.apache.org/jira/browse/SPARK-23694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yifeng Dong updated SPARK-23694:
--------------------------------
    External issue URL: https://github.com/apache/spark/pull/20843
[jira] [Assigned] (SPARK-23694) The staging directory should be under hive.exec.stagingdir if we set hive.exec.stagingdir but not under the table directory
[ https://issues.apache.org/jira/browse/SPARK-23694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-23694:
------------------------------------
    Assignee:     (was: Apache Spark)
[jira] [Assigned] (SPARK-23694) The staging directory should be under hive.exec.stagingdir if we set hive.exec.stagingdir but not under the table directory
[ https://issues.apache.org/jira/browse/SPARK-23694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-23694:
------------------------------------
    Assignee: Apache Spark