[jira] [Updated] (LIVY-750) Livy uploads local pyspark archives to Yarn distributed cache

2020-03-16 Thread shanyu zhao (Jira)


     [ https://issues.apache.org/jira/browse/LIVY-750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

shanyu zhao updated LIVY-750:
-----------------------------
Description: 
On Livy Server, even if we set the pyspark archives to use local files:
{code:bash}
export PYSPARK_ARCHIVES_PATH=local:/opt/spark/python/lib/pyspark.zip,local:/opt/spark/python/lib/py4j-0.10.7-src.zip
{code}

Livy still uploads these local pyspark archives to the Yarn distributed cache:
20/02/14 20:05:46 INFO utils.LineBufferedStream: 2020-02-14 20:05:46,026 INFO yarn.Client: Uploading resource file:/opt/spark/python/lib/pyspark.zip -> hdfs://mycluster/user/test1/.sparkStaging/application_1581024490249_0001/pyspark.zip
20/02/14 20:05:46 INFO utils.LineBufferedStream: 2020-02-14 20:05:46,392 INFO yarn.Client: Uploading resource file:/opt/spark/python/lib/py4j-0.10.7-src.zip -> hdfs://mycluster/user/test1/.sparkStaging/application_1581024490249_0001/py4j-0.10.7-src.zip

Note that this is after we fixed Spark code in SPARK-30845 to not always upload 
local archives.

The root cause is that Livy adds the pyspark archives to "spark.submit.pyFiles", 
which Spark then adds to the Yarn distributed cache. Since spark-submit 
already takes care of finding the pyspark archives and uploading them when 
they are not local, there is no need for Livy to redundantly do so.
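
The scheme check this description relies on can be sketched as follows. This is only an illustration in Python, not Livy's or Spark's actual code: the function name `split_archives` is hypothetical, and the sketch assumes that PYSPARK_ARCHIVES_PATH is a comma-separated list in which a `local:` scheme marks a file that is already present on every node.

```python
from urllib.parse import urlparse


def split_archives(archives_path):
    """Split a comma-separated archive list into (local, to_upload).

    Hypothetical helper for illustration only: entries whose URI scheme
    is "local" are assumed to exist on every node, so they would not
    need to go through the Yarn distributed cache.
    """
    local, to_upload = [], []
    for entry in archives_path.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray commas
        if urlparse(entry).scheme == "local":
            local.append(entry)
        else:
            to_upload.append(entry)
    return local, to_upload


archives = (
    "local:/opt/spark/python/lib/pyspark.zip,"
    "local:/opt/spark/python/lib/py4j-0.10.7-src.zip"
)
local, to_upload = split_archives(archives)
# With the setting from this issue, both entries are classified as local
# and nothing is left over to upload.
```

Under these assumptions, only the entries in `to_upload` would be candidates for "spark.submit.pyFiles"; how Livy should actually treat them is up to the fix for this issue.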

  was:
On Livy Server, even if we set the pyspark archives to use local files:
{code:bash}
export PYSPARK_ARCHIVES_PATH=local:/opt/spark/python/lib/pyspark.zip,local:/opt/spark/python/lib/py4j-0.10.7-src.zip
{code}

Livy still uploads these local pyspark archives to the Yarn distributed cache:
20/02/14 20:05:46 INFO utils.LineBufferedStream: 2020-02-14 20:05:46,026 INFO yarn.Client: Uploading resource file:/opt/spark/python/lib/pyspark.zip -> hdfs://mycluster/user/test1/.sparkStaging/application_1581024490249_0001/pyspark.zip
20/02/14 20:05:46 INFO utils.LineBufferedStream: 2020-02-14 20:05:46,392 INFO yarn.Client: Uploading resource file:/opt/spark/python/lib/py4j-0.10.7-src.zip -> hdfs://mycluster/user/test1/.sparkStaging/application_1581024490249_0001/py4j-0.10.7-src.zip

Note that this is after we fixed Spark code in SPARK-30845 to not always upload 
local archives.

The root cause is that Livy adds pyspark archives to "spark.submit.pyFiles", 
which will be added to Yarn distributed cache by Spark. Since spark-submit 
already takes care of uploading pyspark archives, there is no need for Livy to 
redundantly do so.


> Livy uploads local pyspark archives to Yarn distributed cache
> -------------------------------------------------------------
>
> Key: LIVY-750
> URL: https://issues.apache.org/jira/browse/LIVY-750
> Project: Livy
> Issue Type: Bug
> Components: Server
> Affects Versions: 0.6.0, 0.7.0
> Reporter: shanyu zhao
> Priority: Major
> Attachments: image-2020-02-16-13-19-40-645.png, image-2020-02-16-13-19-59-591.png
>
> Time Spent: 10m
> Remaining Estimate: 0h



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (LIVY-752) Livy TS does not accept any connections when limits are set on connections

2020-03-16 Thread Marco Gaido (Jira)


     [ https://issues.apache.org/jira/browse/LIVY-752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marco Gaido resolved LIVY-752.
------------------------------
Fix Version/s: 0.8.0
     Assignee: Wing Yew Poon
   Resolution: Fixed

Issue resolved by PR: [https://github.com/apache/incubator-livy/pull/284].

> Livy TS does not accept any connections when limits are set on connections
> --------------------------------------------------------------------------
>
> Key: LIVY-752
> URL: https://issues.apache.org/jira/browse/LIVY-752
> Project: Livy
> Issue Type: Bug
> Components: Thriftserver
> Affects Versions: 0.7.0
> Reporter: Wing Yew Poon
> Assignee: Wing Yew Poon
> Priority: Major
> Fix For: 0.8.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> I set livy.server.thrift.limit.connections.per.user=20 on my Livy Server. 
> When I try to connect to it, I get:
> {noformat}
> 2020-02-28 17:13:30,443 WARN org.apache.livy.thriftserver.cli.ThriftBinaryCLIService: Error opening session: 
> java.lang.NullPointerException
>   at org.apache.livy.thriftserver.LivyThriftSessionManager.incrementConnectionsCount(LivyThriftSessionManager.scala:438)
>   at org.apache.livy.thriftserver.LivyThriftSessionManager.incrementConnections(LivyThriftSessionManager.scala:425)
>   at org.apache.livy.thriftserver.LivyThriftSessionManager.openSession(LivyThriftSessionManager.scala:222)
>   at org.apache.livy.thriftserver.LivyCLIService.openSessionWithImpersonation(LivyCLIService.scala:121)
>   at org.apache.livy.thriftserver.cli.ThriftCLIService.getSessionHandle(ThriftCLIService.scala:324)
>   at org.apache.livy.thriftserver.cli.ThriftCLIService.OpenSession(ThriftCLIService.scala:203)
>   at org.apache.hive.service.rpc.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1497)
>   at org.apache.hive.service.rpc.thrift.TCLIService$Processor$OpenSession.getResult(TCLIService.java:1482)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>   at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
>   at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
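
The issue text does not state why incrementConnectionsCount throws, so the following is only a generic illustration of how a per-user connection limit can be tracked without dereferencing a counter that has not been created yet. It is a Python sketch, not Livy's code: the class `ConnectionLimiter` and its methods are hypothetical names, and the limit of 20 in the usage lines simply mirrors the setting quoted above.

```python
import threading
from collections import defaultdict


class ConnectionLimiter:
    """Hypothetical per-user connection counter (not Livy's implementation)."""

    def __init__(self, limit_per_user):
        self.limit_per_user = limit_per_user
        # defaultdict(int) yields 0 for a user seen for the first time,
        # so the first connection never reads a missing entry.
        self._counts = defaultdict(int)
        self._lock = threading.Lock()

    def increment(self, user):
        """Register one more connection for user, or raise if over the limit."""
        with self._lock:
            current = self._counts[user]
            if current >= self.limit_per_user:
                raise RuntimeError(
                    f"connection limit {self.limit_per_user} reached for user {user}"
                )
            self._counts[user] = current + 1
            return current + 1

    def decrement(self, user):
        """Release one connection for user; never drops below zero."""
        with self._lock:
            if self._counts[user] > 0:
                self._counts[user] -= 1


limiter = ConnectionLimiter(limit_per_user=20)
limiter.increment("alice")  # first connection for a new user succeeds
limiter.decrement("alice")  # closing the session releases the slot
```

The point of the sketch is only that a first-time user starts from a count of zero instead of a missing value; the actual fix is in the pull request referenced in this message.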


