[jira] [Created] (ZEPPELIN-4934) Add jars to sys.path

2020-07-02 Thread Dalitso Banda (Jira)
Dalitso Banda created ZEPPELIN-4934:
---

 Summary: Add jars to sys.path
 Key: ZEPPELIN-4934
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-4934
 Project: Zeppelin
  Issue Type: Bug
  Components: pySpark, spark
Affects Versions: 0.9.0
 Environment: Distributor ID: Ubuntu
Description: Ubuntu 18.04.1 LTS
Release: 18.04
Codename: bionic

spark 2.4.6

hadoop 3.1.3
Reporter: Dalitso Banda


Packages (jars) added through spark.jars.packages do not appear in the 
sys.path of the pyspark interpreter. This differs from the behavior of the 
pyspark 2.4 CLI shell (see the example below). As a result, imports fail with 
module-not-found errors for jars that bundle their own Python code, such as 
com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1
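
To illustrate why the missing sys.path entry matters: a jar is a zip archive, 
and Python can import modules from a zip archive that is listed on sys.path. 
The sketch below is not taken from mmlspark; the archive name demo_pkg.jar and 
the module name demo_mod are made up for the demonstration. It shows the 
import failing until the archive is added to sys.path.

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Build a tiny archive with a .jar extension that bundles one Python module.
# "demo_pkg.jar" and "demo_mod" are hypothetical names used only here.
workdir = tempfile.mkdtemp()
jar_path = os.path.join(workdir, "demo_pkg.jar")
with zipfile.ZipFile(jar_path, "w") as jar:
    jar.writestr("demo_mod.py", "GREETING = 'hello from inside the jar'\n")


def can_import(name):
    """Return True if `name` is importable with the current sys.path."""
    importlib.invalidate_caches()
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


before = can_import("demo_mod")  # archive is not on sys.path yet
sys.path.append(jar_path)
after = can_import("demo_mod")   # the zip importer can now find the module
print(before, after)
```

The same mechanism is presumably what lets a package's bundled Python code be 
imported in the CLI shell once its jar is on sys.path.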

pyspark adds these jars to sys.path at context initialization. See the 
following files in the spark repo:
 * core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
 * python/pyspark/context.py
 * python/pyspark/shell.py

In pyspark, the jars passed with "--packages" are copied into 
"spark.submit.pyFiles" (in the prepareSubmitEnvironment function) and then 
added to sys.path during context initialization.

A simple workaround is to run:
{quote}import sys

sys.path.extend(sc.getConf().get("spark.jars").split(","))
{quote}
at the top of every notebook. However, this is cumbersome and unintuitive for 
users who expect the same behavior as the pyspark shell.
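
A slightly more defensive version of that workaround could look like the 
sketch below. The helper name add_jars_to_path is hypothetical and not part of 
Zeppelin or Spark. It assumes the jar list is a comma-separated string, as in 
the one-liner above, and it skips empty entries and jars already on the path so 
that re-running the paragraph does not add duplicates.

```python
import sys


def add_jars_to_path(jars_conf, path=None):
    """Append each jar in a comma-separated string to `path`.

    Empty entries and jars already present are skipped. Returns the list
    of entries that were actually added. `path` defaults to sys.path.
    """
    if path is None:
        path = sys.path
    added = []
    for jar in (jars_conf or "").split(","):
        jar = jar.strip()
        if jar and jar not in path:
            path.append(jar)
            added.append(jar)
    return added


# In a notebook this would be called as follows (assumes `sc` is the
# SparkContext provided by the interpreter):
#   add_jars_to_path(sc.getConf().get("spark.jars", ""))
demo_path = ["/existing/a.jar"]
print(add_jars_to_path("/existing/a.jar,,/new/b.jar", demo_path))
# prints ['/new/b.jar']
```

Passing an explicit default to get() avoids an AttributeError on None when 
spark.jars is unset.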

 

Behavior in pyspark CLI:

```

pyspark --packages com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1 \
  --repositories https://mmlspark.azureedge.net/maven --master local[*]


Python 3.6.9 (default, Oct 17 2019, 11:10:22)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
https://mmlspark.azureedge.net/maven added as a remote repository with the name: repo-1
Ivy Default Cache set to: /root/.ivy2/cache
The jars for the packages stored in: /root/.ivy2/jars
:: loading settings :: url = jar:file:/opt/spark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.microsoft.ml.spark#mmlspark_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-26440d6b-a15d-40e0-8225-c9eb1fd50ac9;1.0
 confs: [default]
 found com.microsoft.ml.spark#mmlspark_2.11;1.0.0-rc1 in repo-1
 found org.scalactic#scalactic_2.11;3.0.5 in central
 found org.scala-lang#scala-reflect;2.11.12 in central
 found org.scalatest#scalatest_2.11;3.0.5 in central
 found org.scala-lang.modules#scala-xml_2.11;1.0.6 in central
 found io.spray#spray-json_2.11;1.3.2 in central
 found com.microsoft.cntk#cntk;2.4 in central
 found org.openpnp#opencv;3.2.0-1 in central
 found com.jcraft#jsch;0.1.54 in central
 found org.apache.httpcomponents#httpclient;4.5.6 in central
 found org.apache.httpcomponents#httpcore;4.4.10 in central
 found commons-logging#commons-logging;1.2 in central
 found commons-codec#commons-codec;1.10 in central
 found com.microsoft.ml.lightgbm#lightgbmlib;2.3.100 in central
 found com.github.vowpalwabbit#vw-jni;8.7.0.3 in central
:: resolution report :: resolve 462ms :: artifacts dl 9ms
 :: modules in use:
 com.github.vowpalwabbit#vw-jni;8.7.0.3 from central in [default]
 com.jcraft#jsch;0.1.54 from central in [default]
 com.microsoft.cntk#cntk;2.4 from central in [default]
 com.microsoft.ml.lightgbm#lightgbmlib;2.3.100 from central in [default]
 com.microsoft.ml.spark#mmlspark_2.11;1.0.0-rc1 from repo-1 in [default]
 commons-codec#commons-codec;1.10 from central in [default]
 commons-logging#commons-logging;1.2 from central in [default]
 io.spray#spray-json_2.11;1.3.2 from central in [default]
 org.apache.httpcomponents#httpclient;4.5.6 from central in [default]
 org.apache.httpcomponents#httpcore;4.4.10 from central in [default]
 org.openpnp#opencv;3.2.0-1 from central in [default]
 org.scala-lang#scala-reflect;2.11.12 from central in [default]
 org.scala-lang.modules#scala-xml_2.11;1.0.6 from central in [default]
 org.scalactic#scalactic_2.11;3.0.5 from central in [default]
 org.scalatest#scalatest_2.11;3.0.5 from central in [default]
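 ---------------------------------------------------------------------
 |                  |            modules            ||   artifacts   |
 |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
 ---------------------------------------------------------------------
 |      default     |   15  |   0   |   0   |   0   ||   15  |   0   |
 ---------------------------------------------------------------------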
:: retrieving :: org.apache.spark#spark-submit-parent-26440d6b-a15d-40e0-8225-c9eb1fd50ac9
 confs: [default]
 0 artifacts copied, 15 already retrieved (0kB/13ms)
20/07/01 23:36:42 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / 
```

[jira] [Created] (ZEPPELIN-4156) Some packages cannot be downloaded from twttr maven

2019-05-13 Thread Dalitso Banda (JIRA)
Dalitso Banda created ZEPPELIN-4156:
---

 Summary: Some packages cannot be downloaded from twttr maven
 Key: ZEPPELIN-4156
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-4156
 Project: Zeppelin
  Issue Type: Bug
  Components: zeppelin-interpreter
Reporter: Dalitso Banda


SBT fails to resolve dependencies from the twttr Maven repository because its 
HTTP endpoint now redirects to HTTPS.
See [sbt/sbt#3670|https://github.com/sbt/sbt/issues/3670]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)