Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/17416
  
    @srowen , I finally had some time to look into this, and I was able to get the correct jar on the classpath. The fix was to use the code from your previous commit for `SparkSubmit.addDependenciesToIvy`, so that the extraAttributes are set via `dd.addDependencyArtifact` and don't need to be on the `ModuleRevisionId` - so it was my bad advice that probably screwed this up :<
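    For reference, the wiring ends up looking roughly like this - a sketch against the Ivy 2.x API rather than the actual Spark code, so treat the `"classifier"` extra-attribute key and the `"default"` configuration name as my assumptions (and it needs Ivy on the classpath to compile):

    ```java
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.ivy.core.module.descriptor.DefaultDependencyArtifactDescriptor;
    import org.apache.ivy.core.module.descriptor.DefaultDependencyDescriptor;
    import org.apache.ivy.core.module.descriptor.DefaultModuleDescriptor;
    import org.apache.ivy.core.module.id.ModuleRevisionId;

    public class ClassifierDependencySketch {
        public static DefaultModuleDescriptor build() {
            // Parent descriptor that the --packages dependencies hang off of.
            DefaultModuleDescriptor md = DefaultModuleDescriptor.newDefaultInstance(
                ModuleRevisionId.newInstance("org.apache.spark", "spark-submit-parent", "1.0"));

            ModuleRevisionId depId =
                ModuleRevisionId.newInstance("edu.stanford.nlp", "stanford-corenlp", "3.4.1");
            DefaultDependencyDescriptor dd =
                new DefaultDependencyDescriptor(md, depId, false, false, true);

            // Put the classifier on an explicit artifact descriptor, NOT on the
            // ModuleRevisionId: artifact-level extraAttributes survive resolution.
            // The "classifier" key is an assumption here.
            Map<String, String> extra = new HashMap<>();
            extra.put("classifier", "models");
            dd.addDependencyArtifact("default",
                new DefaultDependencyArtifactDescriptor(
                    dd, "stanford-corenlp", "jar", "jar", null, extra));
            md.addDependency(dd);
            return md;
        }
    }
    ```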
    
    The reason: when the `DefaultDependencyDescriptor` gets resolved in `DefaultModuleDescriptor.java`, if no artifacts are defined, Ivy adds a default one but does not copy over the `extraAttributes` - that's why the resolve report doesn't know about them. If artifacts are defined (which is what `addDependencyArtifact` provides), the `extraAttributes` are carried over. This is admittedly confusing, but hopefully it makes sense from the code below: `BasicResolver.getDependency(DependencyDescriptor dd, ResolveData data)` calls `DefaultModuleDescriptor.newDefaultInstance`:
    
    ```java
    public static DefaultModuleDescriptor newDefaultInstance(ModuleRevisionId mrid,
            DependencyArtifactDescriptor[] artifacts) {
        DefaultModuleDescriptor moduleDescriptor =
                new DefaultModuleDescriptor(mrid, "release", null, true);
        moduleDescriptor.addConfiguration(new Configuration(DEFAULT_CONFIGURATION));
        if (artifacts != null && artifacts.length > 0) {
            // Caller-supplied artifacts keep their extra attributes (e.g. the classifier).
            for (int i = 0; i < artifacts.length; i++) {
                moduleDescriptor.addArtifact(DEFAULT_CONFIGURATION,
                    new MDArtifact(moduleDescriptor, artifacts[i].getName(),
                            artifacts[i].getType(), artifacts[i].getExt(),
                            artifacts[i].getUrl(), artifacts[i].getExtraAttributes()));
            }
        } else {
            // The synthesized default artifact drops the extra attributes entirely.
            moduleDescriptor.addArtifact(DEFAULT_CONFIGURATION, new MDArtifact(moduleDescriptor,
                    mrid.getName(), "jar", "jar"));
        }
        moduleDescriptor.setLastModified(System.currentTimeMillis());
        return moduleDescriptor;
    }
    ```
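    To make the branch difference concrete, here is a stdlib-only Java analogy of the logic above (my own toy types, not the Ivy API): caller-supplied artifact descriptors keep their extra attributes, while the synthesized default artifact is created without them, so a classifier that never makes it onto an artifact is lost:

    ```java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;

    // Toy stand-in for Ivy's DependencyArtifactDescriptor / MDArtifact.
    public class ExtraAttributesDemo {
        public record ArtifactDesc(String name, String type, Map<String, String> extraAttributes) {}

        // Mirrors the if/else in DefaultModuleDescriptor.newDefaultInstance.
        public static List<ArtifactDesc> newDefaultInstance(String moduleName,
                List<ArtifactDesc> artifacts) {
            List<ArtifactDesc> moduleArtifacts = new ArrayList<>();
            if (artifacts != null && !artifacts.isEmpty()) {
                for (ArtifactDesc a : artifacts) {
                    // extraAttributes carried over, as in the if-branch.
                    moduleArtifacts.add(new ArtifactDesc(a.name(), a.type(), a.extraAttributes()));
                }
            } else {
                // Default artifact synthesized with NO extra attributes (else-branch).
                moduleArtifacts.add(new ArtifactDesc(moduleName, "jar", Map.of()));
            }
            return moduleArtifacts;
        }

        public static void main(String[] args) {
            Map<String, String> classifier = Map.of("classifier", "models");

            // With an explicit artifact (what addDependencyArtifact provides):
            var withArtifact = newDefaultInstance("stanford-corenlp",
                List.of(new ArtifactDesc("stanford-corenlp", "jar", classifier)));
            System.out.println(withArtifact.get(0).extraAttributes()); // {classifier=models}

            // Without one, the classifier never reaches the resolve report:
            var withoutArtifact = newDefaultInstance("stanford-corenlp", null);
            System.out.println(withoutArtifact.get(0).extraAttributes()); // {}
        }
    }
    ```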
    
    I think some of the other code you added in the second commit was also required, which may be why it didn't work for you the first time - but give it another try. Here is the output from my test; it looks like it should work now:
    
    ```
    bin/spark-submit --packages edu.stanford.nlp:stanford-corenlp:jar:models:3.4.1 -v examples/src/main/python/pi.py
    Using properties file: /home/bryan/git/spark/conf/spark-defaults.conf
    Adding default property: spark.history.fs.logDirectory=/home/bryan/git/spark/logs/history
    Adding default property: spark.eventLog.dir=/home/bryan/git/spark/logs/history
    Adding default property: drill.enable_unsafe_memory_access=false
    Warning: Ignoring non-spark config property: drill.enable_unsafe_memory_access=false
    Parsed arguments:
      master                  local[*]
      deployMode              null
      executorMemory          null
      executorCores           null
      totalExecutorCores      null
      propertiesFile          /home/bryan/git/spark/conf/spark-defaults.conf
      driverMemory            null
      driverCores             null
      driverExtraClassPath    null
      driverExtraLibraryPath  null
      driverExtraJavaOptions  null
      supervise               false
      queue                   null
      numExecutors            null
      files                   null
      pyFiles                 null
      archives                null
      mainClass               null
      primaryResource         file:/home/bryan/git/spark/examples/src/main/python/pi.py
      name                    pi.py
      childArgs               []
      jars                    null
      packages                edu.stanford.nlp:stanford-corenlp:jar:models:3.4.1
      packagesExclusions      null
      repositories            null
      verbose                 true
    
    Spark properties used, including those specified through
     --conf and those from the properties file /home/bryan/git/spark/conf/spark-defaults.conf:
      (spark.history.fs.logDirectory,/home/bryan/git/spark/logs/history)
      (spark.eventLog.dir,/home/bryan/git/spark/logs/history)
    
        
    Ivy Default Cache set to: /home/bryan/.ivy2/cache
    The jars for the packages stored in: /home/bryan/.ivy2/jars
    :: loading settings :: url = jar:file:/home/bryan/git/spark/assembly/target/scala-2.11/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
    edu.stanford.nlp#stanford-corenlp added as a dependency
    :: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
        confs: [default]
        found edu.stanford.nlp#stanford-corenlp;3.4.1 in central
    downloading https://repo1.maven.org/maven2/edu/stanford/nlp/stanford-corenlp/3.4.1/stanford-corenlp-3.4.1-models.jar ...
        [SUCCESSFUL ] edu.stanford.nlp#stanford-corenlp;3.4.1!stanford-corenlp.jar (118730ms)
    :: resolution report :: resolve 1164ms :: artifacts dl 118732ms
        :: modules in use:
        edu.stanford.nlp#stanford-corenlp;3.4.1 from central in [default]
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   1   |   1   |   0   |   0   ||   1   |   1   |
        ---------------------------------------------------------------------
    :: retrieving :: org.apache.spark#spark-submit-parent
        confs: [default]
        0 artifacts copied, 1 already retrieved (0kB/5ms)
    Main class:
    org.apache.spark.deploy.PythonRunner
    Arguments:
    file:/home/bryan/git/spark/examples/src/main/python/pi.py
    /home/bryan/.ivy2/jars/edu.stanford.nlp_stanford-corenlp-models-3.4.1.jar
    System properties:
    (SPARK_SUBMIT,true)
    (spark.submit.pyFiles,/home/bryan/.ivy2/jars/edu.stanford.nlp_stanford-corenlp-models-3.4.1.jar)
    (spark.history.fs.logDirectory,/home/bryan/git/spark/logs/history)
    (spark.files,file:/home/bryan/git/spark/examples/src/main/python/pi.py,file:/home/bryan/.ivy2/jars/edu.stanford.nlp_stanford-corenlp-models-3.4.1.jar)
    (spark.app.name,pi.py)
    (spark.jars,file:/home/bryan/.ivy2/jars/edu.stanford.nlp_stanford-corenlp-models-3.4.1.jar)
    (spark.submit.deployMode,client)
    (spark.eventLog.dir,/home/bryan/git/spark/logs/history)
    (spark.master,local[*])
    Classpath elements:
    /home/bryan/.ivy2/jars/edu.stanford.nlp_stanford-corenlp-models-3.4.1.jar
    ```

