Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/17416

@srowen, I finally had some time to look into this, and I was able to get the correct jar on the classpath. The fix was to use the code you had in the previous commit for `SparkSubmit.addDependenciesToIvy`, so that the `extraAttributes` are set with `dd.addDependencyArtifact` and don't need to be in the `ModuleRevisionId`. So it was my bad advice that probably screwed this up :<

The reason is that when the `DefaultDependencyDescriptor` gets resolved in `DefaultModuleDescriptor.java`, if there are no artifacts defined, Ivy adds a default one but does not copy over the `extraAttributes`, which is why the resolve report doesn't know about them. But if there are artifacts (which come from `addDependencyArtifact`), then the `extraAttributes` are carried over. This is really confusing, so hopefully the code below makes it clear: `BasicResolver.getDependency(DependencyDescriptor dd, ResolveData data)` calls `DefaultModuleDescriptor.newDefaultInstance`:

```java
public static DefaultModuleDescriptor newDefaultInstance(ModuleRevisionId mrid,
        DependencyArtifactDescriptor[] artifacts) {
    DefaultModuleDescriptor moduleDescriptor =
        new DefaultModuleDescriptor(mrid, "release", null, true);
    moduleDescriptor.addConfiguration(new Configuration(DEFAULT_CONFIGURATION));
    if (artifacts != null && artifacts.length > 0) {
        // Artifacts supplied via addDependencyArtifact keep their extraAttributes
        for (int i = 0; i < artifacts.length; i++) {
            moduleDescriptor.addArtifact(DEFAULT_CONFIGURATION,
                new MDArtifact(moduleDescriptor, artifacts[i].getName(),
                    artifacts[i].getType(), artifacts[i].getExt(),
                    artifacts[i].getUrl(), artifacts[i].getExtraAttributes()));
        }
    } else {
        // Default artifact: extraAttributes from the ModuleRevisionId are NOT copied
        moduleDescriptor.addArtifact(DEFAULT_CONFIGURATION,
            new MDArtifact(moduleDescriptor, mrid.getName(), "jar", "jar"));
    }
    moduleDescriptor.setLastModified(System.currentTimeMillis());
    return moduleDescriptor;
}
```

I think some other code you added in the second commit was also required, which may be why it didn't work for you in the first place, but give it another try. Here is the output from my test; it looks like it should work now:

```
bin/spark-submit --packages edu.stanford.nlp:stanford-corenlp:jar:models:3.4.1 -v examples/src/main/python/pi.py
Using properties file: /home/bryan/git/spark/conf/spark-defaults.conf
Adding default property: spark.history.fs.logDirectory=/home/bryan/git/spark/logs/history
Adding default property: spark.eventLog.dir=/home/bryan/git/spark/logs/history
Adding default property: drill.enable_unsafe_memory_access=false
Warning: Ignoring non-spark config property: drill.enable_unsafe_memory_access=false
Parsed arguments:
  master                  local[*]
  deployMode              null
  executorMemory          null
  executorCores           null
  totalExecutorCores      null
  propertiesFile          /home/bryan/git/spark/conf/spark-defaults.conf
  driverMemory            null
  driverCores             null
  driverExtraClassPath    null
  driverExtraLibraryPath  null
  driverExtraJavaOptions  null
  supervise               false
  queue                   null
  numExecutors            null
  files                   null
  pyFiles                 null
  archives                null
  mainClass               null
  primaryResource         file:/home/bryan/git/spark/examples/src/main/python/pi.py
  name                    pi.py
  childArgs               []
  jars                    null
  packages                edu.stanford.nlp:stanford-corenlp:jar:models:3.4.1
  packagesExclusions      null
  repositories            null
  verbose                 true

Spark properties used, including those specified through
 --conf and those from the properties file /home/bryan/git/spark/conf/spark-defaults.conf:
  (spark.history.fs.logDirectory,/home/bryan/git/spark/logs/history)
  (spark.eventLog.dir,/home/bryan/git/spark/logs/history)

Ivy Default Cache set to: /home/bryan/.ivy2/cache
The jars for the packages stored in: /home/bryan/.ivy2/jars
:: loading settings :: url = jar:file:/home/bryan/git/spark/assembly/target/scala-2.11/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
edu.stanford.nlp#stanford-corenlp added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
	confs: [default]
	found edu.stanford.nlp#stanford-corenlp;3.4.1 in central
downloading 
https://repo1.maven.org/maven2/edu/stanford/nlp/stanford-corenlp/3.4.1/stanford-corenlp-3.4.1-models.jar ...
	[SUCCESSFUL ] edu.stanford.nlp#stanford-corenlp;3.4.1!stanford-corenlp.jar (118730ms)
:: resolution report :: resolve 1164ms :: artifacts dl 118732ms
	:: modules in use:
	edu.stanford.nlp#stanford-corenlp;3.4.1 from central in [default]
	---------------------------------------------------------------------
	|                  |            modules            ||   artifacts   |
	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
	---------------------------------------------------------------------
	|      default     |   1   |   1   |   0   |   0   ||   1   |   1   |
	---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
	confs: [default]
	0 artifacts copied, 1 already retrieved (0kB/5ms)
Main class:
org.apache.spark.deploy.PythonRunner
Arguments:
file:/home/bryan/git/spark/examples/src/main/python/pi.py
/home/bryan/.ivy2/jars/edu.stanford.nlp_stanford-corenlp-models-3.4.1.jar
System properties:
(SPARK_SUBMIT,true)
(spark.submit.pyFiles,/home/bryan/.ivy2/jars/edu.stanford.nlp_stanford-corenlp-models-3.4.1.jar)
(spark.history.fs.logDirectory,/home/bryan/git/spark/logs/history)
(spark.files,file:/home/bryan/git/spark/examples/src/main/python/pi.py,file:/home/bryan/.ivy2/jars/edu.stanford.nlp_stanford-corenlp-models-3.4.1.jar)
(spark.app.name,pi.py)
(spark.jars,file:/home/bryan/.ivy2/jars/edu.stanford.nlp_stanford-corenlp-models-3.4.1.jar)
(spark.submit.deployMode,client)
(spark.eventLog.dir,/home/bryan/git/spark/logs/history)
(spark.master,local[*])
Classpath elements:
/home/bryan/.ivy2/jars/edu.stanford.nlp_stanford-corenlp-models-3.4.1.jar
```
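To illustrate the carry-over behavior described above, here is a tiny standalone sketch of the `if`/`else` in `newDefaultInstance`. This is a toy model, not the real Ivy API: the names `ArtifactDescriptor`, `ResolvedArtifact`, and `resolve` are placeholders I made up for this demo. It just shows that an `extraAttributes` entry like `classifier=models` survives only when an explicit artifact descriptor (the analogue of `addDependencyArtifact`) is supplied:

```java
import java.util.Collections;
import java.util.Map;

// Toy model of DefaultModuleDescriptor.newDefaultInstance (not the real Ivy API):
// extraAttributes survive only when an explicit artifact descriptor is present.
public class ExtraAttributesDemo {

    // Stand-in for Ivy's DependencyArtifactDescriptor
    public record ArtifactDescriptor(String name, String type, String ext,
                                     Map<String, String> extraAttributes) {}

    // Stand-in for the MDArtifact that newDefaultInstance builds
    public record ResolvedArtifact(String name, String type, String ext,
                                   Map<String, String> extraAttributes) {}

    // Mirrors the if/else branch in the quoted Ivy code
    public static ResolvedArtifact resolve(String moduleName, ArtifactDescriptor[] artifacts) {
        if (artifacts != null && artifacts.length > 0) {
            // extraAttributes are copied over from the artifact descriptor
            ArtifactDescriptor a = artifacts[0];
            return new ResolvedArtifact(a.name(), a.type(), a.ext(), a.extraAttributes());
        }
        // default artifact: extraAttributes are silently dropped
        return new ResolvedArtifact(moduleName, "jar", "jar", Collections.emptyMap());
    }

    public static void main(String[] args) {
        // Without an artifact descriptor, the classifier is lost
        ResolvedArtifact lost = resolve("stanford-corenlp", null);
        System.out.println("no artifact descriptor -> " + lost.extraAttributes());

        // With an explicit artifact descriptor, the classifier is carried over
        ResolvedArtifact kept = resolve("stanford-corenlp", new ArtifactDescriptor[] {
            new ArtifactDescriptor("stanford-corenlp", "jar", "jar",
                Map.of("classifier", "models"))
        });
        System.out.println("with artifact descriptor -> " + kept.extraAttributes());
    }
}
```

That dropped-attributes branch is why, before the fix, the resolve report never saw `classifier=models` and the plain (non-models) jar ended up on the classpath.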