[jira] [Commented] (SPARK-11157) Allow Spark to be built without assemblies
[ https://issues.apache.org/jira/browse/SPARK-11157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15239927#comment-15239927 ] Steve Loughran commented on SPARK-11157:

Thanks. Although it's not directly a YARN bug, one place where YARN can help is in improving the reporting of launch failures (YARN-522).

> Allow Spark to be built without assemblies
> ------------------------------------------
>
> Key: SPARK-11157
> URL: https://issues.apache.org/jira/browse/SPARK-11157
> Project: Spark
> Issue Type: Umbrella
> Components: Build, Spark Core, YARN
> Reporter: Marcelo Vanzin
> Assignee: Marcelo Vanzin
> Fix For: 2.0.0
>
> Attachments: no-assemblies.pdf
>
> For reasoning, discussion of pros and cons, and other more detailed information, please see the attached doc.
> The idea is to be able to build a Spark distribution that has just a directory full of jars instead of the huge assembly files we currently have.
> Getting there requires changes in a bunch of places; I'll try to list the ones I identified in the document, in the order I think would be needed to avoid breaking things:
> * Make the streaming backends not be assemblies.
> Since people may depend on the current assembly artifacts in their deployments, we can't really remove them; but we can make them dummy jars and rely on dependency resolution to download all the jars. PySpark tests would also need some tweaking here.
> * Make the examples jar not be an assembly.
> Probably requires tweaks to the {{run-example}} script. The location of the examples jar would have to change (it won't be able to live in the same place as the main Spark jars anymore).
> * Update the YARN backend to handle a directory full of jars when launching apps.
> Currently YARN localizes the Spark assembly (depending on the user configuration); it needs to be modified so that it can localize all needed libraries instead of a single jar.
> * Modify the launcher library to handle the jars directory.
> This should be trivial.
> * Modify {{assembly/pom.xml}} to generate an assembly or a {{libs}} directory, depending on which profile is enabled.
> We should keep the option to build with the assembly on by default, for backwards compatibility, to give people time to prepare.
>
> Filing this bug as an umbrella; please file sub-tasks if you plan to work on a specific part of the issue.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
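As a sketch of that last step, the profile switch in {{assembly/pom.xml}} could look roughly like the fragment below. The profile ids and plugin wiring here are illustrative assumptions, not the actual Spark build; the real choice of plugins is whatever the assembly module already uses.

```xml
<profiles>
  <!-- Default: keep producing the single assembly jar,
       for backwards compatibility. -->
  <profile>
    <id>assembly</id>
    <activation>
      <activeByDefault>true</activeByDefault>
    </activation>
    <!-- bind the existing assembly/shade plugin executions here -->
  </profile>

  <!-- Opt-in: copy all dependencies into a libs/ directory instead. -->
  <profile>
    <id>libs-dir</id>
    <build>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-dependency-plugin</artifactId>
          <executions>
            <execution>
              <phase>package</phase>
              <goals><goal>copy-dependencies</goal></goals>
              <configuration>
                <outputDirectory>${project.build.directory}/libs</outputDirectory>
              </configuration>
            </execution>
          </executions>
        </plugin>
      </plugins>
    </build>
  </profile>
</profiles>
```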
[jira] [Commented] (SPARK-11157) Allow Spark to be built without assemblies
[ https://issues.apache.org/jira/browse/SPARK-11157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15239708#comment-15239708 ] Sebastian Kochman commented on SPARK-11157:

I have filed a Spark bug: https://issues.apache.org/jira/browse/SPARK-14602

I haven't filed a YARN bug, per your suggestion Steve -- I'm not convinced it's a bug in YARN itself. Let's discuss further in the JIRA above.
[jira] [Commented] (SPARK-11157) Allow Spark to be built without assemblies
[ https://issues.apache.org/jira/browse/SPARK-11157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15239042#comment-15239042 ] Steve Loughran commented on SPARK-11157:

File a Spark bug, and if it needs escalation to YARN, one can be hooked in there.

What's clearly surfacing is that the line-length limit is being exceeded by the {{SPARK_YARN_CACHE_FILES}} env var; this is set up at launch in {{ClientDistributedCacheManager}} and picked up in {{ExecutorRunnable}}. I haven't seen this in YARN before, but I've hit the problem in other apps (hello, Ant's {{}} task). The general strategy is: save the data to a file and have the environment variable point to that file, rather than setting the value on the command line.

It'll be slightly more complex with the YARN launch: any such file will have to be localized. If a simple {{java.util.Properties}} file is used, it's trivial to work with, and it sets the process up to handle all the env vars related to the cache files, including {{SPARK_YARN_CACHE_ARCHIVES}}.
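The file-based strategy Steve describes can be sketched as follows. This is a minimal illustration, not Spark's actual code; the property key and URIs are hypothetical.

```java
import java.io.InputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

public class CacheFileProps {
    public static void main(String[] args) throws IOException {
        // Hypothetical cache-file URIs that would otherwise be joined
        // into one oversized SPARK_YARN_CACHE_FILES env var.
        String[] cacheFiles = {
            "hdfs:///user/spark/.sparkStaging/app_1/spark-core.jar",
            "hdfs:///user/spark/.sparkStaging/app_1/spark-sql.jar"
        };

        // Launcher side: persist the list to a Properties file; the env
        // var now only needs to carry the (short) path to this file.
        Properties out = new Properties();
        out.setProperty("spark.yarn.cache.files", String.join(",", cacheFiles));
        Path propsFile = Files.createTempFile("spark-cache", ".properties");
        try (OutputStream os = Files.newOutputStream(propsFile)) {
            out.store(os, "cache metadata");
        }

        // Executor side: load the localized file and recover the list.
        Properties in = new Properties();
        try (InputStream is = Files.newInputStream(propsFile)) {
            in.load(is);
        }
        String[] loaded = in.getProperty("spark.yarn.cache.files").split(",");
        System.out.println(loaded.length);  // 2
    }
}
```

Under YARN the properties file itself would have to be added to the container's local resources, which is the extra complexity mentioned above.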
[jira] [Commented] (SPARK-11157) Allow Spark to be built without assemblies
[ https://issues.apache.org/jira/browse/SPARK-11157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238151#comment-15238151 ] Sebastian Kochman commented on SPARK-11157:

After this change, when I try to submit a Spark app to YARN on Windows (using spark-submit.cmd), the app fails with the following error:

Diagnostics: The command line has a length of 12046 exceeds maximum allowed length of 8191. Command starts with: @set SPARK_YARN_CACHE_FILES=[...]/.sparkStaging/application_1460496865345_0 Failing this attempt. Failing the application.

So basically, the large number of jars needed for staging in YARN exceeds the Windows command-line length limit. Has anybody seen this? Is there a recommendation for a workaround?

Marcelo: in the original description, you mentioned there would still be a Maven profile building a single assembly. I couldn't find it -- is there one?
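The arithmetic behind the failure above is straightforward: the generated {{@set}} line carries every cache-file URI, so its length grows linearly with the number of jars. A rough sketch (the URI shape and jar count here are made up, chosen only to mirror the staging paths in the error):

```java
public class EnvVarLength {
    // cmd.exe rejects command lines longer than 8191 characters, and the
    // YARN container launch script on Windows sets env vars via
    // "@set NAME=VALUE" lines.
    static final int WINDOWS_CMD_LIMIT = 8191;

    public static void main(String[] args) {
        StringBuilder value = new StringBuilder();
        // Hypothetical staging dir with ~100 jars, each URI ~95 chars.
        for (int i = 0; i < 100; i++) {
            if (value.length() > 0) value.append(',');
            value.append("hdfs://nn:8020/user/me/.sparkStaging/")
                 .append("application_1460496865345_0001/lib/dependency-number-")
                 .append(i).append(".jar");
        }
        String cmd = "@set SPARK_YARN_CACHE_FILES=" + value;
        System.out.println(cmd.length() > WINDOWS_CMD_LIMIT);  // true
    }
}
```

With a single assembly jar the same line held one URI and stayed far under the limit, which is why the problem only appeared once the build switched to a directory of jars.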
[jira] [Commented] (SPARK-11157) Allow Spark to be built without assemblies
[ https://issues.apache.org/jira/browse/SPARK-11157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230157#comment-15230157 ] Sean Owen commented on SPARK-11157:

You just need the launcher module, right? You don't need to bundle anything else. In any event, this isn't something that a profile would provide.
[jira] [Commented] (SPARK-11157) Allow Spark to be built without assemblies
[ https://issues.apache.org/jira/browse/SPARK-11157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230119#comment-15230119 ] Jean-Baptiste Onofré commented on SPARK-11157:

Something I have in mind is to provide a spark-client: right now, when you submit a job, the submission takes pretty long, as we pull in a bunch of Spark dependencies that aren't really useful from a pure client perspective. If you don't mind, I can propose a Maven profile to assemble a spark-client. Thoughts?
[jira] [Commented] (SPARK-11157) Allow Spark to be built without assemblies
[ https://issues.apache.org/jira/browse/SPARK-11157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225322#comment-15225322 ] Josh Rosen commented on SPARK-11157:

I just merged the final patch for this, so I'm marking this as fixed. If anyone runs into new bugs related to this, please link them here.
[jira] [Commented] (SPARK-11157) Allow Spark to be built without assemblies
[ https://issues.apache.org/jira/browse/SPARK-11157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178350#comment-15178350 ] Marcelo Vanzin commented on SPARK-11157:

Ah, yes. And also, the multiple-jars approach would allow people to easily customize their own Spark distro that way.
[jira] [Commented] (SPARK-11157) Allow Spark to be built without assemblies
[ https://issues.apache.org/jira/browse/SPARK-11157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178347#comment-15178347 ] Steve Loughran commented on SPARK-11157:

FWIW, Hadoop lets you swap out Guava 11.x and go to any later version; it has explicitly moved off the classes that Google removed, precisely to allow this. We just ship with 11.x to avoid breaking other things. Jackson, though, is not swappable.
[jira] [Commented] (SPARK-11157) Allow Spark to be built without assemblies
[ https://issues.apache.org/jira/browse/SPARK-11157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178267#comment-15178267 ] Marcelo Vanzin commented on SPARK-11157:

For the record: there's a potential issue with doing this that is not discussed in the document. Namely, when we stop generating the big assembly file, we also stop relocating all of Spark's dependencies. This means that Hadoop's Guava will now leak into Spark's classpath.

Spark itself will still relocate its own Guava (14) and use it; it wouldn't work otherwise, since it doesn't work with version 11. But now applications will see Guava 11 in their classpath.

In the end, I don't think this is too bad, for a couple of reasons:
- if you use the "hadoop-provided" package, or distributions that behave similarly, that's already the case
- it's easy for applications that really need a newer Guava to use it and shade it, if they're using Maven

Given that the original driver for shading Guava was to allow Spark to be embedded in applications that needed a different version (namely Hive), that use case won't be affected.
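For readers unfamiliar with "relocation": in a Maven build it is done by the maven-shade-plugin, which rewrites package names inside the shaded jar. A sketch of the kind of configuration involved (the shaded package name here is illustrative, not necessarily Spark's exact build):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <!-- Rewrite Guava's packages so Spark's own copy (14) cannot
           clash with the Guava 11 that Hadoop puts on the classpath. -->
      <relocation>
        <pattern>com.google.common</pattern>
        <shadedPattern>org.spark-project.guava</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```

Without the assembly step, this rewriting no longer happens for Spark's other dependencies, which is exactly the leak described above.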
[jira] [Commented] (SPARK-11157) Allow Spark to be built without assemblies
[ https://issues.apache.org/jira/browse/SPARK-11157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176108#comment-15176108 ] Marcelo Vanzin commented on SPARK-11157:

That's discussed in the attached document.
[jira] [Commented] (SPARK-11157) Allow Spark to be built without assemblies
[ https://issues.apache.org/jira/browse/SPARK-11157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15175276#comment-15175276 ] Saisai Shao commented on SPARK-11157:

Hi [~vanzin], for the PySpark-related parts of Spark Streaming, how should the assembly problem be addressed? For example, is something like kafka-assembly a related problem here?
[jira] [Commented] (SPARK-11157) Allow Spark to be built without assemblies
[ https://issues.apache.org/jira/browse/SPARK-11157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15125564#comment-15125564 ] Marcelo Vanzin commented on SPARK-11157:
bq. Prior to removing the assemblies, it would be great if we could reconfigure our tests to not depend on the full assembly JAR
I'm pretty sure that's already the case for all Scala tests (see SPARK-9284). I think pyspark tests still need the streaming assemblies to work.
bq. Building up a -classpath argument that lists hundreds of JARs
You don't need to do that; you can use a wildcard ({{-classpath /path/to/libs/*}}). The JVM interprets that as "all the jar files under the libs directory".
bq. This is going to require changes to Launcher, shell scripts, and a few other places
That's already scoped out in the linked document and in the bug summary; it's actually not a lot of work, especially if we don't keep the option to generate assemblies around.
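The wildcard suggestion hinges on quoting: the {{*}} must reach the JVM unexpanded, so it can do the expansion itself. A minimal sketch of the difference (the demo directory and jar names below are made up, not real Spark paths):

```shell
# Create a stand-in for a "directory full of jars".
mkdir -p /tmp/libs_demo
touch /tmp/libs_demo/a.jar /tmp/libs_demo/b.jar

# Quoted: the literal wildcard passes through as ONE argument -- this is
# the form to hand to java's -classpath; the JVM expands it to every jar.
printf '%s\n' "/tmp/libs_demo/*"

# Unquoted: the shell expands it first, producing one argument per jar --
# exactly the "hundreds of JARs on the command line" problem.
printf '%s\n' /tmp/libs_demo/*
```

So the launch script would run something like {{java -cp "/path/to/libs/*" ...}}, with the quotes doing the real work.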
[jira] [Commented] (SPARK-11157) Allow Spark to be built without assemblies
[ https://issues.apache.org/jira/browse/SPARK-11157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15125147#comment-15125147 ] Josh Rosen commented on SPARK-11157: I'd love to start making progress towards removing the assemblies. Before we do so, though, I think there are a few subtasks / obstacles that we need to clear first:
- First, I think we should just completely remove the assembly rather than offering both assembly and non-assembly options. Every additional option that we provide / support adds maintenance burden, and it would be nice to standardize on a single supported distribution technique.
- Prior to removing the assemblies, it would be great if we could reconfigure our tests so they don't depend on the full assembly JAR in order to run. We already have {{SPARK_PREPEND_CLASSPATH}} today, so this might be as simple as making that behavior the default and reconfiguring our test scripts to skip the assembly step.
- Building up a {{-classpath}} argument that lists hundreds of JARs is going to be a debugging nightmare (lots of tools truncate process arguments past some limit, etc.), so it would be good to investigate other techniques for passing the classpath to {{java}} without bloating the CLI (maybe an environment variable, or a file, or something?).
- This is going to require changes to Launcher, shell scripts, and a few other places; it would be good to scope out these changes to estimate how much work it's going to be.
[~vanzin], are there any other obvious subtasks that I'm not thinking of? I'd like to see whether we can break this big task down and scope out some smaller pieces, so we can make incremental progress and get this finished well in time for 2.0.0, leaving plenty of time to test.
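One of the alternatives floated above (an environment variable or a file) could look like the sketch below; every path and file name here is hypothetical, not an actual Spark mechanism. The JVM falls back to the {{CLASSPATH}} environment variable when no {{-classpath}}/{{-cp}} flag is given, so the jar list never appears in the process arguments:

```shell
# Sketch: keep the jar list in a file, join it into CLASSPATH, and launch
# java with no -classpath argument at all, keeping argv short.
mkdir -p /tmp/cp_demo
printf '%s\n' /tmp/cp_demo/a.jar /tmp/cp_demo/b.jar > /tmp/cp_demo/classpath.txt

# Join the file's lines with ':' (the Unix path separator).
CLASSPATH="$(paste -sd: /tmp/cp_demo/classpath.txt)"
export CLASSPATH
echo "$CLASSPATH"
```

The file also gives you something inspectable when debugging, which the truncated-argv problem takes away.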
[jira] [Commented] (SPARK-11157) Allow Spark to be built without assemblies
[ https://issues.apache.org/jira/browse/SPARK-11157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15088411#comment-15088411 ] Josh Rosen commented on SPARK-11157: For my own reference / ease-of-searchability, here's a backlink to an earlier discussion on GitHub: https://github.com/vanzin/spark/pull/2
[jira] [Commented] (SPARK-11157) Allow Spark to be built without assemblies
[ https://issues.apache.org/jira/browse/SPARK-11157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15088443#comment-15088443 ] Kent Murra commented on SPARK-11157: Having a folder of jars as an option would be great. In our particular situation, we have an internal non-Maven build system in which the Spark build process is a black box. The end result is that we're running into serious dependency conflicts, mostly between Jackson 2.4 and Jackson 2.6, and we can't resolve them at the build or deploy steps. We're working around this with manual classpath re-ordering and use of {{spark.driver.userClassPathFirst}}, but being able to swap out the jars would help.
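For reference, the workaround Kent describes is a pair of (experimental) Spark properties; an illustrative {{spark-defaults.conf}} fragment is below. The executor-side twin is included on the assumption that conflicts show up on executors too:

```
# spark-defaults.conf (illustrative): prefer user-supplied classes over
# Spark's bundled copies, e.g. to pick up Jackson 2.6 before Spark's 2.4.
spark.driver.userClassPathFirst     true
spark.executor.userClassPathFirst   true
```

A folder of swappable jars would make this juggling unnecessary, which is Kent's point.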
[jira] [Commented] (SPARK-11157) Allow Spark to be built without assemblies
[ https://issues.apache.org/jira/browse/SPARK-11157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14962897#comment-14962897 ] Jean-Baptiste Onofré commented on SPARK-11157: -- Agree with Marcelo. It's something I had planned as well: creating more fine-grained jar files instead of one big Spark jar.