[jira] [Commented] (SPARK-11157) Allow Spark to be built without assemblies

2016-04-13 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15239927#comment-15239927
 ] 

Steve Loughran commented on SPARK-11157:


Thanks. Although it's not directly a YARN bug, one place where YARN can help is in 
improving the reporting of launch failures (YARN-522).



> Allow Spark to be built without assemblies
> --
>
> Key: SPARK-11157
> URL: https://issues.apache.org/jira/browse/SPARK-11157
> Project: Spark
>  Issue Type: Umbrella
>  Components: Build, Spark Core, YARN
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
> Fix For: 2.0.0
>
> Attachments: no-assemblies.pdf
>
>
> For reasoning, discussion of pros and cons, and other more detailed 
> information, please see attached doc.
> The idea is to be able to build a Spark distribution that has just a 
> directory full of jars instead of the huge assembly files we currently have.
> Getting there requires changes in a bunch of places; I'll try to list the 
> ones I identified in the document, in the order that I think would be needed 
> to not break things:
> * make streaming backends not be assemblies
> Since people may depend on the current assembly artifacts in their 
> deployments, we can't really remove them; but we can make them be dummy jars 
> and rely on dependency resolution to download all the jars.
> PySpark tests would also need some tweaking here.
> * make examples jar not be an assembly
> Probably requires tweaks to the {{run-example}} script. The location of the 
> examples jar would have to change (it won't be able to live in the same place 
> as the main Spark jars anymore).
> * update YARN backend to handle a directory full of jars when launching apps
> Currently YARN localizes the Spark assembly (depending on the user 
> configuration); it needs to be modified so that it can localize all needed 
> libraries instead of a single jar.
> * Modify launcher library to handle the jars directory
> This should be trivial.
> * Modify {{assembly/pom.xml}} to generate assembly or a {{libs}} directory 
> depending on which profile is enabled.
> We should keep the option to build with the assembly on by default, for 
> backwards compatibility, to give people time to prepare.
> Filing this bug as an umbrella; please file sub-tasks if you plan to work on 
> a specific part of the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11157) Allow Spark to be built without assemblies

2016-04-13 Thread Sebastian Kochman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15239708#comment-15239708
 ] 

Sebastian Kochman commented on SPARK-11157:
---

I have filed a Spark bug:
https://issues.apache.org/jira/browse/SPARK-14602
I haven't filed a YARN bug, per your suggestion, Steve -- I'm not convinced it's 
a bug in YARN itself. Let's discuss further in the JIRA above.



[jira] [Commented] (SPARK-11157) Allow Spark to be built without assemblies

2016-04-13 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15239042#comment-15239042
 ] 

Steve Loughran commented on SPARK-11157:


File a Spark bug, and if it needs escalation to YARN, a YARN issue can be linked in there.

What's clearly surfacing is that the limit on the line length is being exceeded by the 
{{SPARK_YARN_CACHE_FILES}} env var; this is set up at launch in 
{{ClientDistributedCacheManager}} and picked up in {{ExecutorRunnable}}.

I haven't seen this in YARN before, but I've hit the problem in other apps (hello, 
ant {{}} task). The general strategy is: save the data to a file and have the 
environment variable point to that file, rather than setting the value on the CLI.

It'll be slightly more complex with the YARN launch: any such file will have to be 
localized.

If a simple {{java.util.Properties}} file is used it's trivial to work with, and it 
sets the process up to handle all the env vars related to the cache files, including 
{{SPARK_YARN_CACHE_ARCHIVES}}. 
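The file-based workaround described above can be sketched as follows. This is only an 
illustration of the general strategy, not Spark's actual implementation; the property 
keys, the {{SPARK_YARN_CACHE_PROPS}} variable name, and the helper functions are all 
hypothetical:

```python
# Sketch of the workaround: instead of packing a long list of cache-file
# URIs into an env var on the command line, write it to a properties file
# and point one short env var at that file. All names here (the property
# keys, SPARK_YARN_CACHE_PROPS) are illustrative, not Spark's real ones.
import os
import tempfile


def write_cache_properties(cache_files, cache_archives):
    """Persist the cache metadata to a key=value properties file."""
    fd, path = tempfile.mkstemp(suffix=".properties")
    with os.fdopen(fd, "w") as f:
        f.write("spark.yarn.cache.files=" + ",".join(cache_files) + "\n")
        f.write("spark.yarn.cache.archives=" + ",".join(cache_archives) + "\n")
    return path


def read_cache_properties(path):
    """Parse the properties file back into a dict (minimal parser)."""
    props = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#"):
                key, _, value = line.partition("=")
                props[key] = value.split(",") if value else []
    return props


# The launcher would then export one short variable instead of a huge one:
# os.environ["SPARK_YARN_CACHE_PROPS"] = write_cache_properties(files, archives)
```

As noted, on YARN the properties file itself would also have to be localized to the 
containers before the executor side can read it back.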






[jira] [Commented] (SPARK-11157) Allow Spark to be built without assemblies

2016-04-12 Thread Sebastian Kochman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238151#comment-15238151
 ] 

Sebastian Kochman commented on SPARK-11157:
---

After this change, when I try to submit a Spark app to YARN on Windows (using 
spark-submit.cmd), the app fails with the following error:

{noformat}
Diagnostics: The command line has a length of 12046 exceeds maximum allowed 
length of 8191. Command starts with: @set 
SPARK_YARN_CACHE_FILES=[...]/.sparkStaging/application_1460496865345_0
Failing this attempt. Failing the application.
{noformat}

So, in short: the large number of jars that need to be staged in YARN causes the 
Windows command-line length limit to be exceeded.
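The failure mode is easy to reproduce on paper: cmd.exe caps a single command line at 
8191 characters, so concatenating per-jar staging paths into one {{@set}} line overruns 
it with even a modest number of jars. A rough sketch (the staging directory and jar 
names below are made up, only the arithmetic matters):

```python
# Rough illustration of blowing past the Windows cmd.exe line limit.
# The staging-directory path and jar names are invented; the point is
# only that ~100 jars at ~100 characters each far exceeds 8191 chars.
WINDOWS_CMD_LIMIT = 8191

staging_dir = (
    "hdfs://host:8020/user/spark/.sparkStaging/application_1460496865345_0001"
)
jars = ["{}/lib/spark-dep-{:03d}.jar".format(staging_dir, i) for i in range(100)]

command = "@set SPARK_YARN_CACHE_FILES=" + ",".join(jars)
print(len(command) > WINDOWS_CMD_LIMIT)  # True: the line is far too long
```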

Has anybody seen this? Is there a recommendation for a workaround?

Marcelo: in the original description, you mentioned there will still be a Maven 
profile building a single assembly. I couldn't find it -- is there one?




[jira] [Commented] (SPARK-11157) Allow Spark to be built without assemblies

2016-04-07 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230157#comment-15230157
 ] 

Sean Owen commented on SPARK-11157:
---

You just need the launcher module, right? You don't need to bundle anything 
else. In any event, this isn't something that a profile would provide.




[jira] [Commented] (SPARK-11157) Allow Spark to be built without assemblies

2016-04-07 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SPARK-11157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230119#comment-15230119
 ] 

Jean-Baptiste Onofré commented on SPARK-11157:
--

Something I have in mind is to provide a spark-client: right now, when you 
submit a job, the submission takes quite a while because we ship a bunch of Spark 
dependencies that aren't really useful from a pure client perspective. If you don't 
mind, I can propose a Maven profile to assemble a spark-client. Thoughts?




[jira] [Commented] (SPARK-11157) Allow Spark to be built without assemblies

2016-04-04 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225322#comment-15225322
 ] 

Josh Rosen commented on SPARK-11157:


I just merged the final patch for this, so I'm marking this as fixed. If anyone 
runs into new bugs related to this, please link them here.




[jira] [Commented] (SPARK-11157) Allow Spark to be built without assemblies

2016-03-03 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178350#comment-15178350
 ] 

Marcelo Vanzin commented on SPARK-11157:


Ah, yes. And also, the multiple jars approach would allow people to easily 
customize their own Spark distro that way.




[jira] [Commented] (SPARK-11157) Allow Spark to be built without assemblies

2016-03-03 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178347#comment-15178347
 ] 

Steve Loughran commented on SPARK-11157:


FWIW, Hadoop lets you swap out guava 11.x and go to any later version; it's moved off 
all of the classes that Google removed, explicitly to allow this. We just ship 
with 11.x to avoid breaking other things. Jackson, though, is not swappable.




[jira] [Commented] (SPARK-11157) Allow Spark to be built without assemblies

2016-03-03 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178267#comment-15178267
 ] 

Marcelo Vanzin commented on SPARK-11157:


For the record: there's a potential issue with doing this that is not discussed 
in the document. Namely, when we stop generating the big assembly file, we also 
stop relocating all of Spark's dependencies. This means that Hadoop's guava will 
now leak into Spark's classpath.

Spark itself will still relocate its own guava (14) and will use it; it 
wouldn't work otherwise, since it doesn't work with version 11. But now 
applications will see guava 11 in their classpath.

In the end, I don't think this is too bad, for a couple of reasons:

- if you use the "hadoop-provided" package, or distributions that behave 
similarly to it, that's already the case
- it's easy for applications that really need a newer guava to use one and shade 
it, if they're using Maven

Given that the original driver for shading guava was to allow Spark to be embedded 
into applications that needed a different version (namely Hive), that use case 
won't be affected.
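The shading escape hatch mentioned in the second bullet looks roughly like this in a 
Maven build. This is only a sketch of a maven-shade-plugin relocation; the plugin 
version and the {{myapp.shaded}} prefix are illustrative, not anything Spark ships:

```xml
<!-- Sketch: bundle your own newer guava and relocate it so it cannot
     clash with the guava 11 that Hadoop puts on the classpath.
     The version number and shaded package prefix are illustrative. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.4.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>myapp.shaded.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

After relocation, the application's bytecode references {{myapp.shaded.com.google.common}} 
instead of {{com.google.common}}, so whichever guava Hadoop or Spark provides no longer 
matters to it.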




[jira] [Commented] (SPARK-11157) Allow Spark to be built without assemblies

2016-03-02 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176108#comment-15176108
 ] 

Marcelo Vanzin commented on SPARK-11157:


That's discussed in the attached document.




[jira] [Commented] (SPARK-11157) Allow Spark to be built without assemblies

2016-03-02 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15175276#comment-15175276
 ] 

Saisai Shao commented on SPARK-11157:
-

Hi [~vanzin], for the PySpark parts of Spark Streaming, how do we address the 
assembly problem? For example, kafka-assembly -- is that a related problem here?




[jira] [Commented] (SPARK-11157) Allow Spark to be built without assemblies

2016-01-31 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15125564#comment-15125564
 ] 

Marcelo Vanzin commented on SPARK-11157:


bq. Prior to removing the assemblies, it would be great if we could reconfigure 
our tests to not depend on the full assembly JAR

I'm pretty sure that's already the case for all Scala tests (see SPARK-9284). I 
think pyspark tests still need the streaming assemblies to work.

bq. Building up a -classpath argument that lists hundreds of JARs

You don't need to do that; you can use a wildcard ({{-classpath 
/path/to/libs/*}}). The JVM interprets that as "all the jar files under the 
libs directory".
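As a concrete sketch of the wildcard form (the install path is an assumption), the entry must be quoted so that the JVM, not the shell, expands it:

```shell
# The JVM expands a trailing "*" classpath entry to every .jar in that
# directory (non-recursively). Quoting keeps the shell from globbing it
# first. /opt/spark/libs is a hypothetical install location.
SPARK_LIBS="/opt/spark/libs"
CP="$SPARK_LIBS/*"
echo java -classpath "$CP" org.apache.spark.deploy.SparkSubmit
```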

bq. This is going to require changes to Launcher, shell scripts, and a few 
other places

That's already scoped out in the linked document and in the bug summary; it's 
actually not a lot of work, especially if we don't keep the option to generate 
assemblies around.




[jira] [Commented] (SPARK-11157) Allow Spark to be built without assemblies

2016-01-30 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15125147#comment-15125147
 ] 

Josh Rosen commented on SPARK-11157:


I'd love to start making progress towards removing the assemblies. Before we do 
so, though, I think there are a few subtasks / obstacles that we need to clear 
first:

- First, I think we should just completely remove the assembly rather than 
offering both assembly and non-assembly options. Every additional option we 
provide and support adds maintenance burden, and it would be nice to 
standardize on a single supported distribution technique.
- Prior to removing the assemblies, it would be great if we could reconfigure 
our tests to not depend on the full assembly JAR in order to run. We already 
have {{SPARK_PREPEND_CLASSPATH}} today, so this might be as simple as making 
that behavior the default and reconfiguring our test scripts to skip the 
assembly step.
- Building up a {{-classpath}} argument that lists hundreds of JARs is going to 
be a debugging nightmare (lots of tools truncate process arguments past some 
limit, etc.), so it would be good to investigate other techniques that we can 
use to pass the classpath to {{java}} without bloating the CLI (maybe using an 
environment variable or some file or something?).
- This is going to require changes to Launcher, shell scripts, and a few other 
places; it would be good to scope out these changes to estimate how much work 
it's going to be.
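One of the techniques floated above (an environment variable) can be sketched in shell; the libs path and main class are illustrative assumptions:

```shell
# Hand the jar list to the JVM via $CLASSPATH instead of a huge
# -classpath argument: `java` falls back to $CLASSPATH when no -cp or
# -classpath flag is given, so `ps` output stays short even with
# hundreds of jars. /opt/spark/libs is a hypothetical location.
build_classpath() {
  # Join every .jar in the given directory with ":" separators.
  ls "$1"/*.jar 2>/dev/null | paste -sd: -
}
CLASSPATH=$(build_classpath /opt/spark/libs)
export CLASSPATH
# java org.apache.spark.deploy.SparkSubmit "$@"
```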

[~vanzin], are there any other obvious subtasks that I'm not thinking of? I'd 
like to try to see whether we can break down this big task and scope out some 
smaller pieces so we can make incremental progress and get this finished well 
in time for 2.0.0 so we have lots of time to test.




[jira] [Commented] (SPARK-11157) Allow Spark to be built without assemblies

2016-01-07 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15088411#comment-15088411
 ] 

Josh Rosen commented on SPARK-11157:


For my own reference / ease-of-searchability, here's a backlink to an earlier 
discussion on GitHub: https://github.com/vanzin/spark/pull/2




[jira] [Commented] (SPARK-11157) Allow Spark to be built without assemblies

2016-01-07 Thread Kent Murra (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15088443#comment-15088443
 ] 

Kent Murra commented on SPARK-11157:


Having a folder of jars as an option would be great. In our particular 
situation, we have an internal non-Maven build system in which the Spark build 
process is a black box. The end result is that we're running into serious 
dependency conflicts, mostly around Jackson 2.4 vs Jackson 2.6, which we can't 
resolve at the build or deploy steps. We're working around them via manual 
classpath re-ordering and use of spark.driver.userClassPathFirst, but being 
able to swap out the jars would help.




[jira] [Commented] (SPARK-11157) Allow Spark to be built without assemblies

2015-10-19 Thread Jean-Baptiste Onofré (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14962897#comment-14962897
 ] 

Jean-Baptiste Onofré commented on SPARK-11157:
--

Agree with Marcelo. It's something I had planned as well: creating more 
fine-grained jar files instead of one big Spark jar.
