GitHub user witgo opened a pull request:
https://github.com/apache/spark/pull/590
Improved build configuration Ⅱ
@berngp
I merged your code into this PR.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/witgo/spark improved_build
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/590.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #590
----
commit 4e96c0153063b35fc03e497f28292a97832e81d4
Author: Bernardo Gomez Palacio <[email protected]>
Date: 2014-04-15T21:03:30Z
Add YARN/Stable compiled classes to the CLASSPATH.
The change adds `./yarn/stable/target/<scala-version>/classes` to
the _Classpath_ when a _dependencies_ assembly is available in the
assembly directory.
Why is this change necessary?
It eases the development of features and bug fixes for Spark-YARN.
[ticket: X] : NA
Author : [email protected]
Reviewer : ?
Testing : ?
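A minimal sketch of the classpath logic this describes, assuming the launcher already defines `FWDIR` (Spark home), `SCALA_VERSION`, and `CLASSPATH`; the variable names are illustrative, not the exact script contents:
```
# Hypothetical sketch: append the YARN/stable classes only when a
# "-deps" assembly jar is present in the assembly directory.
ASSEMBLY_DIR="$FWDIR/assembly/target/scala-$SCALA_VERSION"
if ls "$ASSEMBLY_DIR"/spark-assembly*-deps.jar >/dev/null 2>&1; then
  CLASSPATH="$CLASSPATH:$FWDIR/yarn/stable/target/scala-$SCALA_VERSION/classes"
fi
```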
commit 1342886a396be00eda9449c6d84155dfecf954c8
Author: Bernardo Gomez Palacio <[email protected]>
Date: 2014-04-15T21:46:44Z
The `spark-class` shell now ignores non-jar files in the assembly directory.
Why is this change necessary?
While developing in Spark I found myself rebuilding either the
dependencies assembly or the full Spark assembly. I kept running into
the case of having both the dep-assembly and the full assembly in the
same directory and getting an error when I called either `spark-shell`
or `spark-submit`.
Quick fix: rename either of them to a .bkp file, depending on the
development workflow you are executing at the moment, and have
`spark-class` ignore non-jar files. Another option would be to move the
"offending" jar to a different directory, but in my opinion keeping
them in there is a bit tidier.
e.g.
```
ll ./assembly/target/scala-2.10
spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-deps.jar
spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0.jar.bkp
```
[ticket: X] : ?
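A minimal sketch of the filter described above, assuming the script resolves the assembly by globbing `$ASSEMBLY_DIR` (variable names are illustrative, not the exact script contents):
```
# Hypothetical sketch: only *.jar files are considered, so a renamed
# spark-assembly-*.jar.bkp no longer trips the "multiple assemblies" check.
num_jars=0
for f in "$ASSEMBLY_DIR"/spark-assembly*; do
  case "$f" in
    *.jar) ASSEMBLY_JAR="$f"; num_jars=$((num_jars + 1)) ;;
    *)     ;;  # ignore non-jar files such as *.jar.bkp
  esac
done
if [ "$num_jars" -gt 1 ]; then
  echo "Found multiple Spark assembly jars in $ASSEMBLY_DIR" >&2
  exit 1
fi
```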
commit ddf2547aa2aea8155f8d6c0386e2cb37bcf61537
Author: Bernardo Gomez Palacio <[email protected]>
Date: 2014-04-15T21:53:23Z
The `spark-shell` option `--log-conf` also enables SPARK_PRINT_LAUNCH_COMMAND.
Why is this change necessary?
Most likely, when enabling `--log-conf` through `spark-shell` you are
also interested in the full invocation of the java command, including
the _classpath_ and extended options. e.g.
```
INFO: Base Directory set to /Users/bernardo/work/github/berngp/spark
INFO: Spark Master is yarn-client
INFO: Spark REPL options -Dspark.logConf=true
Spark Command:
/Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/bin/java -cp
:/Users/bernardo/work/github/berngp/spark/conf:/Users/bernardo/work/github/berngp/spark/core/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/repl/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/mllib/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/bagel/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/graphx/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/streaming/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/tools/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/sql/catalyst/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/sql/core/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/sql/hive/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/yarn/stable/target/scala-2.10/classes:/Users/bernardo/work/github/berng
p/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-deps.jar:/usr/local/Cellar/hadoop/2.2.0/libexec/etc/hadoop
-XX:ErrorFile=/tmp/spark-shell-hs_err_pid.log
-XX:HeapDumpPath=/tmp/spark-shell-java_pid.hprof
-XX:-HeapDumpOnOutOfMemoryError -XX:-PrintGC -XX:-PrintGCDetails
-XX:-PrintGCTimeStamps -XX:-PrintTenuringDistribution
-XX:-PrintAdaptiveSizePolicy -XX:GCLogFileSize=1024K -XX:-UseGCLogFileRotation
-Xloggc:/tmp/spark-shell-gc.log -XX:+UseConcMarkSweepGC
-Dspark.cleaner.ttl=10000 -Dspark.driver.host=33.33.33.1 -Dspark.logConf=true
-Djava.library.path= -Xms400M -Xmx400M org.apache.spark.repl.Main
```
[ticket: X] : ?
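A minimal sketch of the option handling this implies in the `spark-shell` launcher (the loop and variable names here are illustrative, not the exact script contents):
```
# Hypothetical sketch: when --log-conf is passed, also print the full
# java launch command by turning on SPARK_PRINT_LAUNCH_COMMAND.
for arg in "$@"; do
  case "$arg" in
    --log-conf)
      SPARK_REPL_OPTS="$SPARK_REPL_OPTS -Dspark.logConf=true"
      export SPARK_PRINT_LAUNCH_COMMAND=1
      ;;
  esac
done
```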
commit 22045394955992c2c8dfe0e1040c6bb972be6ce4
Author: Bernardo Gomez Palacio <[email protected]>
Date: 2014-04-15T22:15:23Z
The SBT root project is now "spark", and the assembly is qualified when it is built with YARN.
Why is this change necessary?
Renamed the SBT "root" project to "spark" to enhance readability.
Currently the assembly is qualified with the Hadoop version but not
with whether YARN has been enabled. This change qualifies the assembly
such that it is easy to identify whether YARN was enabled, e.g.
```
./make-distribution.sh --hadoop 2.3.0 --with-yarn
ls -l ./assembly/target/scala-2.10
spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-yarn.jar
```
vs
```
./make-distribution.sh --hadoop 2.3.0
ls -l ./assembly/target/scala-2.10
spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0.jar
```
[ticket: X] : ?
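The naming change itself lives in the SBT build; purely as an illustration, the qualified jar name could be derived along these lines (`SPARK_YARN` is a hypothetical flag):
```
# Hypothetical sketch: suffix the assembly name with "-yarn" when the
# distribution was built with YARN support enabled.
VERSION="1.0.0-SNAPSHOT"
HADOOP_VERSION="2.3.0"
YARN_SUFFIX=""
[ "$SPARK_YARN" = "true" ] && YARN_SUFFIX="-yarn"
echo "spark-assembly-${VERSION}-hadoop${HADOOP_VERSION}${YARN_SUFFIX}.jar"
```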
commit 889bf4ed742ed3d06cb62276ef554f2f37b53ee6
Author: Bernardo Gomez Palacio <[email protected]>
Date: 2014-04-16T00:08:27Z
Upgrade the Maven Build to YARN 2.3.0.
Upgraded to YARN 2.3.0, removed unnecessary `relativePath` values, and
removed the incorrect version of the "org.apache.hadoop:hadoop-client"
dependency in yarn/pom.xml.
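For reference, a Maven build against the upgraded versions would look roughly like this (the exact profile and property names depend on the POMs at the time):
```
# Hypothetical invocation: build with Maven against Hadoop/YARN 2.3.0.
mvn -Pyarn -Dhadoop.version=2.3.0 -Dyarn.version=2.3.0 -DskipTests clean package
```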
commit 460510a4ddf7082b24baeecbff33bfaee6438ea7
Author: witgo <[email protected]>
Date: 2014-04-29T17:15:58Z
merge https://github.com/berngp/spark/commits/feature/small-shell-changes
commit f1c7535fe6e97e1d5ebf8adcac01d82c794a01f8
Author: witgo <[email protected]>
Date: 2014-04-29T17:48:01Z
Improved build configuration Ⅱ
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---