GitHub user witgo opened a pull request:
https://github.com/apache/spark/pull/590
Improved build configuration Ⅱ
@berngp
I merged your code into this PR.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/witgo/spark improved_build
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/590.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #590
----
commit 4e96c0153063b35fc03e497f28292a97832e81d4
Author: Bernardo Gomez Palacio <[email protected]>
Date: 2014-04-15T21:03:30Z
Add YARN/Stable compiled classes to the CLASSPATH.
The change adds `./yarn/stable/target/<scala-version>/classes` to
the _Classpath_ when a _dependencies_ assembly is available in the
assembly directory.
Why is this change necessary?
It eases the development of features and bug fixes for Spark-YARN.
[ticket: X] : NA
Author : [email protected]
Reviewer : ?
Testing : ?
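A minimal sketch of the classpath logic this describes, assuming the launcher already defines `FWDIR` (Spark home), `SCALA_VERSION`, and `CLASSPATH`; the variable names are illustrative, not the exact script contents:
```
# Hypothetical sketch: append the YARN/stable classes only when a
# "-deps" assembly jar is present in the assembly directory.
ASSEMBLY_DIR="$FWDIR/assembly/target/scala-$SCALA_VERSION"
if ls "$ASSEMBLY_DIR"/spark-assembly*-deps.jar >/dev/null 2>&1; then
  CLASSPATH="$CLASSPATH:$FWDIR/yarn/stable/target/scala-$SCALA_VERSION/classes"
fi
```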
commit 1342886a396be00eda9449c6d84155dfecf954c8
Author: Bernardo Gomez Palacio <[email protected]>
Date: 2014-04-15T21:46:44Z
The `spark-class` shell now ignores non-jar files in the assembly directory.
Why is this change necessary?
While developing in Spark I found myself rebuilding either the
dependencies assembly or the full Spark assembly. I kept running into
the case of having both the dep-assembly and the full assembly in the
same directory and getting an error when I called either `spark-shell`
or `spark-submit`.
Quick fix: rename either of them to a .bkp file, depending on the
development workflow you are executing at the moment, and have
`spark-class` ignore non-jar files. Another option would be to move the
"offending" jar to a different directory, but in my opinion keeping
them in there is a bit tidier.
e.g.
```
ll ./assembly/target/scala-2.10
spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-deps.jar
spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0.jar.bkp
```
[ticket: X] : ?
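A minimal sketch of the filter described above, assuming the script resolves the assembly by globbing `$ASSEMBLY_DIR` (variable names are illustrative, not the exact script contents):
```
# Hypothetical sketch: only *.jar files are considered, so a renamed
# spark-assembly-*.jar.bkp no longer trips the "multiple assemblies" check.
num_jars=0
for f in "$ASSEMBLY_DIR"/spark-assembly*; do
  case "$f" in
    *.jar) ASSEMBLY_JAR="$f"; num_jars=$((num_jars + 1)) ;;
    *)     ;;  # ignore non-jar files such as *.jar.bkp
  esac
done
if [ "$num_jars" -gt 1 ]; then
  echo "Found multiple Spark assembly jars in $ASSEMBLY_DIR" >&2
  exit 1
fi
```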
commit ddf2547aa2aea8155f8d6c0386e2cb37bcf61537
Author: Bernardo Gomez Palacio <[email protected]>
Date: 2014-04-15T21:53:23Z
The `spark-shell` option `--log-conf` also enables SPARK_PRINT_LAUNCH_COMMAND.
Why is this change necessary?
Most likely, when enabling `--log-conf` through `spark-shell` you are
also interested in the full invocation of the java command, including
the _classpath_ and extended options. e.g.
```
INFO: Base Directory set to /Users/bernardo/work/github/berngp/spark
INFO: Spark Master is yarn-client
INFO: Spark REPL options -Dspark.logConf=true
Spark Command:
/Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/bin/java -cp
:/Users/bernardo/work/github/berngp/spark/conf:/Users/bernardo/work/github/berngp/spark/core/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/repl/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/mllib/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/bagel/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/graphx/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/streaming/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/tools/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/sql/catalyst/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/sql/core/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/sql/hive/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/yarn/stable/target/scala-2.10/classes:/Users/bernardo/work/github/berng
p/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-deps.jar:/usr/local/Cellar/hadoop/2.2.0/libexec/etc/hadoop
-XX:ErrorFile=/tmp/spark-shell-hs_err_pid.log
-XX:HeapDumpPath=/tmp/spark-shell-java_pid.hprof
-XX:-HeapDumpOnOutOfMemoryError -XX:-PrintGC -XX:-PrintGCDetails
-XX:-PrintGCTimeStamps -XX:-PrintTenuringDistribution
-XX:-PrintAdaptiveSizePolicy -XX:GCLogFileSize=1024K -XX:-UseGCLogFileRotation
-Xloggc:/tmp/spark-shell-gc.log -XX:+UseConcMarkSweepGC
-Dspark.cleaner.ttl=10000 -Dspark.driver.host=33.33.33.1 -Dspark.logConf=true
-Djava.library.path= -Xms400M -Xmx400M org.apache.spark.repl.Main
```
[ticket: X] : ?
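A minimal sketch of the option handling this implies in the `spark-shell` launcher (the loop and variable names here are illustrative, not the exact script contents):
```
# Hypothetical sketch: when --log-conf is passed, also print the full
# java launch command by turning on SPARK_PRINT_LAUNCH_COMMAND.
for arg in "$@"; do
  case "$arg" in
    --log-conf)
      SPARK_REPL_OPTS="$SPARK_REPL_OPTS -Dspark.logConf=true"
      export SPARK_PRINT_LAUNCH_COMMAND=1
      ;;
  esac
done
```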
commit 22045394955992c2c8dfe0e1040c6bb972be6ce4
Author: Bernardo Gomez Palacio <[email protected]>
Date: 2014-04-15T22:15:23Z
The SBT root project is now "spark", and the assembly is qualified when it is built with YARN.
Why is this change necessary?
Renamed the SBT "root" project to "spark" to enhance readability.
Currently the assembly is qualified with the Hadoop version but not
with whether YARN has been enabled. This change qualifies the assembly
such that it is easy to identify whether YARN was enabled, e.g.
```
./make-distribution.sh --hadoop 2.3.0 --with-yarn
ls -l ./assembly/target/scala-2.10
spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-yarn.jar
```
vs
```
./make-distribution.sh --hadoop 2.3.0
ls -l ./assembly/target/scala-2.10
spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0.jar
```
[ticket: X] : ?
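The naming change itself lives in the SBT build; purely as an illustration, the qualified jar name could be derived along these lines (`SPARK_YARN` is a hypothetical flag):
```
# Hypothetical sketch: suffix the assembly name with "-yarn" when the
# distribution was built with YARN support enabled.
VERSION="1.0.0-SNAPSHOT"
HADOOP_VERSION="2.3.0"
YARN_SUFFIX=""
[ "$SPARK_YARN" = "true" ] && YARN_SUFFIX="-yarn"
echo "spark-assembly-${VERSION}-hadoop${HADOOP_VERSION}${YARN_SUFFIX}.jar"
```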
commit 889bf4ed742ed3d06cb62276ef554f2f37b53ee6
Author: Bernardo Gomez Palacio <[email protected]>
Date: 2014-04-16T00:08:27Z
Upgrade the Maven Build to YARN 2.3.0.
Upgraded to YARN 2.3.0, removed unnecessary `relativePath` values, and
removed the incorrect version of the "org.apache.hadoop:hadoop-client"
dependency in yarn/pom.xml.
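For reference, a Maven build against the upgraded versions would look roughly like this (the exact profile and property names depend on the POMs at the time):
```
# Hypothetical invocation: build with Maven against Hadoop/YARN 2.3.0.
mvn -Pyarn -Dhadoop.version=2.3.0 -Dyarn.version=2.3.0 -DskipTests clean package
```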
commit 460510a4ddf7082b24baeecbff33bfaee6438ea7
Author: witgo <[email protected]>
Date: 2014-04-29T17:15:58Z
merge https://github.com/berngp/spark/commits/feature/small-shell-changes
commit f1c7535fe6e97e1d5ebf8adcac01d82c794a01f8
Author: witgo <[email protected]>
Date: 2014-04-29T17:48:01Z
Improved build configuration Ⅱ
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---