Btw, just to add to the confusion ;) I've been using Maven as well since I moved from Java to Scala, but everyone I talk to has been recommending SBT for Scala.
I use the Eclipse Scala IDE to build: http://scala-ide.org/

Here is my sample POM. You can add dependencies based on your requirements.

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>spark</groupId>
  <artifactId>scala</artifactId>
  <version>1.0</version>
  <name>${project.artifactId}</name>

  <properties>
    <maven.compiler.source>1.7</maven.compiler.source>
    <maven.compiler.target>1.7</maven.compiler.target>
    <encoding>UTF-8</encoding>
    <scala.version>2.10.4</scala.version>
    <maven-scala-plugin.version>2.15.2</maven-scala-plugin.version>
  </properties>

  <repositories>
    <repository>
      <id>cloudera-repo-releases</id>
      <url>https://repository.cloudera.com/artifactory/repo/</url>
    </repository>
  </repositories>

  <dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.5.0-cdh5.5.1</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-mllib_2.10</artifactId>
      <version>1.5.0-cdh5.5.1</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-hive_2.10</artifactId>
      <version>1.5.0</version>
    </dependency>
  </dependencies>

  <build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <testSourceDirectory>src/test/scala</testSourceDirectory>
    <plugins>
      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <version>${maven-scala-plugin.version}</version>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <jvmArgs>
            <jvmArg>-Xms64m</jvmArg>
            <jvmArg>-Xmx1024m</jvmArg>
          </jvmArgs>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>1.6</version>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <filters>
                <filter>
                  <artifact>*:*</artifact>
                  <excludes>
                    <exclude>META-INF/*.SF</exclude>
                    <exclude>META-INF/*.DSA</exclude>
                    <exclude>META-INF/*.RSA</exclude>
                  </excludes>
                </filter>
              </filters>
              <transformers>
                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                  <mainClass>com.group.id.Launcher1</mainClass>
                </transformer>
              </transformers>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>

> On Mar 15, 2016, at 12:09 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
> Ok.
>
> Sounds like opinion is divided :)
>
> I will try to build a scala app with Maven.
>
> When I build with SBT I follow this directory structure:
>
> the high-level directory is the package name, like
>
> ImportCSV
>
> Under ImportCSV I have a directory src and the sbt file ImportCSV.sbt;
> in directory src I have main and scala subdirectories.
> My scala file is in
>
> ImportCSV/src/main/scala
>
> called ImportCSV.scala.
>
> I then have a shell script that runs everything under the ImportCSV directory:
>
> cat generic.ksh
> #!/bin/ksh
> #--------------------------------------------------------------------------------
> #
> # Procedure:    generic.ksh
> #
> # Description:  Compiles and runs a scala app using sbt and spark-submit
> #
> # Parameters:   none
> #
> #--------------------------------------------------------------------------------
> # Vers| Date   | Who | DA | Description
> #-----+--------+-----+----+-----------------------------------------------------
> # 1.0 |04/03/15| MT  |    | Initial Version
> #--------------------------------------------------------------------------------
> #
> function F_USAGE
> {
>   echo "USAGE: ${1##*/} -A '<Application>'"
>   echo "USAGE: ${1##*/} -H '<HELP>' -h '<HELP>'"
>   exit 10
> }
> #
> # Main Section
> #
> if [[ "${1}" = "-h" || "${1}" = "-H" ]]; then
>   F_USAGE $0
> fi
> ## MAP INPUT TO VARIABLES
> while getopts A: opt
> do
>   case $opt in
>     (A) APPLICATION="$OPTARG" ;;
>     (*) F_USAGE $0 ;;
>   esac
> done
> [[ -z ${APPLICATION} ]] && print "You must specify an application value " && F_USAGE $0
> ENVFILE=/home/hduser/dba/bin/environment.ksh
> if [[ -f $ENVFILE ]]
> then
>   . $ENVFILE
>   . ~/spark_1.5.2_bin-hadoop2.6.kshrc
> else
>   echo "Abort: $0 failed. No environment file ( $ENVFILE ) found"
>   exit 1
> fi
> ##FILE_NAME=`basename $0 .ksh`
> FILE_NAME=${APPLICATION}
> CLASS=`echo ${FILE_NAME}|tr "[:upper:]" "[:lower:]"`
> NOW="`date +%Y%m%d_%H%M`"
> LOG_FILE=${LOGDIR}/${FILE_NAME}.log
> [ -f ${LOG_FILE} ] && rm -f ${LOG_FILE}
> print "\n" `date` ", Started $0" | tee -a ${LOG_FILE}
> cd ../${FILE_NAME}
> print "Compiling ${FILE_NAME}" | tee -a ${LOG_FILE}
> sbt package
> print "Submitting the job" | tee -a ${LOG_FILE}
>
> ${SPARK_HOME}/bin/spark-submit \
>   --packages com.databricks:spark-csv_2.11:1.3.0 \
>   --class "${FILE_NAME}" \
>   --master spark://50.140.197.217:7077 \
>   --executor-memory=12G \
>   --executor-cores=12 \
>   --num-executors=2 \
>   target/scala-2.10/${CLASS}_2.10-1.0.jar
> print `date` ", Finished $0" | tee -a ${LOG_FILE}
> exit
>
> So to run it for ImportCSV, all I need to do is:
>
> ./generic.ksh -A ImportCSV
>
> Now can anyone kindly give me a rough guideline on the directory layout and location of pom.xml to make this work using Maven?
>
> Thanks
>
> Dr Mich Talebzadeh
>
> LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
>
> On 15 March 2016 at 10:50, Sean Owen <so...@cloudera.com> wrote:
> FWIW, I strongly prefer Maven over SBT even for Scala projects. The
> Spark build of reference is Maven.
>
> On Tue, Mar 15, 2016 at 10:45 AM, Chandeep Singh <c...@chandeep.com> wrote:
> > For Scala, SBT is recommended.
> >
> > On Mar 15, 2016, at 10:42 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> >
> > Hi,
> >
> > I build my Spark/Scala packages using SBT and that works fine. I have created
> > generic shell scripts to build and submit it.
> >
> > Yesterday I noticed that some use Maven and a POM for this purpose.
> >
> > Which approach is recommended?
> >
> > Thanks,
> >
> > Dr Mich Talebzadeh
> >
> > LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> >
> > http://talebzadehmich.wordpress.com
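On the directory-layout question quoted above: Maven expects the same src/main/scala layout the SBT project already uses, with pom.xml sitting at the project root next to src (roughly where ImportCSV.sbt is now). A minimal sketch, reusing the ImportCSV name from the thread (the exact jar name produced will depend on the POM's artifactId and version, so treat those parts as assumptions):

```shell
# Maven's standard layout matches what the SBT project already has,
# so the only addition is a pom.xml at the project root:
mkdir -p ImportCSV/src/main/scala ImportCSV/src/test/scala

# pom.xml lives alongside src/, where ImportCSV.sbt used to be:
touch ImportCSV/pom.xml

# Resulting layout:
#   ImportCSV/
#   ├── pom.xml
#   └── src/
#       ├── main/scala/ImportCSV.scala
#       └── test/scala/

# In generic.ksh, "sbt package" would then become "mvn package"
# (run from inside ImportCSV/), and spark-submit would be pointed
# at the jar Maven writes under target/ (named from the POM's
# artifactId and version) instead of
# target/scala-2.10/${CLASS}_2.10-1.0.jar.
```

With the shade plugin from the POM above, `mvn package` also produces an uber-jar, which can simplify the `--packages` handling in the submit step.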