Great, Chandeep. I also have the Eclipse Scala IDE, as below:

Scala IDE build of Eclipse SDK
Build id: 4.3.0-vfinal-2015-12-01T15:55:22Z-Typesafe

I am no expert on Eclipse, so if I create a project called ImportCSV, where do I need to put the pom file, and how do I reference it, please? My Eclipse runs on a Linux host, so it can access all the directories that the sbt project accesses. I also believe there will be no need for external jar files in the build path?
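My guess from the Maven docs is that the pom.xml simply sits at the top of the project, next to src, with the same source tree that sbt uses. Something along these lines (a rough sketch on my part, assuming Maven's standard directory conventions, so please correct me if I have it wrong):

    ImportCSV/
        pom.xml
        src/
            main/
                scala/
                    ImportCSV.scala
            test/
                scala/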
Thanks,

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com


On 15 March 2016 at 12:15, Chandeep Singh <c...@chandeep.com> wrote:

> Btw, just to add to the confusion ;) I use Maven as well since I moved
> from Java to Scala, but everyone I talk to has been recommending SBT for
> Scala.
>
> I use the Eclipse Scala IDE to build. http://scala-ide.org/
>
> Here is my sample POM. You can add dependencies based on your requirements.
>
> <project xmlns="http://maven.apache.org/POM/4.0.0"
>          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>          xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
>          http://maven.apache.org/maven-v4_0_0.xsd">
>   <modelVersion>4.0.0</modelVersion>
>   <groupId>spark</groupId>
>   <artifactId>scala</artifactId>
>   <version>1.0</version>
>   <name>${project.artifactId}</name>
>
>   <properties>
>     <maven.compiler.source>1.7</maven.compiler.source>
>     <maven.compiler.target>1.7</maven.compiler.target>
>     <encoding>UTF-8</encoding>
>     <scala.version>2.10.4</scala.version>
>     <maven-scala-plugin.version>2.15.2</maven-scala-plugin.version>
>   </properties>
>
>   <repositories>
>     <repository>
>       <id>cloudera-repo-releases</id>
>       <url>https://repository.cloudera.com/artifactory/repo/</url>
>     </repository>
>   </repositories>
>
>   <dependencies>
>     <dependency>
>       <groupId>org.scala-lang</groupId>
>       <artifactId>scala-library</artifactId>
>       <version>${scala.version}</version>
>     </dependency>
>     <dependency>
>       <groupId>org.apache.spark</groupId>
>       <artifactId>spark-core_2.10</artifactId>
>       <version>1.5.0-cdh5.5.1</version>
>     </dependency>
>     <dependency>
>       <groupId>org.apache.spark</groupId>
>       <artifactId>spark-mllib_2.10</artifactId>
>       <version>1.5.0-cdh5.5.1</version>
>     </dependency>
>     <dependency>
>       <groupId>org.apache.spark</groupId>
>       <artifactId>spark-hive_2.10</artifactId>
>       <version>1.5.0</version>
>     </dependency>
>   </dependencies>
>
>   <build>
>     <sourceDirectory>src/main/scala</sourceDirectory>
>     <testSourceDirectory>src/test/scala</testSourceDirectory>
>     <plugins>
>       <plugin>
>         <groupId>org.scala-tools</groupId>
>         <artifactId>maven-scala-plugin</artifactId>
>         <version>${maven-scala-plugin.version}</version>
>         <executions>
>           <execution>
>             <goals>
>               <goal>compile</goal>
>               <goal>testCompile</goal>
>             </goals>
>           </execution>
>         </executions>
>         <configuration>
>           <jvmArgs>
>             <jvmArg>-Xms64m</jvmArg>
>             <jvmArg>-Xmx1024m</jvmArg>
>           </jvmArgs>
>         </configuration>
>       </plugin>
>       <plugin>
>         <groupId>org.apache.maven.plugins</groupId>
>         <artifactId>maven-shade-plugin</artifactId>
>         <version>1.6</version>
>         <executions>
>           <execution>
>             <phase>package</phase>
>             <goals>
>               <goal>shade</goal>
>             </goals>
>             <configuration>
>               <filters>
>                 <filter>
>                   <artifact>*:*</artifact>
>                   <excludes>
>                     <exclude>META-INF/*.SF</exclude>
>                     <exclude>META-INF/*.DSA</exclude>
>                     <exclude>META-INF/*.RSA</exclude>
>                   </excludes>
>                 </filter>
>               </filters>
>               <transformers>
>                 <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
>                   <mainClass>com.group.id.Launcher1</mainClass>
>                 </transformer>
>               </transformers>
>             </configuration>
>           </execution>
>         </executions>
>       </plugin>
>     </plugins>
>   </build>
> </project>
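> The pom.xml itself lives at the root of the project, alongside src. With
> that in place, building and submitting is roughly (just a sketch; the main
> class com.group.id.Launcher1 above is a placeholder you would replace with
> your own, and the jar name follows from the artifactId and version):
>
> mvn clean package
> ${SPARK_HOME}/bin/spark-submit --class com.group.id.Launcher1 target/scala-1.0.jar
>
> The shade plugin replaces the plain jar with a fat jar, so target/scala-1.0.jar
> already bundles the dependencies. In practice you may want to mark the Spark
> dependencies as provided so they are not packed into it.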
>
> On Mar 15, 2016, at 12:09 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
> Ok.
>
> Sounds like opinion is divided :)
>
> I will try to build a Scala app with Maven.
>
> When I build with SBT I follow this directory structure:
>
> The top-level directory is the application name, e.g.
>
> ImportCSV
>
> Under ImportCSV I have a directory src and the sbt file ImportCSV.sbt.
>
> In directory src I have main and scala subdirectories. My Scala file is in
>
> ImportCSV/src/main/scala
>
> called ImportCSV.scala.
>
> I then have a shell script that runs everything under the ImportCSV directory:
>
> cat generic.ksh
> #!/bin/ksh
> #--------------------------------------------------------------------------------
> #
> # Procedure:    generic.ksh
> #
> # Description:  Compiles and runs a Scala app using sbt and spark-submit
> #
> # Parameters:   none
> #
> #--------------------------------------------------------------------------------
> # Vers| Date   | Who | DA | Description
> #-----+--------+-----+----+-----------------------------------------------------
> # 1.0 |04/03/15| MT  |    | Initial Version
> #--------------------------------------------------------------------------------
> #
> function F_USAGE
> {
>   echo "USAGE: ${1##*/} -A '<Application>'"
>   echo "USAGE: ${1##*/} -H '<HELP>' -h '<HELP>'"
>   exit 10
> }
> #
> # Main Section
> #
> if [[ "${1}" = "-h" || "${1}" = "-H" ]]; then
>   F_USAGE $0
> fi
> ## MAP INPUT TO VARIABLES
> while getopts A: opt
> do
>   case $opt in
>     (A) APPLICATION="$OPTARG" ;;
>     (*) F_USAGE $0 ;;
>   esac
> done
> [[ -z ${APPLICATION} ]] && print "You must specify an application value" && F_USAGE $0
> ENVFILE=/home/hduser/dba/bin/environment.ksh
> if [[ -f $ENVFILE ]]
> then
>   . $ENVFILE
>   . ~/spark_1.5.2_bin-hadoop2.6.kshrc
> else
>   echo "Abort: $0 failed. No environment file ( $ENVFILE ) found"
>   exit 1
> fi
> ##FILE_NAME=`basename $0 .ksh`
> FILE_NAME=${APPLICATION}
> CLASS=`echo ${FILE_NAME}|tr "[:upper:]" "[:lower:]"`
> NOW="`date +%Y%m%d_%H%M`"
> LOG_FILE=${LOGDIR}/${FILE_NAME}.log
> [ -f ${LOG_FILE} ] && rm -f ${LOG_FILE}
> print "\n" `date` ", Started $0" | tee -a ${LOG_FILE}
> cd ../${FILE_NAME}
> print "Compiling ${FILE_NAME}" | tee -a ${LOG_FILE}
> sbt package
> print "Submitting the job" | tee -a ${LOG_FILE}
> ${SPARK_HOME}/bin/spark-submit \
>   --packages com.databricks:spark-csv_2.10:1.3.0 \
>   --class "${FILE_NAME}" \
>   --master spark://50.140.197.217:7077 \
>   --executor-memory=12G \
>   --executor-cores=12 \
>   --num-executors=2 \
>   target/scala-2.10/${CLASS}_2.10-1.0.jar
> print `date` ", Finished $0" | tee -a ${LOG_FILE}
> exit
>
> So to run it for ImportCSV, all I need to do is:
>
> ./generic.ksh -A ImportCSV
>
> Now can anyone kindly give me a rough guideline on the directory structure
> and the location of pom.xml to make this work using Maven?
>
> Thanks
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
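> For reference, the ImportCSV.sbt file mentioned above is only a few lines,
> roughly along these lines (simplified here; the exact Spark version and the
> provided scope would need to match your own setup):
>
> name := "ImportCSV"
>
> version := "1.0"
>
> scalaVersion := "2.10.4"
>
> libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.2" % "provided"
>
> With that name and version, sbt package writes
> target/scala-2.10/importcsv_2.10-1.0.jar (sbt lower-cases the project name),
> which is exactly the jar path the script submits.
>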
> On 15 March 2016 at 10:50, Sean Owen <so...@cloudera.com> wrote:
>
>> FWIW, I strongly prefer Maven over SBT even for Scala projects. The
>> Spark build of reference is Maven.
>>
>> On Tue, Mar 15, 2016 at 10:45 AM, Chandeep Singh <c...@chandeep.com> wrote:
>> > For Scala, SBT is recommended.
>> >
>> > On Mar 15, 2016, at 10:42 AM, Mich Talebzadeh <mich.talebza...@gmail.com>
>> > wrote:
>> >
>> > Hi,
>> >
>> > I build my Spark/Scala packages using SBT, which works fine. I have
>> > created generic shell scripts to build and submit them.
>> >
>> > Yesterday I noticed that some use Maven and a POM for this purpose.
>> >
>> > Which approach is recommended?
>> >
>> > Thanks,
>> >
>> > Dr Mich Talebzadeh
>> >
>> > LinkedIn:
>> > https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> >
>> > http://talebzadehmich.wordpress.com