Ok.

Sounds like opinion is divided :)

I will try to build a Scala app with Maven.

When I build with SBT I follow this directory structure.

The top-level directory is named after the application, for example

ImportCSV

Under ImportCSV I have a src directory and the sbt build file ImportCSV.sbt.

Under src I have main and scala subdirectories, so my Scala source file lives in

ImportCSV/src/main/scala

and is called ImportCSV.scala.
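For reference, a minimal ImportCSV.sbt for this layout would look something like the following. This is a sketch only; Scala 2.10 and Spark 1.5.2 are assumed to match the jar path and environment file in the script below:

name := "ImportCSV"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.2" % "provided"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.5.2" % "provided"

With name := "ImportCSV" and version := "1.0", sbt package produces target/scala-2.10/importcsv_2.10-1.0.jar, which is the path the script submits.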

I then have a shell script that builds and runs everything under the ImportCSV directory:

cat generic.ksh
#!/bin/ksh
#--------------------------------------------------------------------------------
#
# Procedure:    generic.ksh
#
# Description:  Compiles and runs a Scala app using sbt and spark-submit
#
# Parameters:   none
#
#--------------------------------------------------------------------------------
# Vers|  Date  | Who | DA | Description
#-----+--------+-----+----+-----------------------------------------------------
# 1.0 |04/03/15|  MT |    | Initial Version
#--------------------------------------------------------------------------------
#
function F_USAGE
{
   echo "USAGE: ${1##*/} -A '<Application>'"
   echo "USAGE: ${1##*/} -H '<HELP>' -h '<HELP>'"
   exit 10
}
#
# Main Section
#
if [[ "${1}" = "-h" || "${1}" = "-H" ]]; then
   F_USAGE $0
fi
## MAP INPUT TO VARIABLES
while getopts A: opt
do
   case $opt in
   (A) APPLICATION="$OPTARG" ;;
   (*) F_USAGE $0 ;;
   esac
done
[[ -z ${APPLICATION} ]] && print "You must specify an application value" && F_USAGE $0
ENVFILE=/home/hduser/dba/bin/environment.ksh
if [[ -f $ENVFILE ]]
then
        . $ENVFILE
        . ~/spark_1.5.2_bin-hadoop2.6.kshrc
else
        echo "Abort: $0 failed. No environment file ( $ENVFILE ) found"
        exit 1
fi
##FILE_NAME=`basename $0 .ksh`
FILE_NAME=${APPLICATION}
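# sbt lower-cases the project name in the artifact file name, hence the tr below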
CLASS=`echo ${FILE_NAME}|tr "[:upper:]" "[:lower:]"`
NOW="`date +%Y%m%d_%H%M`"
LOG_FILE=${LOGDIR}/${FILE_NAME}.log
[ -f ${LOG_FILE} ] && rm -f ${LOG_FILE}
print "\n" `date` ", Started $0" | tee -a ${LOG_FILE}
cd ../${FILE_NAME}
print "Compiling ${FILE_NAME}" | tee -a ${LOG_FILE}
sbt package
print "Submiiting the job" | tee -a ${LOG_FILE}

${SPARK_HOME}/bin/spark-submit \
                --packages com.databricks:spark-csv_2.10:1.3.0 \
                --class "${FILE_NAME}" \
                --master spark://50.140.197.217:7077 \
                --executor-memory=12G \
                --executor-cores=12 \
                --num-executors=2 \
                target/scala-2.10/${CLASS}_2.10-1.0.jar
print `date` ", Finished $0" | tee -a ${LOG_FILE}
exit


So to run it for ImportCSV all I need to do is

./generic.ksh -A ImportCSV

Now can anyone kindly give me a rough guideline on the directory structure and
the location of pom.xml to make this work using Maven?
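
From what I have read, I am guessing pom.xml goes at the top of the
application directory, with the source tree unchanged:

ImportCSV/pom.xml
ImportCSV/src/main/scala/ImportCSV.scala

and that the pom.xml itself would be along these lines. This is only a rough
sketch under my assumptions (standard Maven conventions plus the
scala-maven-plugin; the groupId com.example is just a placeholder), so please
correct me if I have it wrong:

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <!-- placeholder groupId; any reverse-domain name will do -->
  <groupId>com.example</groupId>
  <artifactId>importcsv</artifactId>
  <version>1.0</version>

  <dependencies>
    <!-- provided scope, as with sbt, since the cluster supplies Spark -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.5.2</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.10</artifactId>
      <version>1.5.2</version>
      <scope>provided</scope>
    </dependency>
  </dependencies>

  <build>
    <!-- Maven defaults to src/main/java, so point it at the Scala sources -->
    <sourceDirectory>src/main/scala</sourceDirectory>
    <plugins>
      <!-- compiles the Scala sources during the compile phase -->
      <plugin>
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>3.2.2</version>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>

If that is right, swapping "sbt package" for "mvn package" in generic.ksh
should leave the jar in target/importcsv-1.0.jar rather than
target/scala-2.10/importcsv_2.10-1.0.jar, so the spark-submit path would need
adjusting too.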

Thanks


Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 15 March 2016 at 10:50, Sean Owen <so...@cloudera.com> wrote:

> FWIW, I strongly prefer Maven over SBT even for Scala projects. The
> Spark build of reference is Maven.
>
> On Tue, Mar 15, 2016 at 10:45 AM, Chandeep Singh <c...@chandeep.com> wrote:
> > For Scala, SBT is recommended.
> >
> > On Mar 15, 2016, at 10:42 AM, Mich Talebzadeh <mich.talebza...@gmail.com>
> > wrote:
> >
> > Hi,
> >
> > I build my Spark/Scala packages using SBT that works fine. I have created
> > generic shell scripts to build and submit it.
> >
> > Yesterday I noticed that some use Maven and Pom for this purpose.
> >
> > Which approach is recommended?
> >
> > Thanks,
> >
> >
> > Dr Mich Talebzadeh
> >
> >
> >
> > LinkedIn
> >
> > https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> >
> >
> >
> > http://talebzadehmich.wordpress.com
> >
> >
> >
> >
>
