Thanks. The Maven structure is identical to sbt; I will just have to
replace the sbt file with pom.xml.

I will use your pom.xml as a starting point.


Dr Mich Talebzadeh

On 15 March 2016 at 13:12, Chandeep Singh wrote:

> You can build using maven from the command line as well.
> This layout should give you an idea and here are some resources -
> project/
>    pom.xml   -  Defines the project
>    src/
>       main/
>           java/ - Contains all Java code that will go in your final artifact.
>                   See maven-compiler-plugin for details
>           scala/ - Contains all Scala code that will go in your final artifact.
>                    See maven-scala-plugin for details
>           resources/ - Contains all static files that should be available on
>                        the classpath in the final artifact.
>                        See maven-resources-plugin for details
>           webapp/ - Contains all content for a web application (jsps, css,
>                     images, etc.). See maven-war-plugin for details
>       site/ - Contains all apt or xdoc files used to create a project website.
>               See maven-site-plugin for details
>       test/
>           java/ - Contains all Java code used for testing.
>                   See maven-compiler-plugin for details
>           scala/ - Contains all Scala code used for testing.
>                    See maven-scala-plugin for details
>           resources/ - Contains all static content that should be available on
>                        the classpath during testing.
>                        See maven-resources-plugin for details
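As a rough illustration of the layout above, here is a minimal sketch that creates the skeleton for a Scala-only project from the shell. It assumes the project is called ImportCSV (the name used elsewhere in this thread) and that only the scala and resources directories are needed; `PROJECT_DIR` is a hypothetical variable introduced for the example.

```shell
#!/bin/sh
# Sketch: create the Maven directory skeleton for a Scala-only project.
# PROJECT_DIR is a hypothetical name; it defaults to ImportCSV, the project
# discussed in this thread.
PROJECT_DIR="${PROJECT_DIR:-ImportCSV}"

# Main and test source trees, plus the classpath resource directories.
mkdir -p "${PROJECT_DIR}/src/main/scala" \
         "${PROJECT_DIR}/src/main/resources" \
         "${PROJECT_DIR}/src/test/scala" \
         "${PROJECT_DIR}/src/test/resources"

# pom.xml belongs at the project root, next to src/. It is not created here.
# List the directories that now exist.
find "${PROJECT_DIR}" -type d | sort
```

Once a pom.xml is in place at the project root, running `mvn package` from that directory compiles the sources and writes the JAR under target/.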
On Mar 15, 2016, at 12:38 PM, Chandeep Singh wrote:
> Do you have the Eclipse Maven plugin set up?
> Once you have it set up, go to File -> New -> Other -> Maven Project -> Next /
> Finish. You’ll see a default pom.xml which you can modify or replace.
> Here is some documentation that should help:
> I’m using the same Eclipse build as you on my Mac. I mostly build a shaded
> JAR and SCP it to the cluster.
On Mar 15, 2016, at 12:22 PM, Mich Talebzadeh wrote:
> Great, Chandeep. I also have the Eclipse Scala IDE, details below:
> Scala IDE build of Eclipse SDK
> Build id: 4.3.0-vfinal-2015-12-01T15:55:22Z-Typesafe
> I am no expert on Eclipse, so if I create a project called ImportCSV, where do
> I need to put the pom file, or how do I reference it? My Eclipse runs
> on a Linux host, so it can access all the directories that the sbt project
> accesses. I also believe there will be no need for external jar files
> in the build path?
> Thanks
Dr Mich Talebzadeh
On 15 March 2016 at 12:15, Chandeep Singh wrote:
>> Btw, just to add to the confusion ;) I use Maven as well since I moved
>> from Java to Scala but everyone I talk to has been recommending SBT for
>> Scala.
>> I use the Eclipse Scala IDE to build.
>> Here is my sample POM. You can add dependencies based on your requirements.
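>> <project xmlns="http://maven.apache.org/POM/4.0.0"
>>   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>   xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">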
>> <modelVersion>4.0.0</modelVersion>
>> <groupId>spark</groupId>
>> <version>1.0</version>
>> <name>${project.artifactId}</name>
>> <properties>
>> <maven.compiler.source>1.7</maven.compiler.source>
>> <maven.compiler.target>1.7</maven.compiler.target>
>> <encoding>UTF-8</encoding>
>> <scala.version>2.10.4</scala.version>
>> <maven-scala-plugin.version>2.15.2</maven-scala-plugin.version>
>> </properties>
>> <repositories>
>> <repository>
>> <id>cloudera-repo-releases</id>
>> <url></url>
>> </repository>
>> </repositories>
>> <dependencies>
>> <dependency>
>> <groupId>org.scala-lang</groupId>
>> <artifactId>scala-library</artifactId>
>> <version>${scala.version}</version>
>> </dependency>
>> <dependency>
>> <groupId>org.apache.spark</groupId>
>> <artifactId>spark-core_2.10</artifactId>
>> <version>1.5.0-cdh5.5.1</version>
>> </dependency>
>> <dependency>
>> <groupId>org.apache.spark</groupId>
>> <artifactId>spark-mllib_2.10</artifactId>
>> <version>1.5.0-cdh5.5.1</version>
>> </dependency>
>> <dependency>
>> <groupId>org.apache.spark</groupId>
>> <artifactId>spark-hive_2.10</artifactId>
>> <version>1.5.0</version>
>> </dependency>
>> </dependencies>
>> <build>
>> <sourceDirectory>src/main/scala</sourceDirectory>
>> <testSourceDirectory>src/test/scala</testSourceDirectory>
>> <plugins>
>> <plugin>
>> <groupId>org.scala-tools</groupId>
>> <artifactId>maven-scala-plugin</artifactId>
>> <version>${maven-scala-plugin.version}</version>
>> <executions>
>> <execution>
>> <goals>
>> <goal>compile</goal>
>> <goal>testCompile</goal>
>> </goals>
>> </execution>
>> </executions>
>> <configuration>
>> <jvmArgs>
>> <jvmArg>-Xms64m</jvmArg>
>> <jvmArg>-Xmx1024m</jvmArg>
>> </jvmArgs>
>> </configuration>
>> </plugin>
>> <plugin>
>> <groupId>org.apache.maven.plugins</groupId>
>> <artifactId>maven-shade-plugin</artifactId>
>> <version>1.6</version>
>> <executions>
>> <execution>
>> <phase>package</phase>
>> <goals>
>> <goal>shade</goal>
>> </goals>
>> <configuration>
>> <filters>
>> <filter>
>> <artifact>*:*</artifact>
>> <excludes>
>> <exclude>META-INF/*.SF</exclude>
>> <exclude>META-INF/*.DSA</exclude>
>> <exclude>META-INF/*.RSA</exclude>
>> </excludes>
>> </filter>
>> </filters>
>> <transformers>
>> <transformer
>> implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
>> <mainClass></mainClass>
>> </transformer>
>> </transformers>
>> </configuration>
>> </execution>
>> </executions>
>> </plugin>
>> </plugins>
>> </build>
>> <artifactId>scala</artifactId>
>> </project>
On Mar 15, 2016, at 12:09 PM, Mich Talebzadeh wrote:
>> Ok.
>> Sounds like opinion is divided :)
>> I will try to build a scala app with Maven.
>> When I build with SBT I follow this directory structure:
>> The top-level directory is named after the package, e.g.
>> ImportCSV
>> Under ImportCSV I have a directory src and the sbt file ImportCSV.sbt.
>> Under src I have the subdirectories main/scala. My Scala file is in
>> ImportCSV/src/main/scala
>> and is called ImportCSV.scala
>> I then have a shell script that runs everything under the ImportCSV directory:
>> cat generic.ksh
>> #!/bin/ksh
>> #--------------------------------------------------------------------------------
>> #
>> # Procedure:    generic.ksh
>> #
>> # Description:  Compiles and runs a Scala app using sbt and spark-submit
>> #
>> # Parameters:   none
>> #
>> #--------------------------------------------------------------------------------
>> # Vers|  Date  | Who | DA | Description
>> #-----+--------+-----+----+-----------------------------------------------------
>> # 1.0 |04/03/15|  MT |    | Initial Version
>> #--------------------------------------------------------------------------------
>> #
>> function F_USAGE
>> {
>>    echo "USAGE: ${1##*/} -A '<Application>'"
>>    echo "USAGE: ${1##*/} -H '<HELP>' -h '<HELP>'"
>>    exit 10
>> }
>> #
>> # Main Section
>> #
>> if [[ "${1}" = "-h" || "${1}" = "-H" ]]; then
>>    F_USAGE $0
>> fi
>> while getopts A: opt
>> do
>>    case $opt in
>>    (A) APPLICATION="$OPTARG" ;;
>>    (*) F_USAGE $0 ;;
>>    esac
>> done
>> [[ -z ${APPLICATION} ]] && print "You must specify an application value " && F_USAGE $0
>> ENVFILE=/home/hduser/dba/bin/environment.ksh
>> if [[ -f $ENVFILE ]]
>> then
>>         . $ENVFILE
>>         . ~/spark_1.5.2_bin-hadoop2.6.kshrc
>> else
>>         echo "Abort: $0 failed. No environment file ( $ENVFILE ) found"
>>         exit 1
>> fi
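>> ##FILE_NAME=`basename $0 .ksh`
>> # The application name passed with -A doubles as the project directory and class name
>> FILE_NAME=${APPLICATION}
>> CLASS=`echo ${FILE_NAME}|tr "[:upper:]" "[:lower:]"`
>> NOW="`date +%Y%m%d_%H%M`"
>> # LOG_FILE is not set in this script; it is assumed to come from the environment file
>> [ -f ${LOG_FILE} ] && rm -f ${LOG_FILE}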
>> print "\n" `date` ", Started $0" | tee -a ${LOG_FILE}
>> cd ../${FILE_NAME}
>> print "Compiling ${FILE_NAME}" | tee -a ${LOG_FILE}
>> sbt package
>> print "Submitting the job" | tee -a ${LOG_FILE}
>> ${SPARK_HOME}/bin/spark-submit \
>>                 --packages com.databricks:spark-csv_2.10:1.3.0 \
>>                 --class "${FILE_NAME}" \
>>                 --master spark:// \
>>                 --executor-memory=12G \
>>                 --executor-cores=12 \
>>                 --num-executors=2 \
>>                 target/scala-2.10/${CLASS}_2.10-1.0.jar
>> print `date` ", Finished $0" | tee -a ${LOG_FILE}
>> exit
>> So to run it for ImportCSV all I need is to do
>> ./generic.ksh -A ImportCSV
>> Now can anyone kindly give me a rough guideline on directory and location
>> of pom.xml to make this work using maven?
>> Thanks
Dr Mich Talebzadeh
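For the question above: pom.xml goes in the ImportCSV directory itself, where ImportCSV.sbt sits today, and the Scala source stays in src/main/scala. The following is a minimal sketch of how the build-and-submit step of generic.ksh might look with Maven. It assumes the sample POM quoted earlier in this thread (artifactId `scala`, version `1.0`), so the JAR follows Maven's default naming of target/<artifactId>-<version>.jar. `maven_jar_path` and `SPARK_MASTER` are hypothetical names introduced here; the master URL is not given in the original script.

```shell
#!/bin/sh
# Sketch: build-and-submit step adapted from sbt to Maven.
# Assumes it is run from the project root (the directory holding pom.xml).

# Print the JAR path Maven uses by default: target/<artifactId>-<version>.jar
maven_jar_path() {
    echo "target/$1-$2.jar"
}

# Main class, as passed with -A in generic.ksh (defaults to ImportCSV here).
APPLICATION="${APPLICATION:-ImportCSV}"
# artifactId and version are taken from the sample POM in this thread.
JAR=$(maven_jar_path scala 1.0)

if [ -f pom.xml ] && command -v mvn >/dev/null 2>&1; then
    # SPARK_MASTER is a hypothetical variable; it must hold the master URL.
    : "${SPARK_MASTER:?set SPARK_MASTER to the Spark master URL}"
    # "mvn package" takes the place of "sbt package".
    mvn package && "${SPARK_HOME}/bin/spark-submit" \
        --class "${APPLICATION}" \
        --master "${SPARK_MASTER}" \
        "${JAR}"
else
    # No POM or no Maven on the PATH: only report what would be run.
    echo "Would run: mvn package, then spark-submit --class ${APPLICATION} ${JAR}"
fi
```

The remaining spark-submit options from generic.ksh (executor memory, cores, packages) can be carried over unchanged; only the build command and the JAR path differ.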
On 15 March 2016 at 10:50, Sean Owen wrote:
>>> FWIW, I strongly prefer Maven over SBT even for Scala projects. The
>>> Spark build of reference is Maven.
On Tue, Mar 15, 2016 at 10:45 AM, Chandeep Singh wrote:
>>> > For Scala, SBT is recommended.
>>> >
On Mar 15, 2016, at 10:42 AM, Mich Talebzadeh wrote:
>>> >
>>> > Hi,
>>> >
>>> > I build my Spark/Scala packages using SBT, which works fine. I have created
>>> > generic shell scripts to build and submit them.
>>> >
>>> > Yesterday I noticed that some people use Maven and a POM for this purpose.
>>> >
>>> > Which approach is recommended?
>>> >
>>> > Thanks,
>>> >
>>> >
