Hi,
I am not having much luck making Hive run on Spark. I tried to build Spark 1.5.2 without the Hive jars. The build worked, but I could not run Hive SQL on Spark. I then found this link:

http://stackoverflow.com/questions/33233431/hive-on-spark-java-lang-noclassdeffounderror-org-apache-hive-spark-client-job

where a comment by Arvindkumar (Oct 27) states: "This issue was solved by moving to spark 1.3.0 version and rebuilding it without hive."

So I downloaded the Spark 1.3 source and tried to build it myself with the following command:

hduser@rhes564::/usr/lib/spark-1.3.0> build/mvn -X -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean package

It comes back OK, I believe:

[DEBUG] Scalastyle:check no violations found
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [  3.518 s]
[INFO] Spark Project Networking ........................... SUCCESS [  9.662 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  5.272 s]
[INFO] Spark Project Core ................................. SUCCESS [02:47 min]
[INFO] Spark Project Bagel ................................ SUCCESS [  6.522 s]
[INFO] Spark Project GraphX ............................... SUCCESS [ 18.118 s]
[INFO] Spark Project Streaming ............................ SUCCESS [ 31.471 s]
[INFO] Spark Project Catalyst ............................. SUCCESS [ 36.314 s]
[INFO] Spark Project SQL .................................. SUCCESS [ 44.442 s]
[INFO] Spark Project ML Library ........................... SUCCESS [ 53.826 s]
[INFO] Spark Project Tools ................................ SUCCESS [  2.879 s]
[INFO] Spark Project Hive ................................. SUCCESS [ 34.870 s]
[INFO] Spark Project REPL ................................. SUCCESS [ 10.789 s]
[INFO] Spark Project YARN ................................. SUCCESS [ 11.262 s]
[INFO] Spark Project Assembly ............................. SUCCESS [01:44 min]
[INFO] Spark Project External Twitter ..................... SUCCESS [  6.754 s]
[INFO] Spark Project External Flume Sink .................. SUCCESS [  5.013 s]
[INFO] Spark Project External Flume ....................... SUCCESS [  8.276 s]
[INFO] Spark Project External MQTT ........................ SUCCESS [  6.630 s]
[INFO] Spark Project External ZeroMQ ...................... SUCCESS [  6.293 s]
[INFO] Spark Project External Kafka ....................... SUCCESS [ 10.764 s]
[INFO] Spark Project Examples ............................. SUCCESS [01:58 min]
[INFO] Spark Project YARN Shuffle Service ................. SUCCESS [  6.819 s]
[INFO] Spark Project External Kafka Assembly .............. SUCCESS [ 35.834 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 12:27 min
[INFO] Finished at: 2015-11-26T15:46:05+00:00
[INFO] Final Memory: 82M/691M
[INFO] ------------------------------------------------------------------------
[WARNING] The requested profile "hadoop-2.6" could not be activated because it does not exist.

Now when I try to build a tar file with:

./make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.6,parquet-provided"

first I get this:

***NOTE***: JAVA_HOME is not set to a JDK 6 installation. The resulting distribution may not work well with PySpark and will not run with Java 6 (See SPARK-1703 and SPARK-1911). This test can be disabled by adding --skip-java-test.
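I assume this check can be bypassed either by pointing JAVA_HOME at a JDK 6 installation before running the script, or with the --skip-java-test flag the message itself mentions. Something like the following (the JDK path here is only an example, adjust for your machine):

```shell
# Option 1: point JAVA_HOME at a JDK 6 install before building
# (example path -- substitute wherever your JDK 6 actually lives)
export JAVA_HOME=/usr/lib/jvm/jdk1.6.0_45
./make-distribution.sh --name "hadoop2-without-hive" --tgz \
    "-Pyarn,hadoop-provided,hadoop-2.6,parquet-provided"

# Option 2: stay on JDK 7 and disable the version check,
# as the script's own message suggests
./make-distribution.sh --skip-java-test --name "hadoop2-without-hive" --tgz \
    "-Pyarn,hadoop-provided,hadoop-2.6,parquet-provided"
```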
Output from 'java -version' was:

java version "1.7.0_25"
Java(TM) SE Runtime Environment (build 1.7.0_25-b15)
Java HotSpot(TM) 64-Bit Server VM (build 23.25-b01, mixed mode)

Would you like to continue anyways? [y,n]:

Then I get the following error:

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [  3.534 s]
[INFO] Spark Project Networking ........................... SUCCESS [  9.733 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  4.987 s]
[INFO] Spark Project Core ................................. SUCCESS [02:43 min]
[INFO] Spark Project Bagel ................................ SUCCESS [  5.717 s]
[INFO] Spark Project GraphX ............................... SUCCESS [ 17.316 s]
[INFO] Spark Project Streaming ............................ SUCCESS [ 32.133 s]
[INFO] Spark Project Catalyst ............................. SUCCESS [ 36.060 s]
[INFO] Spark Project SQL .................................. SUCCESS [ 41.609 s]
[INFO] Spark Project ML Library ........................... SUCCESS [ 53.484 s]
[INFO] Spark Project Tools ................................ SUCCESS [  2.323 s]
[INFO] Spark Project Hive ................................. SUCCESS [ 33.704 s]
[INFO] Spark Project REPL ................................. SUCCESS [  9.625 s]
[INFO] Spark Project YARN ................................. FAILURE [  0.035 s]
[INFO] Spark Project Assembly ............................. SKIPPED
[INFO] Spark Project External Twitter ..................... SKIPPED
[INFO] Spark Project External Flume Sink .................. SKIPPED
[INFO] Spark Project External Flume ....................... SKIPPED
[INFO] Spark Project External MQTT ........................ SKIPPED
[INFO] Spark Project External ZeroMQ ...................... SKIPPED
[INFO] Spark Project External Kafka ....................... SKIPPED
[INFO] Spark Project Examples ............................. SKIPPED
[INFO] Spark Project YARN Shuffle Service ................. SKIPPED
[INFO] Spark Project External Kafka Assembly .............. SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 06:54 min
[INFO] Finished at: 2015-11-26T16:29:02+00:00
[INFO] Final Memory: 52M/475M
[INFO] ------------------------------------------------------------------------
[WARNING] The requested profile "hadoop-2.6" could not be activated because it does not exist.
[ERROR] Failed to execute goal on project spark-yarn_2.10: Could not resolve dependencies for project org.apache.spark:spark-yarn_2.10:jar:1.3.0: The following artifacts could not be resolved: org.apache.hadoop:hadoop-yarn-api:jar:1.0.4, org.apache.hadoop:hadoop-yarn-common:jar:1.0.4, org.apache.hadoop:hadoop-yarn-server-web-proxy:jar:1.0.4, org.apache.hadoop:hadoop-yarn-client:jar:1.0.4, org.apache.hadoop:hadoop-yarn-server-tests:jar:tests:1.0.4: Failure to find org.apache.hadoop:hadoop-yarn-api:jar:1.0.4 in https://repo1.maven.org/maven2 was cached in the local repository, resolution will not be reattempted until the update interval of central has elapsed or updates are forced -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :spark-yarn_2.10

Mich
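PS. Both builds print "[WARNING] The requested profile "hadoop-2.6" could not be activated because it does not exist", and the failure then shows spark-yarn falling back to the default Hadoop 1.0.4 artifacts, so I suspect the non-existent profile is the real problem. If I read the Spark 1.3 building docs correctly there is no hadoop-2.6 profile in 1.3.0 and one is supposed to use hadoop-2.4 with an explicit hadoop.version instead; this is my guess, not something I have verified yet:

```shell
# List the profiles the 1.3.0 POM actually defines
# (help:all-profiles is a standard maven-help-plugin goal)
build/mvn help:all-profiles | grep -i hadoop

# Guess, based on the Spark 1.3 building docs: use the hadoop-2.4
# profile with hadoop.version=2.6.0 instead of a hadoop-2.6 profile
build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0 -DskipTests clean package
```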
