Zoltan, you should have these in your existing CDH 5.3 install; that's the best place to get them. Find where Spark is running from and you should have them there.
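For reference, a minimal sketch of the setup steps Deenar lists in the mail quoted further down. The parcel path is the one from the thread; the SPARK_HOME location and the HDFS path for the assembly jar are illustrative placeholders, so adjust them to your cluster:

  # 1) unpack the prebuilt spark-1.5.1-bin-hadoop2.6 on the edge node (location is an assumption)
  export SPARK_HOME=/usr/lib/spark-1.5.1-bin-hadoop2.6
  # 2) copy the existing spark-defaults.conf and spark-env.sh from the CDH install
  CDH_SPARK_CONF=/opt/cloudera/parcels/CDH/lib/spark/conf
  cp $CDH_SPARK_CONF/spark-defaults.conf $CDH_SPARK_CONF/spark-env.sh $SPARK_HOME/conf/
  # 3) point spark.yarn.jar at the new assembly (HDFS path is a placeholder)
  echo "spark.yarn.jar hdfs:///user/spark/share/lib/spark-assembly-1.5.1-hadoop2.6.0.jar" >> $SPARK_HOME/conf/spark-defaults.conf
  # 4) copy the YARN client configuration alongside it
  mkdir -p $SPARK_HOME/conf/yarn-conf
  cp $CDH_SPARK_CONF/yarn-conf/* $SPARK_HOME/conf/yarn-conf/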
My versions are here https://gist.github.com/deenar/08fc4ac0da3bdaff10fb Deenar On 29 October 2015 at 15:29, Zoltan Fedor <zoltan.0.fe...@gmail.com> wrote: > I don't have spark-defaults.conf and spark-env.sh, so if you have a > working Spark 1.5.1 with Hive metastore access on CDH 5.3 then could you > please send over the settings you have in your spark-defaults.conf > and spark-env.sh? > Thanks > > On Thu, Oct 29, 2015 at 11:14 AM, Deenar Toraskar < > deenar.toras...@gmail.com> wrote: > >> Here is what I did, maybe that will help you. >> >> 1) Downloaded spark-1.5.1 (with Hadoop 2.6.0) spark-1.5.1-bin-hadoop2.6 >> and extracted it on the edge node, set SPARK_HOME to this location >> 2) Copied the existing configuration (spark-defaults.conf and >> spark-env.sh) from your spark install >> (/opt/cloudera/parcels/CDH/lib/spark/conf/yarn-conf on our environment) to >> $SPARK_HOME/conf >> 3) Updated spark.yarn.jar in spark-defaults.conf >> 4) Copied over all the configuration files from >> /opt/cloudera/parcels/CDH/lib/spark/conf/yarn-conf to >> $SPARK_HOME/conf/yarn-conf >> >> and it worked. You may be better off with a custom build against CDH 5.3.3 >> Hadoop, which you have already done. >> >> Deenar >> >> On 29 October 2015 at 14:35, Zoltan Fedor <zoltan.0.fe...@gmail.com> >> wrote: >> >>> Sure, I did it with spark-shell, which seems to be showing the same >>> error - not using the hive-site.xml >>> >>> >>> $ HADOOP_CONF_DIR=$SPARK_HOME/hadoop-conf >>> YARN_CONF_DIR=$SPARK_HOME/yarn-conf HADOOP_USER_NAME=biapp MASTER=yarn >>> $SPARK_HOME/bin/pyspark --deploy-mode client --driver-class-path >>> $HIVE_CLASSPATH >>> Python 2.6.6 (r266:84292, Jul 23 2015, 05:13:40) >>> [GCC 4.4.7 20120313 (Red Hat 4.4.7-16)] on linux2 >>> Type "help", "copyright", "credits" or "license" for more information. >>> SLF4J: Class path contains multiple SLF4J bindings. >>> SLF4J: Found binding in >>> [jar:file:/usr/lib/spark-1.5.1-bin-without-hadoop/lib/spark-assembly-1.5.1-hadoop2.5.0-cdh5.3.3.jar!/org/slf4j/impl/StaticLoggerBinder.class] >>> SLF4J: Found binding in >>> [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] >>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an >>> explanation. >>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] >>> 15/10/29 10:33:20 WARN MetricsSystem: Using default name DAGScheduler >>> for source because spark.app.id is not set. >>> 15/10/29 10:33:22 WARN NativeCodeLoader: Unable to load native-hadoop >>> library for your platform... using builtin-java classes where applicable >>> 15/10/29 10:33:50 WARN HiveConf: HiveConf of name hive.metastore.local >>> does not exist >>> Welcome to >>> ____ __ >>> / __/__ ___ _____/ /__ >>> _\ \/ _ \/ _ `/ __/ '_/ >>> /___/ .__/\_,_/_/ /_/\_\ version 1.5.1 >>> /_/ >>> >>> Using Python version 2.6.6 (r266:84292, Jul 23 2015 05:13:40) >>> SparkContext available as sc, HiveContext available as sqlContext. >>> >>> >>> biapps@biapps-qa01:~> HADOOP_CONF_DIR=$SPARK_HOME/hadoop-conf >>> YARN_CONF_DIR=$SPARK_HOME/yarn-conf HADOOP_USER_NAME=biapp MASTER=yarn >>> $SPARK_HOME/bin/spark-shell --deploy-mode client >>> SLF4J: Class path contains multiple SLF4J bindings.
>>> SLF4J: Found binding in >>> [jar:file:/usr/lib/spark-1.5.1-bin-without-hadoop/lib/spark-assembly-1.5.1-hadoop2.5.0-cdh5.3.3.jar!/org/slf4j/impl/StaticLoggerBinder.class] >>> SLF4J: Found binding in >>> [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] >>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an >>> explanation. >>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] >>> Welcome to >>> ____ __ >>> / __/__ ___ _____/ /__ >>> _\ \/ _ \/ _ `/ __/ '_/ >>> /___/ .__/\_,_/_/ /_/\_\ version 1.5.1 >>> /_/ >>> >>> Using Scala version 2.10.4 (OpenJDK 64-Bit Server VM, Java 1.7.0_91) >>> Type in expressions to have them evaluated. >>> Type :help for more information. >>> 15/10/29 10:34:15 WARN MetricsSystem: Using default name DAGScheduler >>> for source because spark.app.id is not set. >>> 15/10/29 10:34:16 WARN NativeCodeLoader: Unable to load native-hadoop >>> library for your platform... using builtin-java classes where applicable >>> Spark context available as sc. >>> 15/10/29 10:34:46 WARN HiveConf: HiveConf of name hive.metastore.local >>> does not exist >>> 15/10/29 10:34:46 WARN ShellBasedUnixGroupsMapping: got exception trying >>> to get groups for user biapp: id: biapp: No such user >>> >>> 15/10/29 10:34:46 WARN UserGroupInformation: No groups available for >>> user biapp >>> java.lang.RuntimeException: >>> org.apache.hadoop.security.AccessControlException: Permission denied: >>> user=biapp, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x >>> at >>> org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257) >>> at >>> org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238) >>> at >>> org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:216) >>> at >>> org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:145) >>> at >>> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138) >>> at >>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6287) >>> at >>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6269) >>> at >>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6221) >>> at >>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4088) >>> at >>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4058) >>> at >>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4031) >>> at >>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:788) >>> at >>> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.mkdirs(AuthorizationProviderProxyClientProtocol.java:297) >>> at >>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:594) >>> at >>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) >>> at >>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587) >>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026) >>> at 
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) >>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) >>> at java.security.AccessController.doPrivileged(Native Method) >>> at javax.security.auth.Subject.doAs(Subject.java:422) >>> at >>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642) >>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) >>> >>> at >>> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522) >>> at >>> org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:171) >>> at >>> org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:162) >>> at >>> org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:160) >>> at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:167) >>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) >>> at >>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) >>> at >>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) >>> at java.lang.reflect.Constructor.newInstance(Constructor.java:526) >>> at >>> org.apache.spark.repl.SparkILoop.createSQLContext(SparkILoop.scala:1028) >>> at $iwC$$iwC.<init>(<console>:9) >>> at $iwC.<init>(<console>:18) >>> at <init>(<console>:20) >>> at .<init>(<console>:24) >>> at .<clinit>(<console>) >>> at .<init>(<console>:7) >>> at .<clinit>(<console>) >>> at $print(<console>) >>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >>> at >>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) >>> at >>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >>> at java.lang.reflect.Method.invoke(Method.java:606) >>> at >>> org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) >>> at >>> org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340) >>> at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840) >>> at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871) >>> at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819) >>> at >>> org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857) >>> at >>> org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902) >>> at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814) >>> at >>> org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:132) >>> at >>> org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:124) >>> at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:324) >>> at >>> org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:124) >>> at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:64) >>> at >>> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1$$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:974) >>> at >>> org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:159) >>> at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:64) >>> at >>> org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:108) >>> at >>> org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:64) >>> at >>> 
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:991) >>> at >>> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) >>> at >>> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) >>> at >>> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) >>> at org.apache.spark.repl.SparkILoop.org >>> $apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945) >>> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059) >>> at org.apache.spark.repl.Main$.main(Main.scala:31) >>> at org.apache.spark.repl.Main.main(Main.scala) >>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >>> at >>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) >>> at >>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >>> at java.lang.reflect.Method.invoke(Method.java:606) >>> at >>> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672) >>> at >>> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180) >>> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205) >>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120) >>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) >>> Caused by: org.apache.hadoop.security.AccessControlException: Permission >>> denied: user=biapp, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x >>> at >>> org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257) >>> at >>> org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238) >>> at >>> org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:216) >>> at >>> org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:145) >>> at >>> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138) >>> at >>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6287) >>> at >>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6269) >>> at >>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6221) >>> at >>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4088) >>> at >>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4058) >>> at >>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4031) >>> at >>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:788) >>> at >>> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.mkdirs(AuthorizationProviderProxyClientProtocol.java:297) >>> at >>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:594) >>> at >>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) >>> at >>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587) >>> at 
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026) >>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) >>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) >>> at java.security.AccessController.doPrivileged(Native Method) >>> at javax.security.auth.Subject.doAs(Subject.java:422) >>> at >>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642) >>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) >>> >>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) >>> at >>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) >>> at >>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) >>> at java.lang.reflect.Constructor.newInstance(Constructor.java:526) >>> at >>> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) >>> at >>> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) >>> at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2689) >>> at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2658) >>> at >>> org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:831) >>> at >>> org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:827) >>> at >>> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) >>> at >>> org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:827) >>> at >>> org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:820) >>> at >>> org.apache.hadoop.hive.ql.exec.Utilities.createDirsWithPermission(Utilities.java:3679) >>> at >>> org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:597) >>> at >>> org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554) >>> at >>> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508) >>> ... 
56 more >>> Caused by: >>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): >>> Permission denied: user=biapp, access=WRITE, >>> inode="/user":hdfs:supergroup:drwxr-xr-x >>> at >>> org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257) >>> at >>> org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238) >>> at >>> org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:216) >>> at >>> org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:145) >>> at >>> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138) >>> at >>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6287) >>> at >>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6269) >>> at >>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6221) >>> at >>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4088) >>> at >>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4058) >>> at >>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4031) >>> at >>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:788) >>> at >>> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.mkdirs(AuthorizationProviderProxyClientProtocol.java:297) >>> at >>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:594) >>> at >>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) >>> at >>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587) >>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026) >>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) >>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) >>> at java.security.AccessController.doPrivileged(Native Method) >>> at javax.security.auth.Subject.doAs(Subject.java:422) >>> at >>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642) >>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) >>> >>> at org.apache.hadoop.ipc.Client.call(Client.java:1411) >>> at org.apache.hadoop.ipc.Client.call(Client.java:1364) >>> at >>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) >>> at com.sun.proxy.$Proxy14.mkdirs(Unknown Source) >>> at >>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:531) >>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >>> at >>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) >>> at >>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >>> at java.lang.reflect.Method.invoke(Method.java:606) >>> at >>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) >>> at >>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) >>> at 
com.sun.proxy.$Proxy15.mkdirs(Unknown Source) >>> at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2687) >>> ... 66 more >>> >>> <console>:10: error: not found: value sqlContext >>> import sqlContext.implicits._ >>> ^ >>> <console>:10: error: not found: value sqlContext >>> import sqlContext.sql >>> ^ >>> >>> scala> sqlContext.sql("show databases").collect >>> <console>:14: error: not found: value sqlContext >>> sqlContext.sql("show databases").collect >>> ^ >>> >>> scala> >>> >>> On Thu, Oct 29, 2015 at 10:26 AM, Deenar Toraskar < >>> deenar.toras...@gmail.com> wrote: >>> >>>> I don't know a lot about how pyspark works. Can you possibly try running >>>> spark-shell and doing the same? >>>> >>>> sqlContext.sql("show databases").collect >>>> >>>> Deenar >>>> >>>> On 29 October 2015 at 14:18, Zoltan Fedor <zoltan.0.fe...@gmail.com> >>>> wrote: >>>> >>>>> Yes, I am. It was compiled with the following: >>>>> >>>>> export SPARK_HADOOP_VERSION=2.5.0-cdh5.3.3 >>>>> export SPARK_YARN=true >>>>> export SPARK_HIVE=true >>>>> export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M >>>>> -XX:ReservedCodeCacheSize=512m" >>>>> mvn -Pyarn -Phadoop-2.5 -Dhadoop.version=2.5.0-cdh5.3.3 -Phive >>>>> -Phive-thriftserver -DskipTests clean package >>>>> >>>>> On Thu, Oct 29, 2015 at 10:16 AM, Deenar Toraskar < >>>>> deenar.toras...@gmail.com> wrote: >>>>> >>>>>> Are you using Spark built with Hive? >>>>>> >>>>>> # Apache Hadoop 2.6.X with Hive 13 support >>>>>> mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive >>>>>> -Phive-thriftserver -DskipTests clean package >>>>>> >>>>>> >>>>>> On 29 October 2015 at 13:08, Zoltan Fedor <zoltan.0.fe...@gmail.com> >>>>>> wrote: >>>>>> >>>>>>> Hi Deenar, >>>>>>> As suggested, I have moved the hive-site.xml from HADOOP_CONF_DIR >>>>>>> ($SPARK_HOME/hadoop-conf) to YARN_CONF_DIR ($SPARK_HOME/conf/yarn-conf) >>>>>>> and >>>>>>> used the below to start pyspark, but the error is exactly the same as >>>>>>> before. >>>>>>> >>>>>>> $ HADOOP_CONF_DIR=$SPARK_HOME/hadoop-conf >>>>>>> YARN_CONF_DIR=$SPARK_HOME/conf/yarn-conf HADOOP_USER_NAME=biapp >>>>>>> MASTER=yarn >>>>>>> $SPARK_HOME/bin/pyspark --deploy-mode client >>>>>>> >>>>>>> Python 2.6.6 (r266:84292, Jul 23 2015, 05:13:40) >>>>>>> [GCC 4.4.7 20120313 (Red Hat 4.4.7-16)] on linux2 >>>>>>> Type "help", "copyright", "credits" or "license" for more >>>>>>> information. >>>>>>> SLF4J: Class path contains multiple SLF4J bindings. >>>>>>> SLF4J: Found binding in >>>>>>> [jar:file:/usr/lib/spark-1.5.1-bin-without-hadoop/lib/spark-assembly-1.5.1-hadoop2.5.0-cdh5.3.3.jar!/org/slf4j/impl/StaticLoggerBinder.class] >>>>>>> SLF4J: Found binding in >>>>>>> [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] >>>>>>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an >>>>>>> explanation. >>>>>>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] >>>>>>> 15/10/29 09:06:36 WARN MetricsSystem: Using default name >>>>>>> DAGScheduler for source because spark.app.id is not set. >>>>>>> 15/10/29 09:06:38 WARN NativeCodeLoader: Unable to load >>>>>>> native-hadoop library for your platform...
using builtin-java classes >>>>>>> where >>>>>>> applicable >>>>>>> 15/10/29 09:07:03 WARN HiveConf: HiveConf of name >>>>>>> hive.metastore.local does not exist >>>>>>> Welcome to >>>>>>> ____ __ >>>>>>> / __/__ ___ _____/ /__ >>>>>>> _\ \/ _ \/ _ `/ __/ '_/ >>>>>>> /___/ .__/\_,_/_/ /_/\_\ version 1.5.1 >>>>>>> /_/ >>>>>>> >>>>>>> Using Python version 2.6.6 (r266:84292, Jul 23 2015 05:13:40) >>>>>>> SparkContext available as sc, HiveContext available as sqlContext. >>> sqlContext2 = HiveContext(sc) >>>>>>> >>> sqlContext2 = HiveContext(sc) >>>>>>> >>> sqlContext2.sql("show databases").first() >>>>>>> 15/10/29 09:07:34 WARN HiveConf: HiveConf of name >>>>>>> hive.metastore.local does not exist >>>>>>> 15/10/29 09:07:35 WARN ShellBasedUnixGroupsMapping: got exception >>>>>>> trying to get groups for user biapp: id: biapp: No such user >>>>>>> >>>>>>> 15/10/29 09:07:35 WARN UserGroupInformation: No groups available for >>>>>>> user biapp >>>>>>> Traceback (most recent call last): >>>>>>> File "<stdin>", line 1, in <module> >>>>>>> File >>>>>>> "/usr/lib/spark-1.5.1-bin-without-hadoop/python/pyspark/sql/context.py", >>>>>>> line 552, in sql >>>>>>> return DataFrame(self._ssql_ctx.sql(sqlQuery), self) >>>>>>> File >>>>>>> "/usr/lib/spark-1.5.1-bin-without-hadoop/python/pyspark/sql/context.py", >>>>>>> line 660, in _ssql_ctx >>>>>>> "build/sbt assembly", e) >>>>>>> Exception: ("You must build Spark with Hive. Export >>>>>>> 'SPARK_HIVE=true' and run build/sbt assembly", Py4JJavaError(u'An error >>>>>>> occurred while calling None.org.apache.spark.sql.hive.HiveContext.\n', >>>>>>> JavaObject id=o20)) >>> >>>>>>> >>>>>>> >>>>>>> On Thu, Oct 29, 2015 at 7:20 AM, Deenar Toraskar < >>>>>>> deenar.toras...@gmail.com> wrote: >>>>>>> >>>>>>>> *Hi Zoltan* >>>>>>>> >>>>>>>> Add hive-site.xml to your YARN_CONF_DIR, i.e. >>>>>>>> $SPARK_HOME/conf/yarn-conf (see the sketch at the end of this thread) >>>>>>>> >>>>>>>> Deenar >>>>>>>> >>>>>>>> *Think Reactive Ltd* >>>>>>>> deenar.toras...@thinkreactive.co.uk >>>>>>>> 07714140812 >>>>>>>> >>>>>>>> On 28 October 2015 at 14:28, Zoltan Fedor <zoltan.0.fe...@gmail.com >>>>>>>> > wrote: >>>>>>>> >>>>>>>>> Hi, >>>>>>>>> We have a shared CDH 5.3.3 cluster and are trying to use Spark 1.5.1 >>>>>>>>> on it in yarn client mode with Hive. >>>>>>>>> >>>>>>>>> I have compiled Spark 1.5.1 with SPARK_HIVE=true, but it seems I >>>>>>>>> am not able to make SparkSQL pick up the hive-site.xml when running >>>>>>>>> pyspark. >>>>>>>>> >>>>>>>>> hive-site.xml is located in $SPARK_HOME/hadoop-conf/hive-site.xml >>>>>>>>> and also in $SPARK_HOME/conf/hive-site.xml >>>>>>>>> >>>>>>>>> When I start pyspark with the below command and then run some >>>>>>>>> simple SparkSQL, it fails; it seems it didn't pick up the settings in >>>>>>>>> hive-site.xml >>>>>>>>> >>>>>>>>> $ HADOOP_CONF_DIR=$SPARK_HOME/hadoop-conf >>>>>>>>> YARN_CONF_DIR=$SPARK_HOME/yarn-conf HADOOP_USER_NAME=biapp MASTER=yarn >>>>>>>>> $SPARK_HOME/bin/pyspark --deploy-mode client >>>>>>>>> >>>>>>>>> Python 2.6.6 (r266:84292, Jul 23 2015, 05:13:40) >>>>>>>>> [GCC 4.4.7 20120313 (Red Hat 4.4.7-16)] on linux2 >>>>>>>>> Type "help", "copyright", "credits" or "license" for more >>>>>>>>> information. >>>>>>>>> SLF4J: Class path contains multiple SLF4J bindings.
>>>>>>>>> SLF4J: Found binding in >>>>>>>>> [jar:file:/usr/lib/spark-1.5.1-bin-without-hadoop/lib/spark-assembly-1.5.1-hadoop2.5.0-cdh5.3.3.jar!/org/slf4j/impl/StaticLoggerBinder.class] >>>>>>>>> SLF4J: Found binding in >>>>>>>>> [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] >>>>>>>>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for >>>>>>>>> an explanation. >>>>>>>>> SLF4J: Actual binding is of type >>>>>>>>> [org.slf4j.impl.Log4jLoggerFactory] >>>>>>>>> 15/10/28 10:22:33 WARN MetricsSystem: Using default name >>>>>>>>> DAGScheduler for source because spark.app.id is not set. >>>>>>>>> 15/10/28 10:22:35 WARN NativeCodeLoader: Unable to load >>>>>>>>> native-hadoop library for your platform... using builtin-java classes >>>>>>>>> where >>>>>>>>> applicable >>>>>>>>> 15/10/28 10:22:59 WARN HiveConf: HiveConf of name >>>>>>>>> hive.metastore.local does not exist >>>>>>>>> Welcome to >>>>>>>>> ____ __ >>>>>>>>> / __/__ ___ _____/ /__ >>>>>>>>> _\ \/ _ \/ _ `/ __/ '_/ >>>>>>>>> /___/ .__/\_,_/_/ /_/\_\ version 1.5.1 >>>>>>>>> /_/ >>>>>>>>> >>>>>>>>> Using Python version 2.6.6 (r266:84292, Jul 23 2015 05:13:40) >>>>>>>>> SparkContext available as sc, HiveContext available as sqlContext. >>>>>>>>> >>> sqlContext2 = HiveContext(sc) >>>>>>>>> >>> sqlContext2.sql("show databases").first() >>>>>>>>> 15/10/28 10:23:12 WARN HiveConf: HiveConf of name >>>>>>>>> hive.metastore.local does not exist >>>>>>>>> 15/10/28 10:23:13 WARN ShellBasedUnixGroupsMapping: got exception >>>>>>>>> trying to get groups for user biapp: id: biapp: No such user >>>>>>>>> >>>>>>>>> 15/10/28 10:23:13 WARN UserGroupInformation: No groups available >>>>>>>>> for user biapp >>>>>>>>> Traceback (most recent call last): >>>>>>>>> File "<stdin>", line 1, in <module> >>>>>>>>> File >>>>>>>>> "/usr/lib/spark-1.5.1-bin-without-hadoop/python/pyspark/sql/context.py", >>>>>>>>> line 552, in sql >>>>>>>>> return DataFrame(self._ssql_ctx.sql(sqlQuery), self) >>>>>>>>> File >>>>>>>>> "/usr/lib/spark-1.5.1-bin-without-hadoop/python/pyspark/sql/context.py", >>>>>>>>> line 660, in _ssql_ctx >>>>>>>>> "build/sbt assembly", e) >>>>>>>>> Exception: ("You must build Spark with Hive. Export >>>>>>>>> 'SPARK_HIVE=true' and run build/sbt assembly", Py4JJavaError(u'An >>>>>>>>> error >>>>>>>>> occurred while calling None.org.apache.spark.sql.hive.HiveContext.\n', >>>>>>>>> JavaObject id=o20)) >>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Note the warning above, "WARN HiveConf: HiveConf of >>>>>>>>> name hive.metastore.local does not exist", even though there actually is a >>>>>>>>> hive.metastore.local attribute in the hive-site.xml. >>>>>>>>> >>>>>>>>> Any idea how to submit hive-site.xml in yarn client mode? >>>>>>>>> >>>>>>>>> Thanks
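A note on the AccessControlException quoted earlier in the thread ("Permission denied: user=biapp, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x"): the stack trace shows Hive's SessionState trying to create a session directory under /user, most likely a home or scratch directory for the biapp user, which does not exist. A minimal sketch of the usual fix, assuming an unsecured cluster where an admin can act as the hdfs superuser; the user name comes from the thread:

  # create an HDFS home directory for the user Spark is impersonating
  sudo -u hdfs hdfs dfs -mkdir -p /user/biapp
  sudo -u hdfs hdfs dfs -chown biapp /user/biapp

The "id: biapp: No such user" warnings are a separate issue: the edge node's local OS has no biapp account, so Hadoop cannot resolve its groups.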
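On the original question of getting hive-site.xml picked up in yarn client mode: the driver reads it from its classpath, so it needs to sit in a directory the driver classpath includes, such as $SPARK_HOME/conf or the YARN_CONF_DIR Deenar suggests above. A minimal sketch, where the /etc/hive/conf source path is an illustrative assumption:

  # put hive-site.xml somewhere on the driver classpath
  cp /etc/hive/conf/hive-site.xml $SPARK_HOME/conf/
  # then verify from pyspark which metastore setting was actually resolved:
  #   >>> sqlContext.getConf("hive.metastore.uris", "not set")

As for the recurring "HiveConf of name hive.metastore.local does not exist" warning: hive.metastore.local was removed as of Hive 0.10, so the Hive bundled with Spark 1.5.1 no longer recognizes it; the metastore location is taken from hive.metastore.uris instead. The warning itself is therefore expected and is not evidence that hive-site.xml was ignored.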