That seemed to correct the issue. Thanks for pointing out the lack of diffs between v1.2.0-rc2 and v1.2.0 -- I'm not sure how my git repo ended up not matching its origin.
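For anyone who hits the same mismatch, here is a minimal sketch of how a local tag could be compared against the remote. It assumes the cause is a stale or locally moved tag, which this thread does not confirm. `tag_matches_remote` is a hypothetical helper name, not part of git or Spark.

```shell
#!/bin/sh
# Hypothetical helper (not part of git or Spark): succeeds when the tag in
# this clone and the same tag on the remote resolve to the same object ID.
tag_matches_remote() {
  tag="$1"
  remote="${2:-origin}"
  # Object ID the tag has locally; return 2 if the tag does not exist here.
  local_id=$(git rev-parse --verify --quiet "refs/tags/$tag") || return 2
  # Object ID the remote advertises for the tag (first column of ls-remote).
  remote_id=$(git ls-remote --tags "$remote" "refs/tags/$tag" | cut -f1)
  # Match only if the remote has the tag and both IDs are identical.
  [ -n "$remote_id" ] && [ "$local_id" = "$remote_id" ]
}
```

Run inside the clone, `tag_matches_remote v1.2.0 origin || echo "v1.2.0 differs from origin"` would flag a tag that points somewhere other than where the remote says it should.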
-matt

On Sat, Dec 20, 2014 at 4:25 PM, Matt Mead <[email protected]> wrote:

> Bizarre. I originally cloned from and have been pulling from
> https://github.com/apache/spark, and my repo shows the following:
>
> user@host:~/development/spark$ git diff v1.2.0-rc2..v1.2.0 | wc -l
>> 1898
>
> If I pull a fresh clone, I get this:
>
> user@host:~$ git clone https://github.com/apache/spark
>> Cloning into 'spark'...
>> remote: Counting objects: 152765, done.
>> remote: Compressing objects: 100% (50/50), done.
>> remote: Total 152765 (delta 16), reused 64 (delta 16)
>> Receiving objects: 100% (152765/152765), 85.01 MiB | 3.29 MiB/s, done.
>> Resolving deltas: 100% (68247/68247), done.
>> user@host:~$ cd spark
>> user@host:~/spark$ git diff v1.2.0-rc2..v1.2.0 | wc -l
>> 0
>
> I will do a build from the fresh clone and report back on whether the
> behavior persists.
>
> -matt
>
> On Sat, Dec 20, 2014 at 4:16 PM, Mark Hamstra <[email protected]> wrote:
>
>> This makes no sense. There is no difference between v1.2.0-rc2 and
>> v1.2.0: https://github.com/apache/spark/compare/v1.2.0-rc2...v1.2.0
>>
>> On Sat, Dec 20, 2014 at 12:44 PM, Matt Mead <[email protected]> wrote:
>>
>>> First, thanks for the efforts and contribution to such a useful software
>>> stack! Spark is great!
>>>
>>> I have been using the git tags for v1.2.0-rc1 and v1.2.0-rc2 built as
>>> follows:
>>>
>>> ./make-distribution.sh -Dhadoop.version=2.5.0-cdh5.2.0 -Dyarn.version=2.5.0-cdh5.2.0 -Phadoop-2.4 -Phive -Pyarn -Phive-thriftserver
>>>
>>> I have been starting the thriftserver as follows:
>>>
>>> HADOOP_CONF_DIR=/etc/hadoop/conf ./sbin/start-thriftserver.sh --master yarn --num-executors 16
>>>
>>> Under v1.2.0-rc1 and v1.2.0-rc2, this has worked properly, where the
>>> thriftserver starts up and I am able to interact with it and execute
>>> queries as expected using the JDBC driver.
>>>
>>> I have updated to git tag v1.2.0, built identically and started the
>>> thriftserver identically, but am now running into the following issue on
>>> startup:
>>>
>>> Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: hdfs://myhdfs/user/user/.sparkStaging/application_1416150945509_0055/datanucleus-api-jdo-3.2.6.jar, expected: file:///
>>>> at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
>>>> at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:80)
>>>> at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:519)
>>>> at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:737)
>>>> at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:514)
>>>> at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398)
>>>> at org.apache.spark.deploy.yarn.ClientDistributedCacheManager.addResource(ClientDistributedCacheManager.scala:67)
>>>> at org.apache.spark.deploy.yarn.ClientBase$$anonfun$prepareLocalResources$5.apply(ClientBase.scala:257)
>>>> at org.apache.spark.deploy.yarn.ClientBase$$anonfun$prepareLocalResources$5.apply(ClientBase.scala:242)
>>>> at scala.Option.foreach(Option.scala:236)
>>>> at org.apache.spark.deploy.yarn.ClientBase$class.prepareLocalResources(ClientBase.scala:242)
>>>> at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:35)
>>>> at org.apache.spark.deploy.yarn.ClientBase$class.createContainerLaunchContext(ClientBase.scala:350)
>>>> at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:35)
>>>> at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:80)
>>>> at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
>>>> at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:140)
>>>> at org.apache.spark.SparkContext.<init>(SparkContext.scala:335)
>>>> at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:38)
>>>> at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:56)
>>>> at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>> at java.lang.reflect.Method.invoke(Method.java:606)
>>>> at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
>>>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
>>>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>
>>> Looking at SPARK-4757, it appears others were seeing this behavior in
>>> earlier releases and it is fixed in v1.2.0, whereas I did not see the
>>> behavior in earlier releases and now am seeing it in v1.2.0.
>>>
>>> I have tested this with the exact same build/launch commands on two
>>> separate CDH5.2.0 clusters with identical results. Both machines where the
>>> build and execution take place have a proper HDFS/YARN client configuration
>>> in /etc/hadoop/conf and other hadoop tools like MR2 on YARN function as
>>> expected.
>>>
>>> Any ideas on what to do to resolve this issue?
>>>
>>> Thanks!
>>>
>>> -matt
