Hi Jonathan,

Thanks! This will be super helpful going forward: one less thing to remember :-)
Best,
Asif

PS: I would love to see some of the best practices from the AWS slides (number of executors, memory, etc.) on the official AWS + Spark help page:
http://www.slideshare.net/AmazonWebServices/bdt309-data-science-best-practices-for-apache-spark-on-amazon-emr

On Sun, Nov 22, 2015 at 10:20 AM, Jonathan Kelly <jonathaka...@gmail.com> wrote:

> Asif,
>
> This was also brought up by somebody else at Amazon on Friday, and I
> came up with the following EMR Step that can be used to work around the
> issue:
>
> aws emr create-cluster ... --applications Zeppelin-Sandbox --steps Name=CreateZeppelinLocalRepo,Jar=command-runner.jar,Args=[bash,-c,"sudo -u zeppelin mkdir /var/lib/zeppelin/local-repo; sudo ln -s /var/lib/zeppelin/local-repo /usr/lib/zeppelin"]
>
> Moon's suggestion will also work, but it is a little better to have the
> local-repo directory underneath /var (which, as of emr-4.2.0, is symlinked
> to /mnt/var) so that it is on the first ephemeral disk rather than on the
> root partition. This prevents your master instance's root partition from
> filling up.
>
> Thanks for bringing up this issue. We will be sure to fix it in the
> next release of EMR so that you won't need this workaround anymore.
>
> ~ Jonathan
>
> On Fri, Nov 20, 2015 at 9:51 PM, Asif Imran <covariantmon...@gmail.com> wrote:
>
>> Hi Moon,
>>
>> Thanks so much. This worked perfectly.
>>
>> Asif
>>
>> On Fri, Nov 20, 2015 at 6:06 PM, moon soo Lee <m...@apache.org> wrote:
>>
>>> Hi Asif,
>>>
>>> Thanks for sharing the problem.
>>> I've found that running
>>>
>>> sudo mkdir /usr/lib/zeppelin/local-repo
>>> sudo chown zeppelin /usr/lib/zeppelin/local-repo
>>>
>>> on your EMR master node makes %dep work correctly.
>>>
>>> Hope this helps.
>>>
>>> Best,
>>> moon
>>>
>>> On Sat, Nov 21, 2015 at 7:34 AM Asif Imran <covariantmon...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am running Spark on AWS EMR with default options. In the notebook, I
>>>> am having trouble getting this off the ground:
>>>>
>>>> %dep
>>>> z.reset()
>>>> z.load("com.databricks:spark-csv_2.10:1.2.0")
>>>>
>>>> Digging through the user list, people in the past had similar issues,
>>>> ranging from path permissions to proxy or YARN incompatibility.
>>>>
>>>> Is there a standard way to debug this? The more general question is: does
>>>> the dep loader even work for this particular setup (namely, on AWS EMR)?
>>>>
>>>> Thanks,
>>>> Asif
>>>>
>>>> Love Zeppelin btw :-)
>>>>
>>>> Log
>>>> -------------------------
>>>>
>>>> java.lang.NullPointerException
>>>>   at org.sonatype.aether.impl.internal.DefaultRepositorySystem.resolveDependencies(DefaultRepositorySystem.java:352)
>>>>   at org.apache.zeppelin.spark.dep.DependencyContext.fetchArtifactWithDep(DependencyContext.java:141)
>>>>   at org.apache.zeppelin.spark.dep.DependencyContext.fetch(DependencyContext.java:98)
>>>>   at org.apache.zeppelin.spark.DepInterpreter.interpret(DepInterpreter.java:189)
>>>>   at org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
>>>>   at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
>>>>   at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:276)
>>>>   at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
>>>>   at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:118)
>>>>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>>>>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>>>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>   at java.lang.Thread.run(Thread.java:745)
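For anyone applying Jonathan's workaround by hand on an already-running master node, or wanting to re-run it safely, here is a minimal shell sketch. It assumes the emr-4.2.0 layout described in this thread (Zeppelin under /usr/lib/zeppelin, the interpreter running as the zeppelin user, /var on the first ephemeral disk). The function name setup_local_repo and its optional arguments are hypothetical, added only so the paths can be overridden; it is meant to be run as root.

```bash
#!/usr/bin/env bash
# Sketch of the local-repo workaround from this thread, for the EMR master node.
# setup_local_repo and its arguments are hypothetical names, not part of EMR or Zeppelin.
# Run as root (e.g. via sudo) so that mkdir, chown and ln are permitted.

setup_local_repo() {
  local zeppelin_home="${1:-/usr/lib/zeppelin}"         # Zeppelin install dir (emr-4.2.0)
  local repo_dir="${2:-/var/lib/zeppelin/local-repo}"   # /var is on the first ephemeral disk
  local owner="${3:-zeppelin}"                          # user the %dep interpreter runs as
  local link="$zeppelin_home/local-repo"                # path the fixes in this thread populate

  # Create the repository directory (and any missing parents); no-op if it already exists.
  mkdir -p "$repo_dir" || return 1
  # Hand it to the interpreter user so fetched artifacts can be written there.
  chown "$owner" "$repo_dir" || return 1

  # Link it into the Zeppelin home unless a link or a real directory is already there.
  if [ ! -L "$link" ] && [ ! -e "$link" ]; then
    ln -s "$repo_dir" "$link" || return 1
  fi
}
```

On the master node this could be invoked as `sudo bash -c '. ./setup_local_repo.sh && setup_local_repo'` (the file name is also just an example). Because of the existence check it is safe to run twice, and it leaves an existing /usr/lib/zeppelin/local-repo (for instance one created with moon's mkdir/chown commands) untouched.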