Dell - Internal Use - Confidential

Sorry, I should have persevered before posting my question (and wasting people's time).

Nathan, you are correct: Spark and Hadoop standalone should be run from the cygwin64 shell to write to the local file system (Windows). Duh! I was running it from the Git Bash console (??). Those RDDs are pretty cool.

From: Nathan Kronenfeld [mailto:[email protected]]
Sent: Wednesday, December 11, 2013 12:32 PM
To: [email protected]
Subject: Re: IOException - Cannot run program "cygpath": ....

Hi, Patrick.

If no one else has any ideas.... We just got things to work here, including writing data to the file system. We ended up running the standard unix scripts under cygwin, instead of the cmd scripts with which it currently ships - though the unix scripts required a couple of small changes. If you like, I can mail you (or just post) the changes here if that helps, or you can wait a day or two while we put together a pull request with the necessary changes.

-Nathan

On Wed, Dec 11, 2013 at 2:22 PM, Nathan Kronenfeld <[email protected]> wrote:

In another question I just asked (unhelpfully labeled "Spark Forum Question"), we are trying the same thing, with similar luck. We ran into the same error when we ran without Cygwin installed, or without cygwin/bin on the path - something in Hadoop here is expecting a Windows-like path and trying to make it unix-like. Unfortunately, as someone there suggested, when you do run with cygwin on the path, it just fails elsewhere - PairRDDFunctions.writeToFile calls into something in Hadoop that expects the path in unix-like format.

On Wed, Dec 11, 2013 at 2:01 PM, <[email protected]> wrote:

Hi all,

I got the following error message (see below) when Spark (standalone, local) writes to the local Windows file system. Other Java applications can write to and read from the local disk without problems. Has anyone heard or read about a similar issue? This may not be related to Spark deployment...
-----------------------
java.io.IOException: Cannot run program "cygpath": CreateProcess error=2, The system cannot find the file specified
    java.lang.ProcessBuilder.start(ProcessBuilder.java:1041)
    org.apache.hadoop.util.Shell.runCommand(Shell.java:206)
    org.apache.hadoop.util.Shell.run(Shell.java:188)
    org.apache.hadoop.fs.FileUtil$CygPathCommand.<init>(FileUtil.java:412)
    org.apache.hadoop.fs.FileUtil.makeShellPath(FileUtil.java:438)
    org.apache.hadoop.fs.FileUtil.makeShellPath(FileUtil.java:465)
    org.apache.hadoop.fs.RawLocalFileSystem.execCommand(RawLocalFileSystem.java:592)
    org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:584)
    org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:427)
    org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:465)
    org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:433)
    org.apache.hadoop.fs.FileSystem.create(FileSystem.java:886)
    org.apache.hadoop.fs.FileSystem.create(FileSystem.java:781)
    org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:123)
    org.apache.hadoop.mapred.SparkHadoopWriter.open(SparkHadoopWriter.scala:86)
    org.apache.spark.rdd.PairRDDFunctions.writeToFile$1(PairRDDFunctions.scala:667)
    org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$2.apply(PairRDDFunctions.scala:680)

Environment
---------------
OS: Windows 7 Enterprise 64-bit
JDK: 1.7.0_45 64-bit
Spark: 0.8.0

--
Nathan Kronenfeld
Senior Visualization Developer
Oculus Info Inc
2 Berkeley Street, Suite 600, Toronto, Ontario M5A 4J5
Phone: +1-416-203-3003 x 238
Email: [email protected]
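For anyone puzzling over the trace: this is not Hadoop source, just a hedged, standalone sketch of the mechanism behind it. Hadoop's Shell.runCommand launches "cygpath" as an external process via java.lang.ProcessBuilder; when cygwin/bin is not on the PATH (e.g. under a plain cmd or Git Bash session), ProcessBuilder.start() throws the IOException quoted above. The class name and the cygpath flags here are illustrative assumptions, not Hadoop's actual code.

```java
import java.io.IOException;

// Sketch (assumed names, not Hadoop code) of how launching a missing
// external program produces the "Cannot run program" IOException.
public class CygpathRepro {

    // Try to launch a program; return "started" on success, or the
    // exception message on failure.
    static String tryRun(String... command) {
        try {
            Process p = new ProcessBuilder(command).start();
            p.destroy();
            return "started";
        } catch (IOException e) {
            // On Windows without cygwin on PATH this reads:
            // Cannot run program "cygpath": CreateProcess error=2, ...
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Roughly the shape of invocation Hadoop's CygPathCommand makes;
        // the "-u" flag (convert to unix-style path) is illustrative.
        System.out.println(tryRun("cygpath", "-u", "C:\\tmp\\out"));
    }
}
```

Running this on a box where cygpath is not on the PATH prints the same "Cannot run program" message as the trace, which is why putting cygwin/bin on the PATH (or running from the cygwin64 shell, as above) makes the first failure go away.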
