Sorry, I should have persevered before posting my question (and wasting people's
time).


Nathan,
You are correct.
Spark and Hadoop standalone should be run from the Cygwin64 shell to write to the
local file system (Windows).  Duh! I was running them from a Git Bash console (??).
Those RDDs are pretty cool.
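
For context, here's a minimal sketch of the kind of save that was failing for me.
The master, app name, data, and output directory are all made up for illustration;
on Spark 0.8, saveAsTextFile funnels into PairRDDFunctions.saveAsHadoopDataset,
the same path as the stack trace further down:

import org.apache.spark.SparkContext

object LocalSaveSketch {
  def main(args: Array[String]) {
    // Local-mode context; the master and app name are illustrative.
    val sc = new SparkContext("local", "LocalSaveSketch")

    // A toy RDD; saveAsTextFile goes through TextOutputFormat and
    // PairRDDFunctions.saveAsHadoopDataset, where the cygpath error was thrown.
    val data = sc.parallelize(Seq("a", "b", "c"))
    data.saveAsTextFile("C:/tmp/spark-out")   // hypothetical output directory

    sc.stop()
  }
}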

From: Nathan Kronenfeld [mailto:[email protected]]
Sent: Wednesday, December 11, 2013 12:32 PM
To: [email protected]
Subject: Re: IOException - Cannot run program "cygpath": ....

Hi, Patrick.

If no one else has any ideas....

We just got things to work here, including writing data to the file system.

We ended up running the standard Unix scripts under Cygwin, instead of the cmd
scripts with which it currently ships - though the Unix scripts required a
couple of small changes.

If you like, I can mail you the changes (or just post them here) if that helps,
or you can wait a day or two while we put together a pull request with the
necessary changes.

              -Nathan


On Wed, Dec 11, 2013 at 2:22 PM, Nathan Kronenfeld 
<[email protected]<mailto:[email protected]>> wrote:
In another question I just asked (unhelpfully labeled "Spark Forum Question"), 
we are trying the same thing, with similar luck.

We came across the same error when we ran without Cygwin installed or without
cygwin/bin on the path - something in Hadoop here is expecting a Windows-style
path and trying to make it Unix-style.

Unfortunately, someone there suggests that when you do run with Cygwin on the
path, it just fails elsewhere - PairRDDFunctions.writeToFile calls into
something in Hadoop that expects the path in Unix-style format.
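
For anyone digging into that: judging by the trace further down, the Hadoop side
is just shelling out to cygpath (FileUtil$CygPathCommand) to convert a Windows
path into a Unix-style one. Here's a rough Scala sketch of the equivalent
conversion, assuming cygpath is on the PATH; the sample path is made up:

import scala.sys.process._

// Roughly what Hadoop's FileUtil$CygPathCommand does: run
// "cygpath -u <path>" to turn a Windows path into a Unix-style one.
// If cygpath is not on the PATH, ProcessBuilder fails with
// "CreateProcess error=2, The system cannot find the file specified".
val windowsPath = "C:\\tmp\\spark-out"                // hypothetical path
val unixPath = Seq("cygpath", "-u", windowsPath).!!.trim
println(unixPath)                                     // e.g. /cygdrive/c/tmp/spark-out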




On Wed, Dec 11, 2013 at 2:01 PM, 
<[email protected]<mailto:[email protected]>> wrote:

Hi all,

I got the following error message (see below) when Spark (standalone, local)
writes to the local Windows file system.  Other Java applications can write to
and read from the local disk without a problem.
Has anyone heard or read about a similar issue?
This may not be related to the Spark deployment...

-----------------------
java.io.IOException (java.io.IOException: Cannot run program
"cygpath": CreateProcess error=2, The system cannot find the file specified)

java.lang.ProcessBuilder.start(ProcessBuilder.java:1041)
org.apache.hadoop.util.Shell.runCommand(Shell.java:206)
org.apache.hadoop.util.Shell.run(Shell.java:188)
org.apache.hadoop.fs.FileUtil$CygPathCommand.<init>(FileUtil.java:412)
org.apache.hadoop.fs.FileUtil.makeShellPath(FileUtil.java:438)
org.apache.hadoop.fs.FileUtil.makeShellPath(FileUtil.java:465)
org.apache.hadoop.fs.RawLocalFileSystem.execCommand(RawLocalFileSystem.java:592)
org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:584)
org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:427)
org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:465)
org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:433)
org.apache.hadoop.fs.FileSystem.create(FileSystem.java:886)
org.apache.hadoop.fs.FileSystem.create(FileSystem.java:781)
org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:123)
org.apache.hadoop.mapred.SparkHadoopWriter.open(SparkHadoopWriter.scala:86)
org.apache.spark.rdd.PairRDDFunctions.writeToFile$1(PairRDDFunctions.scala:667)
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$2.apply(PairRDDFunctions.scala:680)



Environment
---------------
OS: Windows 7 Enterprise 64-bit
JDK 1.7.0_45 64-bit
Spark 0.8.0




--
Nathan Kronenfeld
Senior Visualization Developer
Oculus Info Inc
2 Berkeley Street, Suite 600,
Toronto, Ontario M5A 4J5
Phone:  +1-416-203-3003 x 238
Email:  [email protected]



--
Nathan Kronenfeld
Senior Visualization Developer
Oculus Info Inc
2 Berkeley Street, Suite 600,
Toronto, Ontario M5A 4J5
Phone:  +1-416-203-3003 x 238
Email:  [email protected]
