Another wild guess: I've had issues trying to use the 's3' protocol from Hadoop, and got things working by using the 's3n' protocol instead.
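If you switch to s3n, you also need the AWS credential properties set, either in core-site.xml or on the job's Configuration (org.apache.hadoop.conf.Configuration). A minimal sketch -- the key values are placeholders, and the property names are the s3n ones documented on the Hadoop S3 wiki page Frank links below:

    Configuration conf = new Configuration();
    // Placeholder credentials -- substitute your own AWS keys,
    // or define the same properties in core-site.xml instead.
    conf.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY_ID");
    conf.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_ACCESS_KEY");
    // ...then reference paths as s3n://bucket-name/folder-name

There is also a minimal repro sketch at the bottom of this mail, below the quoted thread, along the lines Frank suggests.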
> On Mar 16, 2014, at 8:41 AM, Jay Vyas <[email protected]> wrote:
>
> I have specifically fixed MapReduce jobs by doing what the error message
> suggests.
>
> But maybe (hopefully) there is another workaround that is configuration
> driven.
>
> Just a hunch, but maybe Mahout needs to be refactored to create fs objects
> using the get(uri, conf) calls?
>
> As Hadoop evolves to support different flavors of HCFS, using API calls
> that are more flexible (i.e. the fs.get(uri, conf) one) will probably be
> a good thing to keep in mind.
>
>> On Mar 16, 2014, at 9:22 AM, Frank Scholten <[email protected]> wrote:
>>
>> Hi Konstantin,
>>
>> Good to hear from you.
>>
>> The link you mentioned points to EigenSeedGenerator, not
>> RandomSeedGenerator. The problem seems to be with the call to
>>
>> fs.getFileStatus(input).isDir()
>>
>> It's been a while and I don't remember, but perhaps you have to set
>> additional Hadoop fs properties to use S3. See
>> https://wiki.apache.org/hadoop/AmazonS3. Perhaps you can isolate the
>> cause of this by creating a small Java main app with that line of code
>> and running it in the debugger.
>>
>> Cheers,
>>
>> Frank
>>
>> On Sun, Mar 16, 2014 at 12:07 PM, Konstantin Slisenko
>> <[email protected]> wrote:
>>
>>> Hello!
>>>
>>> I run text-document clustering on a Hadoop cluster in Amazon Elastic
>>> MapReduce. As input and output I use the Amazon S3 file system. I specify
>>> all paths as "s3://bucket-name/folder-name".
>>>
>>> SparseVectorsFromSequenceFile works correctly with S3, but when I start
>>> the K-Means clustering job, I get this error:
>>>
>>> Exception in thread "main" java.lang.IllegalArgumentException: This
>>> file system object (hdfs://172.31.41.65:9000) does not support access
>>> to the request path
>>> 's3://by.kslisenko.bigdata/stackovweflow-small/out_new/sparse/tfidf-vectors'
>>> You possibly called FileSystem.get(conf) when you should have called
>>> FileSystem.get(uri, conf) to obtain a file system supporting your
>>> path.
>>>
>>> at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:375)
>>> at org.apache.hadoop.hdfs.DistributedFileSystem.checkPath(DistributedFileSystem.java:106)
>>> at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:162)
>>> at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:530)
>>> at org.apache.mahout.clustering.kmeans.RandomSeedGenerator.buildRandom(RandomSeedGenerator.java:76)
>>> at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:93)
>>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>> at bbuzz2011.stackoverflow.runner.RunnerWithInParams.cluster(RunnerWithInParams.java:121)
>>> at bbuzz2011.stackoverflow.runner.RunnerWithInParams.run(RunnerWithInParams.java:52)
>>> at bbuzz2011.stackoverflow.runner.RunnerWithInParams.main(RunnerWithInParams.java:41)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>> at java.lang.reflect.Method.invoke(Method.java:597)
>>> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>>
>>> I checked RandomSeedGenerator.buildRandom
>>> (http://grepcode.com/file/repo1.maven.org/maven2/org.apache.mahout/mahout-core/0.8/org/apache/mahout/clustering/kmeans/EigenSeedGenerator.java?av=f)
>>> and I assume it has the correct code:
>>>
>>> FileSystem fs = FileSystem.get(output.toUri(), conf);
>>>
>>> I cannot run the clustering because of this error. Do you have any ideas
>>> how to fix this?
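Coming back to Frank's suggestion above: a small standalone main along these lines should reproduce the checkPath failure outside of Mahout. This is an untested sketch -- the class name is arbitrary, and the bucket path and credentials are placeholders to replace with your own:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3PathCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Placeholder credentials -- use your own keys (or core-site.xml).
    conf.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY_ID");
    conf.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_ACCESS_KEY");

    // Placeholder path -- point this at your bucket.
    Path input = new Path("s3n://bucket-name/folder-name");

    // What the error message warns about: get(conf) returns the default
    // filesystem (HDFS on EMR), whose checkPath() then rejects the s3 path.
    FileSystem wrong = FileSystem.get(conf);
    try {
      wrong.getFileStatus(input);
    } catch (IllegalArgumentException e) {
      System.out.println("get(conf) fails: " + e.getMessage());
    }

    // Resolving the filesystem from the path's own URI works -- the same
    // FileSystem.get(uri, conf) pattern Jay mentions.
    FileSystem right = FileSystem.get(input.toUri(), conf);
    System.out.println("isDir: " + right.getFileStatus(input).isDir());
  }
}

If the second call also fails, the problem is credentials/configuration rather than Mahout's FileSystem lookup.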
