I agree, it's best to be explicit when creating FileSystem instances by using the two-argument get(...). It's also time to update to the FileSystem 2.0 APIs. Can you file a JIRA for this? If not I will :)
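For reference, a minimal sketch of the difference the error message below describes. The Hadoop calls are shown in comments (FileSystem.get(conf), FileSystem.get(uri, conf), and Path.getFileSystem(conf) are the real Hadoop APIs); the sameScheme helper is a hypothetical, Hadoop-free illustration of the scheme check that FileSystem.checkPath performs, not Hadoop's actual implementation:

```java
import java.net.URI;

public class FsGetSketch {
    // FileSystem.get(conf) returns the *default* filesystem (fs.default.name),
    // e.g. hdfs://172.31.41.65:9000 on an EMR cluster. FileSystem.get(uri, conf)
    // instead resolves a filesystem for the URI's scheme, so s3:// paths work
    // even when HDFS is the default:
    //
    //   Path output = new Path("s3://bucket/out");
    //   FileSystem fs = FileSystem.get(output.toUri(), conf);  // scheme-aware
    //   // or, equivalently:
    //   FileSystem fs = output.getFileSystem(conf);
    //
    // The checkPath failure boils down to a scheme mismatch like this:
    static boolean sameScheme(URI fsUri, URI path) {
        String scheme = path.getScheme();
        // A scheme-less path is resolved against the filesystem's own scheme.
        return scheme == null || scheme.equalsIgnoreCase(fsUri.getScheme());
    }

    public static void main(String[] args) {
        URI hdfs = URI.create("hdfs://172.31.41.65:9000");
        URI s3Path = URI.create("s3://bucket/out/tfidf-vectors");
        System.out.println(sameScheme(hdfs, s3Path)); // prints "false"
    }
}
```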
> On Mar 16, 2014, at 12:37 PM, Sebastian Schelter <[email protected]> wrote:
>
> I've also encountered a similar error once. It's really just the
> FileSystem.get call that needs to be modified. I think it's a good idea to
> walk through the codebase and refactor this where necessary.
>
> --sebastian
>
>
>> On 03/16/2014 05:16 PM, Andrew Musselman wrote:
>> Another wild guess: I've had issues trying to use the 's3' protocol from
>> Hadoop and got things working by using the 's3n' protocol instead.
>>
>>> On Mar 16, 2014, at 8:41 AM, Jay Vyas <[email protected]> wrote:
>>>
>>> I specifically have fixed MapReduce jobs by doing what the error message
>>> suggests.
>>>
>>> But maybe (hopefully) there is another workaround that is configuration
>>> driven.
>>>
>>> Just a hunch, but maybe Mahout needs to be refactored to create fs objects
>>> using the get(uri, conf) calls?
>>>
>>> As Hadoop evolves to support different flavors of HCFS, using API
>>> calls that are more flexible (i.e. like the fs.get(uri, conf) one) will
>>> probably be a good thing to keep in mind.
>>>
>>>> On Mar 16, 2014, at 9:22 AM, Frank Scholten <[email protected]> wrote:
>>>>
>>>> Hi Konstantin,
>>>>
>>>> Good to hear from you.
>>>>
>>>> The link you mentioned points to EigenSeedGenerator, not
>>>> RandomSeedGenerator. The problem seems to be with the call to
>>>>
>>>> fs.getFileStatus(input).isDir()
>>>>
>>>> It's been a while and I don't remember, but perhaps you have to set
>>>> additional Hadoop fs properties to use S3. See
>>>> https://wiki.apache.org/hadoop/AmazonS3. Perhaps you can isolate the cause
>>>> of this by creating a small Java main app with that line of code and
>>>> running it in the debugger.
>>>>
>>>> Cheers,
>>>>
>>>> Frank
>>>>
>>>>
>>>> On Sun, Mar 16, 2014 at 12:07 PM, Konstantin Slisenko
>>>> <[email protected]> wrote:
>>>>
>>>>> Hello!
>>>>>
>>>>> I run a text-documents clustering on a Hadoop cluster in Amazon Elastic
>>>>> MapReduce.
>>>>> As input and output I use the Amazon S3 file system. I specify all
>>>>> paths as "s3://bucket-name/folder-name".
>>>>>
>>>>> SparseVectorsFromSequenceFiles works correctly with S3,
>>>>> but when I start the K-Means clustering job, I get this error:
>>>>>
>>>>> Exception in thread "main" java.lang.IllegalArgumentException: This
>>>>> file system object (hdfs://172.31.41.65:9000) does not support access
>>>>> to the request path
>>>>> 's3://by.kslisenko.bigdata/stackovweflow-small/out_new/sparse/tfidf-vectors'
>>>>> You possibly called FileSystem.get(conf) when you should have called
>>>>> FileSystem.get(uri, conf) to obtain a file system supporting your
>>>>> path.
>>>>>
>>>>> at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:375)
>>>>> at org.apache.hadoop.hdfs.DistributedFileSystem.checkPath(DistributedFileSystem.java:106)
>>>>> at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:162)
>>>>> at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:530)
>>>>> at org.apache.mahout.clustering.kmeans.RandomSeedGenerator.buildRandom(RandomSeedGenerator.java:76)
>>>>> at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:93)
>>>>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>>> at bbuzz2011.stackoverflow.runner.RunnerWithInParams.cluster(RunnerWithInParams.java:121)
>>>>> at bbuzz2011.stackoverflow.runner.RunnerWithInParams.run(RunnerWithInParams.java:52)
>>>>> at bbuzz2011.stackoverflow.runner.RunnerWithInParams.main(RunnerWithInParams.java:41)
>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>> at java.lang.reflect.Method.invoke(Method.java:597)
>>>>> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>>>>
>>>>> I checked RandomSeedGenerator.buildRandom
>>>>> (http://grepcode.com/file/repo1.maven.org/maven2/org.apache.mahout/mahout-core/0.8/org/apache/mahout/clustering/kmeans/EigenSeedGenerator.java?av=f)
>>>>> and I assume it has correct code:
>>>>>
>>>>> FileSystem fs = FileSystem.get(output.toUri(), conf);
>>>>>
>>>>> I cannot run clustering because of this error. Maybe you have some
>>>>> ideas how to fix this?
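For completeness, the "additional Hadoop fs properties" Frank mentions are, for Hadoop of that era, the S3 credential settings in core-site.xml. A sketch, assuming the s3n (native) connector that Andrew suggests; the property names are the ones covered by the AmazonS3 wiki page linked above, and the values are placeholders:

```xml
<!-- core-site.xml: credentials for the s3n:// (native) connector -->
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YOUR_SECRET_KEY</value>
</property>
<!-- the block-based s3:// connector reads fs.s3.awsAccessKeyId and
     fs.s3.awsSecretAccessKey instead -->
```

Note these only make the S3 filesystem reachable; they do not help with the checkPath error above, which requires the FileSystem.get(uri, conf) refactoring regardless.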
