Hi Darius,

It seems you hit a "Hadoop on Windows" issue; it might have something to do with how Hadoop sets file permissions. From my experience, only the (old) 0.20.2 version of Hadoop works well with Cygwin; otherwise you run into file-permission issues like the one you mentioned. If you want to give that version a try and can't find a download, see http://scaleunlimited.com/downloads/3nn2pq/hadoop-0.20.2.tgz

-- Ken

On Sep 18, 2013, at 1:10am, Gokhan Capan wrote:

> On Tue, Sep 17, 2013 at 3:02 PM, Darius Miliauskas <[email protected]> wrote:
>
>> That works like a charm, Gokhan; your suggestion was on point again. However,
>> despite the fact that the build is successful, the file is still empty,
>> and I got the exception I always get on Windows:
>>
>> java.io.IOException: Failed to set permissions of path:
>> \tmp\hadoop-DARIUS\mapred\staging\DARIUS331150778\.staging to 0777
>>     at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:689)
>>     at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:670)
>>     at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:514)
>>     at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:349)
>>     at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
>>     at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
>>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:918)
>>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:912)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
>>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:912)
>>     at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
>>     at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
>>     at org.apache.mahout.vectorizer.DocumentProcessor.tokenizeDocuments(DocumentProcessor.java:93)
>>     at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.run(SparseVectorsFromSequenceFiles.java:257)
>>     at org.apache.mahout.mahoutnewsrecommender2.Recommender.myRecommender(Recommender.java:99)
>>     at org.apache.mahout.mahoutnewsrecommender2.App.main(App.java:26)
>>
>> BUILD SUCCESSFUL (total time: 3 seconds)
>>
>> Thanks,
>>
>> Darius
>>
>> 2013/9/12 Gokhan Capan <[email protected]>
>>
>>> Although Windows is not officially supported, your
>>>
>>>     svfsf.run(new String[]{inputPath.toString(), outputPath.toString()})
>>>
>>> should be
>>>
>>>     svfsf.run(new String[]{"-i", inputPath.toString(), "-o", outputPath.toString()})
>>>
>>> anyway.
>>>
>>> Best,
>>>
>>> Gokhan
>>>
>>> On Thu, Sep 12, 2013 at 4:14 PM, Darius Miliauskas <[email protected]> wrote:
>>>
>>>> Dear All,
>>>>
>>>> I am trying to use SparseVectorsFromSequenceFiles through Java code
>>>> (NetBeans 7 & Windows 7). Here is my code:
>>>>
>>>>     // inputPath is the path of my SequenceFile
>>>>     Path inputPath = new Path("C:\\Users\\DARIUS\\forTest1.txt");
>>>>
>>>>     // outputPath is where I expect some results
>>>>     Path outputPath = new Path("C:\\Users\\DARIUS\\forTest2.txt");
>>>>
>>>>     SparseVectorsFromSequenceFiles svfsf = new SparseVectorsFromSequenceFiles();
>>>>     svfsf.run(new String[]{inputPath.toString(), outputPath.toString()});
>>>>
>>>> The build is successful. However, at the end I get just an empty file
>>>> where I expected my output. Do you have any idea why the output file is
>>>> empty, and what I should change in the code to get the results?
>>>>
>>>> Ciao,
>>>>
>>>> Darius

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
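For reference, Gokhan's correction comes down to this: SparseVectorsFromSequenceFiles parses its argument array like a command line, so each path must follow its option flag. A minimal, self-contained sketch of the fix (the helper `vectorizerArgs` is illustrative, not part of Mahout's API, and the paths are placeholders mirroring the thread):

```java
// Sketch: build the flagged argument array that would be handed to
// svfsf.run(...). Bare positional paths are not recognized by the
// vectorizer's option parser, which likely explains the empty output.
public class ArgsSketch {

    // Hypothetical helper, not a Mahout method.
    static String[] vectorizerArgs(String inputPath, String outputPath) {
        return new String[] { "-i", inputPath, "-o", outputPath };
    }

    public static void main(String[] args) {
        // Placeholder paths; on a real run, -o should name an output
        // directory rather than a .txt file.
        String[] a = vectorizerArgs("C:\\Users\\DARIUS\\forTest1",
                                    "C:\\Users\\DARIUS\\forTest2");
        System.out.println(String.join(" ", a));
    }
}
```

The array built here would replace the unflagged one in Darius's original `svfsf.run(...)` call; the Windows permission error in the stack trace is a separate issue, addressed by Ken's Hadoop/Cygwin suggestion above.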
