It's ALIVE!!! And I got the same count: 359.  Recap: Setting
mapred.child.java.opts in mapred-site.xml was the thing that worked. So,
shakka-kahn!

Found this exciting Hadoop quirk:
The following lines run in sequence produce an error:
$HADOOP_HOME/bin/hadoop dfs -mkdir /fpm-input
$HADOOP_HOME/bin/hadoop dfs -put /home/mahout-in/fpm-input/* /fpm-input
--> Error= ..."could only be replicated to 0 nodes instead of 1 hadoop"...

I found this page:
http://wiki.apache.org/hadoop/HowToSetupYourDevelopmentEnvironment  which
suggested a high-level workaround of loading a status page.

I found a simple solution that worked:
$HADOOP_HOME/bin/hadoop dfs -mkdir /fpm-input
$HADOOP_HOME/bin/hadoop dfs -ls /fpm-input  <-- this somehow makes hadoop stop complaining.
$HADOOP_HOME/bin/hadoop dfs -put /home/mahout-in/fpm-input/* /fpm-input
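
For anyone who wants to script it, here is a rough sketch of the same three steps wrapped in a shell function. The function name load_input and the HADOOP_BIN variable are my own inventions for illustration, not anything Hadoop provides, and I don't actually know why the -ls helps, so treat it as a workaround rather than a real fix:

```shell
# Sketch only: load_input and HADOOP_BIN are hypothetical names, not part of Hadoop.
# HADOOP_BIN defaults to the hadoop launcher used in the commands above.
HADOOP_BIN="${HADOOP_BIN:-$HADOOP_HOME/bin/hadoop}"

# load_input LOCAL_DIR HDFS_DIR
# Runs mkdir, then ls, then put, stopping at the first command that fails.
load_input() {
  src_dir="$1"
  dest="$2"
  "$HADOOP_BIN" dfs -mkdir "$dest" &&
  "$HADOOP_BIN" dfs -ls "$dest" &&      # the step that seemed to stop the error
  "$HADOOP_BIN" dfs -put "$src_dir"/* "$dest"
}
```

With that defined, the sequence above would be: load_input /home/mahout-in/fpm-input /fpm-input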

On Fri, May 21, 2010 at 11:08 AM, Jeff Eastman
<[email protected]> wrote:

> I think the hadoop-config.sh heap size only affects heaps on the Hadoop
> daemons. I had it at 2g earlier when I was getting the OMEs on fpg. I added
> a note to the wiki page about setting mapred.child.java.opts to 2g and also
> to remove the other config values that were set for single-node operation
> (esp dfs.replication=1).
>
> I assume you added HADOOP_CONF_DIR so it actually runs in Hadoop?
>
>
>
> On 5/21/10 10:52 AM, Mike Roberts wrote:
>
>> Exciting.  Yeah, I also set my heapsize to 2G.  I set it in the
>> hadoop-config.sh file.  Did you do it there or did you instead set it in
>> conf/mapred-site.xml -->    mapred.child.java.opts?  That'd be my next step
>> if I were actually getting memory errors, but I wasn't even sure that real
>> data could be produced.
>>
>> Kinda scary that it'll exit successfully without results.  Does mahout ever
>> return "wrong" results?  That is, there should be 120,000 results, but
>> because of some memory config somewhere it successfully returns just 100,000
>> results?  Anyone ever see that, and if so, how do you deal with it?
>>
>> On Fri, May 21, 2010 at 10:36 AM, Jeff Eastman
>> <[email protected]> wrote:
>>
>>
>>
>>> On 5/20/10 9:51 PM, Mike Roberts wrote:
>>>
>>>
>>>
>>>> ./bin/mahout seqdumper --seqFile patterns/fpgrowth/part-r-00000
>>>>
>>>>
>>>>
>>> After reconfiguring a 4-node cluster to set the java heapsize to 2g I got
>>> 92144 in patterns/fpgrowth/part-r-00000 and got Count: 359 and volumes of
>>> output after seqdumper. But it's only using a single mapper/reducer in all
>>> the steps (probably why it OMEs with the default heap). I also tried Drew's
>>> -Dmapred.reduce.tasks=2 trick but bin/mahout barfs on that.
>>>
>>>
>>>
>>
>>
>
>
