fileSystem.create(...) is blocked when selector is closed

2009-04-26 Thread javateck javateck
I got this selector exception, and all my threads are blocked at the
FileSystem.create(...) level. Has anyone seen this issue before? I'm running
0.18.3.

java.nio.channels.ClosedSelectorException
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:66)
at sun.nio.ch.SelectorImpl.selectNow(SelectorImpl.java:88)
at sun.nio.ch.Util.releaseTemporarySelector(Util.java:135)
at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:118)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:301)
at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:178)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:820)
at org.apache.hadoop.ipc.Client.call(Client.java:705)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
at org.apache.hadoop.dfs.$Proxy0.create(Unknown Source)
at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at org.apache.hadoop.dfs.$Proxy0.create(Unknown Source)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:2302)
at org.apache.hadoop.dfs.DFSClient.create(DFSClient.java:471)
at org.apache.hadoop.dfs.DistributedFileSystem.create(DistributedFileSystem.java:178)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:503)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:484)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:391)

*all other threads are blocked when calling fileSystem.create(...)*

"Thread-20" prio=5 tid=0x000101a2f000 nid=0x11ff2a000 in Object.wait()
[0x00011ff28000..0x00011ff29ad0]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:485)
at org.apache.hadoop.ipc.Client.call(Client.java:710)
- locked <0x000107cc9430> (a org.apache.hadoop.ipc.Client$Call)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
at org.apache.hadoop.dfs.$Proxy0.create(Unknown Source)
at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at org.apache.hadoop.dfs.$Proxy0.create(Unknown Source)
at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.(DFSClient.java:2302)
at org.apache.hadoop.dfs.DFSClient.create(DFSClient.java:471)
at
org.apache.hadoop.dfs.DistributedFileSystem.create(DistributedFileSystem.java:178)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:503)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:484)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:391)


Re: anyone knows why setting mapred.tasktracker.map.tasks.maximum not working?

2009-04-22 Thread javateck javateck
Not exactly.
When I run a single standalone server, meaning that one machine is the
namenode, datanode, jobtracker, and tasktracker, and I configure the map
maximum to 10: I have 174 files of 62~75 MB each, and my block size is 65 MB.
I can see that 189 map tasks are generated for the job, but only 2 are
running while the others are waiting.

When I add another datanode with the same tasktracker settings, the same job
(which again produces 189 map tasks) runs with 12 concurrent map tasks: it
uses 2 map task slots on my namenode machine and 10 slots on my datanode.

I just can't figure out why the namenode machine runs only 2 map tasks
while 10 slots are available.

On Tue, Apr 21, 2009 at 7:47 PM, jason hadoop wrote:

> There must be only 2 input splits being produced for your job.
> Either you have 2 unsplitable files, or the input file(s) you have are not
> large enough compared to the block size to be split.
>
> Table 6-1 in chapter 06 gives a breakdown of all of the configuration
> parameters that affect split size in hadoop 0.19. Alphas are available :)
>
> This is detailed in my book in ch06
>
> On Tue, Apr 21, 2009 at 5:07 PM, javateck javateck wrote:
>
> > anyone knows why setting *mapred.tasktracker.map.tasks.maximum* not
> > working?
> > I set it to 10, but still see only 2 map tasks running when running one
> job
> >
>
>
>
> --
> Alpha Chapters of my book on Hadoop are available
> http://www.apress.com/book/view/9781430219422
>
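
A back-of-the-envelope sketch of the split arithmetic described above; this is not Hadoop's actual FileInputFormat code (which also honors a minimum split size and a small slop factor before creating a short tail split), but it shows how 174 files of 62~75 MB against a 65 MB block size can plausibly yield 189 splits, and therefore 189 map tasks:

// Simplified sketch only; the class name and sizes are illustrative.
public class SplitCountSketch {
    static long splits(long fileSize, long blockSize) {
        // ceil(fileSize / blockSize), with at least one split per non-empty file
        return Math.max(1, (fileSize + blockSize - 1) / blockSize);
    }

    public static void main(String[] args) {
        long blockSize = 65L * 1024 * 1024;                        // 65 MB, as in the thread
        System.out.println(splits(62L * 1024 * 1024, blockSize));  // 62 MB file -> 1 split
        System.out.println(splits(75L * 1024 * 1024, blockSize));  // 75 MB file -> 2 splits
        // With ~174 files in the 62~75 MB range, a handful of 2-split files is
        // enough to add up to the 189 map tasks reported on the web UI.
    }
}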


anyone knows why setting mapred.tasktracker.map.tasks.maximum not working?

2009-04-21 Thread javateck javateck
Does anyone know why setting *mapred.tasktracker.map.tasks.maximum* is not
working? I set it to 10, but I still see only 2 map tasks running when I run a job.


Re: mapred.tasktracker.map.tasks.maximum

2009-04-21 Thread javateck javateck
I want to clarify something: for the max task slots, are these the places to
check?
1. hadoop-site.xml
2. the specific job's job.conf, which can be retrieved through the job, for
example logs/job_200904212336_0002_conf.xml

Is there any other place that limits the map task count?

In my case it's strange, because I set "mapred.tasktracker.map.tasks.maximum"
to 10, and the job's conf also shows 10, but Hadoop is actually using only 2
map slots.


On Tue, Apr 21, 2009 at 1:20 PM, javateck javateck wrote:

> I set my "mapred.tasktracker.map.tasks.maximum" to 10, but when I run a
> task, it's only using 2 out of 10, any way to know why it's only using 2?
> thanks
>


Re: mapred.tasktracker.map.tasks.maximum

2009-04-21 Thread javateck javateck
No, it's plain text, tab-delimited. And I'm expecting one mapper per file: I
have 175 files, and I see 189 map tasks in the web UI. My issue is that, with
189 map tasks waiting, Hadoop is using only 2 of my 10 map slots, and I assume
all map tasks should be independent.


On Tue, Apr 21, 2009 at 2:23 PM, Miles Osborne  wrote:

> is your input data compressed?  if so then you will get one mapper per file
>
> Miles
>
> 2009/4/21 javateck javateck :
> > Hi Koji,
> >
> > Thanks for helping.
> >
> > I don't know why hadoop is just using 2 out of 10 map tasks slots.
> >
> > Sure, I just cut and paste the job tracker web UI, clearly I set the max
> > tasks to 10(which I can verify from hadoop-site.xml and from the
> individual
> > job configuration also), and I did have the first mapreduce running at 10
> > map tasks when I checked from UI, but all subsequent queries are running
> > with 2 map tasks. And I have almost 176 files with each input file around
> > 62~75MB.
> >
> >
> > *mapred.tasktracker.map.tasks.maximum* 10
> >
> > Kind     % Complete   Num Tasks   Pending   Running   Complete   Killed   Failed/Killed Task Attempts
> > map      28.04%       189         134       2         53         0        0 / 0
> > reduce   0.00%        1           1         0         0          0        0 / 0
> >
> > On Tue, Apr 21, 2009 at 1:56 PM, Koji Noguchi wrote:
> >
> >> It's probably a silly question, but you do have more than 2 mappers on
> >> your second job?
> >>
> >> If yes, I have no idea what's happening.
> >>
> >> Koji
> >>
> >> -Original Message-
> >> From: javateck javateck [mailto:javat...@gmail.com]
> >> Sent: Tuesday, April 21, 2009 1:38 PM
> >> To: core-user@hadoop.apache.org
> >> Subject: Re: mapred.tasktracker.map.tasks.maximum
> >>
> >> right, I set it in hadoop-site.xml before starting the whole hadoop
> >> processes, I have one job running fully utilizing the 10 map tasks, but
> >> subsequent queries are only using 2 of them, don't know why.
> >> I have enough RAM also, no paging out is happening, I'm running on
> >> 0.18.3.
> >> Right now I put all processes on one machine, namenode, datanode,
> >> jobtracker, tasktracker, I have a 2*4core CPU, and 20GB RAM.
> >>
> >>
> >> On Tue, Apr 21, 2009 at 1:25 PM, Koji Noguchi wrote:
> >>
> >> > This is a cluster config and not a per job config.
> >> >
> >> > So this has to be set when the mapreduce cluster first comes up.
> >> >
> >> > Koji
> >> >
> >> >
> >> > -Original Message-
> >> > From: javateck javateck [mailto:javat...@gmail.com]
> >> > Sent: Tuesday, April 21, 2009 1:20 PM
> >> > To: core-user@hadoop.apache.org
> >> > Subject: mapred.tasktracker.map.tasks.maximum
> >> >
> >> > I set my "mapred.tasktracker.map.tasks.maximum" to 10, but when I run
> >> a
> >> > task, it's only using 2 out of 10, any way to know why it's only using
> >> > 2?
> >> > thanks
> >> >
> >>
> >
>
>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>


Re: mapred.tasktracker.map.tasks.maximum

2009-04-21 Thread javateck javateck
Hi Koji,

Thanks for helping.

I don't know why Hadoop is using only 2 out of 10 map task slots.

Sure, I just cut and pasted from the job tracker web UI below. Clearly I set
the max tasks to 10 (which I can also verify from hadoop-site.xml and from the
individual job configuration), and the first mapreduce job did run at 10 map
tasks when I checked the UI, but all subsequent queries run with only 2 map
tasks. And I have almost 176 files, each around 62~75 MB.


*mapred.tasktracker.map.tasks.maximum* 10

Kind     % Complete   Num Tasks   Pending   Running   Complete   Killed   Failed/Killed Task Attempts
map      28.04%       189         134       2         53         0        0 / 0
reduce   0.00%        1           1         0         0          0        0 / 0

On Tue, Apr 21, 2009 at 1:56 PM, Koji Noguchi wrote:

> It's probably a silly question, but you do have more than 2 mappers on
> your second job?
>
> If yes, I have no idea what's happening.
>
> Koji
>
> -Original Message-
> From: javateck javateck [mailto:javat...@gmail.com]
> Sent: Tuesday, April 21, 2009 1:38 PM
> To: core-user@hadoop.apache.org
> Subject: Re: mapred.tasktracker.map.tasks.maximum
>
> right, I set it in hadoop-site.xml before starting the whole hadoop
> processes, I have one job running fully utilizing the 10 map tasks, but
> subsequent queries are only using 2 of them, don't know why.
> I have enough RAM also, no paging out is happening, I'm running on
> 0.18.3.
> Right now I put all processes on one machine, namenode, datanode,
> jobtracker, tasktracker, I have a 2*4core CPU, and 20GB RAM.
>
>
> On Tue, Apr 21, 2009 at 1:25 PM, Koji Noguchi wrote:
>
> > This is a cluster config and not a per job config.
> >
> > So this has to be set when the mapreduce cluster first comes up.
> >
> > Koji
> >
> >
> > -Original Message-
> > From: javateck javateck [mailto:javat...@gmail.com]
> > Sent: Tuesday, April 21, 2009 1:20 PM
> > To: core-user@hadoop.apache.org
> > Subject: mapred.tasktracker.map.tasks.maximum
> >
> > I set my "mapred.tasktracker.map.tasks.maximum" to 10, but when I run
> a
> > task, it's only using 2 out of 10, any way to know why it's only using
> > 2?
> > thanks
> >
>


Re: mapred.tasktracker.map.tasks.maximum

2009-04-21 Thread javateck javateck
Right, I set it in hadoop-site.xml before starting the Hadoop processes. I
have one job that runs fully utilizing the 10 map tasks, but subsequent
queries use only 2 of them, and I don't know why.
I also have enough RAM and no paging is happening; I'm running 0.18.3.
Right now I put all processes on one machine (namenode, datanode, jobtracker,
tasktracker), which has a 2*4-core CPU and 20 GB RAM.


On Tue, Apr 21, 2009 at 1:25 PM, Koji Noguchi wrote:

> This is a cluster config and not a per job config.
>
> So this has to be set when the mapreduce cluster first comes up.
>
> Koji
>
>
> -Original Message-
> From: javateck javateck [mailto:javat...@gmail.com]
> Sent: Tuesday, April 21, 2009 1:20 PM
> To: core-user@hadoop.apache.org
> Subject: mapred.tasktracker.map.tasks.maximum
>
> I set my "mapred.tasktracker.map.tasks.maximum" to 10, but when I run a
> task, it's only using 2 out of 10, any way to know why it's only using
> 2?
> thanks
>
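
To restate Koji's point in code: mapred.tasktracker.map.tasks.maximum is read by each tasktracker from its own hadoop-site.xml when the daemon starts, so changing it per job has no effect. A minimal sketch (the property name comes from this thread; everything else is illustrative) of printing the value a client-side configuration actually resolves to:

import org.apache.hadoop.mapred.JobConf;

public class ShowSlotLimit {
    public static void main(String[] args) {
        // Picks up hadoop-default.xml and hadoop-site.xml from the classpath.
        JobConf conf = new JobConf();
        // Cluster-side setting: each tasktracker applies its own copy of this value
        // at daemon startup. The shipped default is 2, which is exactly what a
        // tasktracker falls back to if it never loads the edited hadoop-site.xml.
        int maxMaps = conf.getInt("mapred.tasktracker.map.tasks.maximum", 2);
        System.out.println("mapred.tasktracker.map.tasks.maximum = " + maxMaps);
    }
}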


mapred.tasktracker.map.tasks.maximum

2009-04-21 Thread javateck javateck
I set my "mapred.tasktracker.map.tasks.maximum" to 10, but when I run a
task, it's only using 2 out of 10, any way to know why it's only using 2?
thanks


raw files become zero bytes when mapreduce job hit outofmemory error

2009-04-13 Thread javateck javateck
I'm running some mapreduce jobs, and some of them hit out-of-memory errors. I
find that the raw input data itself also gets corrupted and becomes zero bytes,
which is very strange to me. I haven't looked into it in much detail yet, but I
want to check quickly with anyone who has seen this. I'm running 0.18.3.
Thanks


Re: API: FSDataOutputStream create(Path f, boolean overwrite)

2009-04-12 Thread javateck javateck
Sorry, it's my fault; it's working as expected.

On Sun, Apr 12, 2009 at 12:43 AM, javateck javateck wrote:

> Hi:
>   I'm trying to use "FSDataOutputStream create(Path f, boolean
> overwrite)", I'm calling "create(new Path("somePath"), false)", but creation
> still fails with IOException even when the file does not exist, can someone
> explain the behavior?
>
> thanks,
>


API: FSDataOutputStream create(Path f, boolean overwrite)

2009-04-12 Thread javateck javateck
Hi:
  I'm trying to use "FSDataOutputStream create(Path f, boolean overwrite)".
I'm calling "create(new Path("somePath"), false)", but creation still fails
with an IOException even when the file does not exist. Can someone explain the
behavior?

thanks,
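
For reference, a minimal sketch of the call under discussion (the path and configuration are illustrative, not from the thread): with overwrite set to false, create() is expected to succeed when the path does not exist and to throw an IOException only when it already does.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CreateNoOverwrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();         // reads hadoop-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);             // HDFS when fs.default.name points at a namenode
        Path path = new Path("/tmp/example.txt");         // hypothetical path
        FSDataOutputStream out = fs.create(path, false);  // overwrite = false
        out.writeBytes("hello\n");
        out.close();
    }
}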


does hadoop have any way to append to an existing file?

2009-04-10 Thread javateck javateck
Hi,
  Does Hadoop have any way to append to an existing file? For example, I wrote
some content to a file, and later I want to append more content to it.

thanks,
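
For context beyond this thread: the 0.18 line has no append support, so the usual workarounds are writing additional part files or rewriting the file. Later releases expose a FileSystem.append(Path) call, though its availability and stability vary by version and by the dfs.support.append setting; a hedged sketch, assuming a version and cluster where append is enabled:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/tmp/log.txt");      // hypothetical existing file
        // Only works where append is supported and enabled; otherwise this throws.
        FSDataOutputStream out = fs.append(path);
        out.writeBytes("one more line\n");
        out.close();
    }
}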


safemode forever

2009-04-07 Thread javateck javateck
Hi,
  I'm wondering whether anyone has a solution for a namenode that never leaves
safe mode. Is there any way to get around it?

  thanks,

error: org.apache.hadoop.dfs.SafeModeException: Cannot delete
/mapred/system. Name node is in safe mode.
The ratio of reported blocks 0.4696 has not reached the threshold 0.9990.
Safe mode will be turned off automatically.


hadoop 0.18.3 writing not flushing to hadoop server?

2009-04-06 Thread javateck javateck
I have a strange issue: when I write to Hadoop, the content does not show up in
HDFS even after a long time. Is there any way to force-flush the locally staged
temp files to HDFS after writing? When I shut down the VM, the data does get
flushed.
thanks,
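
A likely explanation (a general description of 0.18-era client behavior, not a diagnosis of this particular setup): DFSClient stages writes in a local temporary file and only ships a block to the datanodes once the block fills up or the stream is closed, so the data typically becomes visible in HDFS at close(). A minimal sketch with an illustrative path:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteAndClose {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        FSDataOutputStream out = fs.create(new Path("/tmp/output.txt")); // hypothetical path
        out.writeBytes("some content\n");
        // The write above is staged locally; closing the stream is what pushes the
        // data to the datanodes and makes the file contents visible to readers.
        out.close();
    }
}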


HDFS data block clarification

2009-04-02 Thread javateck javateck
  Can someone tell me whether a file occupies one or more blocks? For example,
the default block size is 64 MB; if I save a 4 KB file to HDFS, will the 4 KB
file occupy the whole 64 MB block by itself? In that case, do I need to
configure the block size down to 10 KB if most of my files are smaller than
10 KB?

thanks,


Re: Running MapReduce without setJar

2009-04-01 Thread javateck javateck
You can run it from a Java program:

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

// MapReduceWork is your job's own class; its containing jar gets shipped to the cluster
JobConf conf = new JobConf(MapReduceWork.class);

// set input/output paths, mapper/reducer classes, and other params here

JobClient.runJob(conf);


On Wed, Apr 1, 2009 at 11:42 AM, Farhan Husain  wrote:

> Can I get rid of the whole jar thing? Is there any way to run map reduce
> programs without using a jar? I do not want to use "hadoop jar ..." either.
>
> On Wed, Apr 1, 2009 at 1:10 PM, javateck javateck wrote:
>
> > I think you need to set a property (mapred.jar) inside hadoop-site.xml,
> > then
> > you don't need to hardcode in your java code, and it will be fine.
> > But I don't know if there is any way that we can set multiple jars, since
> a
> > lot of times our own mapreduce class needs to reference other jars.
> >
> > On Wed, Apr 1, 2009 at 10:57 AM, Farhan Husain wrote:
> >
> > > Hello,
> > >
> > > Can anyone tell me if there is any way running a map-reduce job from a
> > java
> > > program without specifying the jar file by JobConf.setJar() method?
> > >
> > > Thanks,
> > >
> > > --
> > > Mohammad Farhan Husain
> > > Research Assistant
> > > Department of Computer Science
> > > Erik Jonsson School of Engineering and Computer Science
> > > University of Texas at Dallas
> > >
> >
>
>
>
> --
> Mohammad Farhan Husain
> Research Assistant
> Department of Computer Science
> Erik Jonsson School of Engineering and Computer Science
> University of Texas at Dallas
>


Re: Running MapReduce without setJar

2009-04-01 Thread javateck javateck
I think you need to set a property (mapred.jar) inside hadoop-site.xml; then
you don't need to hardcode it in your Java code, and it will be fine.
But I don't know whether there is any way to set multiple jars, since a lot of
the time our own mapreduce class needs to reference other jars.
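
For contrast, a sketch of the programmatic routes this thread is trying to avoid (the jar path is illustrative); setting the same mapred.jar property in hadoop-site.xml achieves the result without touching the Java code:

import org.apache.hadoop.mapred.JobConf;

public class JarConfigSketch {
    public static void main(String[] args) {
        JobConf conf = new JobConf();
        // Either point JobConf at a class that lives inside the job jar
        // (conf.setJarByClass(SomeClassInTheJar.class)), or name the jar directly;
        // both simply populate the mapred.jar property that hadoop-site.xml can also set.
        conf.setJar("/path/to/my-job.jar");   // hypothetical path
        System.out.println(conf.getJar());
    }
}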

On Wed, Apr 1, 2009 at 10:57 AM, Farhan Husain  wrote:

> Hello,
>
> Can anyone tell me if there is any way running a map-reduce job from a java
> program without specifying the jar file by JobConf.setJar() method?
>
> Thanks,
>
> --
> Mohammad Farhan Husain
> Research Assistant
> Department of Computer Science
> Erik Jonsson School of Engineering and Computer Science
> University of Texas at Dallas
>