Yes, I have a lot of small files. This is because I wanted to process
hourly instead of daily.

I will be checking whether this is the case. I am now re-running the
process, and I see:

332 files and directories, 231 blocks = 563 total. Heap Size is 119.88
MB / 910.25 MB (13%)
Configured Capacity     :       140.72 GB
DFS Used        :       6.63 MB
Non DFS Used    :       8.76 GB
DFS Remaining   :       131.95 GB
DFS Used%       :       0 %
DFS Remaining%  :       93.77 %
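
For the record, I am also tracking the warehouse file count from the
shell, to see whether the small files pile up; a minimal check, assuming
Hive's default warehouse path (mine may differ):

    # prints directory count, file count and total bytes under the path
    hadoop fs -count /user/hive/warehouse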

I do not think this is the case, but I will keep monitoring and will
see in half an hour.

best regards, and thanks a bunch.

-cam



On Sat, Feb 12, 2011 at 3:00 AM, Christopher, Pat
<patrick.christop...@hp.com> wrote:
> If you're running with the defaults, I think it's around 20GB. If you're
> processing a couple hundred MB, you could easily hit this limit between
> desired outputs and any intermediate files created. HDFS allocates the
> available space in blocks, so if you have a lot of small files, you'll run
> out of blocks before you run out of space. This is one reason why
> HDFS/Hadoop is 'bad' at dealing with lots of small files.
>
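> A rough way to see whether small files are piling up is fsck; this is
> just a sketch, and the path assumes Hive's default warehouse location:
>
>     # per-file and per-block listing for the warehouse
>     hadoop fsck /user/hive/warehouse -files -blocks
>
>     # or just the filesystem-wide totals from the summary
>     hadoop fsck / | grep -E 'Total (files|blocks)'
>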
> You can check here: localhost:50070. That's the web page for your HDFS
> namenode. It has status information on your HDFS, including size.
>
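> The same numbers are available from the command line as well; this is
> the stock HDFS admin tool, nothing Hive-specific:
>
>     # capacity, DFS used, non-DFS used and remaining, per datanode
>     hadoop dfsadmin -report
>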
> Pat
>
> -----Original Message-----
> From: Cam Bazz [mailto:camb...@gmail.com]
> Sent: Friday, February 11, 2011 4:55 PM
> To: user@hive.apache.org
> Subject: Re: error all of a sudden
>
> But is there a ridiculously low default for the HDFS space limit? I
> looked everywhere in the configuration files but could not find
> anything that limits the size of HDFS.
>
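> The closest setting I know of is the reserved-space knob in
> hdfs-site.xml, but that reserves space for non-HDFS use rather than
> capping HDFS, and I do not have it set anyway; the value below is just
> an example:
>
>     <property>
>       <name>dfs.datanode.du.reserved</name>
>       <!-- bytes per volume kept free for non-DFS use -->
>       <value>1073741824</value>
>     </property>
>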
> I think I am running on a 150GB hard drive, and the data I am
> processing is a couple hundred megabytes at most.
>
> best regards,
>
> -cam
>
>
>
> On Sat, Feb 12, 2011 at 2:44 AM, Christopher, Pat
> <patrick.christop...@hp.com> wrote:
>> Is your HDFS hitting its space limits?
>>
>> Pat
>>
>> -----Original Message-----
>> From: Cam Bazz [mailto:camb...@gmail.com]
>> Sent: Friday, February 11, 2011 4:38 PM
>> To: user@hive.apache.org
>> Subject: error all of a sudden
>>
>> Hello,
>>
>> I set up my one-node pseudo-distributed system and left it running a
>> cronjob that copies data from a remote server, loads it into Hadoop,
>> and does some calculations every hour.
>>
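>> Roughly what the cronjob runs for each hour's file is below; the path
>> and hour are copied from the failing run in the log, and the real
>> script fills them in:
>>
>>     # load one hourly log file into its own partition
>>     hive -e "LOAD DATA LOCAL INPATH '/var/mylog/hourly/log.CAT.2011021119' \
>>       INTO TABLE cat_raw PARTITION (date_hour=2011021119)"
>>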
>> It stopped working today, giving me the error below. I deleted
>> everything and made it reprocess from the beginning, and I still get
>> the same error in the same place.
>>
>> Is there a limit on how many partitions there can be in a table?
>>
>> So, I spent a couple of hours trying to solve the problem, but for now
>> my Hive fun is over...
>>
>> Any ideas as to why this might be happening, or what I should do to
>> debug it?
>>
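>> So far I have only looked at the client-side output pasted below; I am
>> assuming the datanode log will have more detail on the failure, though
>> the exact file name depends on user and hostname:
>>
>>     # last lines of the datanode log (name varies by user/host)
>>     tail -100 $HADOOP_HOME/logs/hadoop-*-datanode-*.log
>>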
>> best regards,
>> -c.b.
>>
>>
>> 11/02/12 01:27:47 INFO ql.Driver: Starting command: load data local
>> inpath '/var/mylog/hourly/log.CAT.2011021119' into table cat_raw
>> partition(date_hour=2011021119)
>> Copying data from file:/var/mylog/hourly/log.CAT.2011021119
>>
>> 11/02/12 01:27:47 INFO exec.CopyTask: Copying data from
>> file:/var/mylog/hourly/log.CAT.2011021119 to
>> hdfs://darkstar:9000/tmp/hive-cam/hive_2011-02-12_01-27-47_415_7165217842693560517/10000
>>
>> 11/02/12 01:27:47 INFO hdfs.DFSClient: Exception in
>> createBlockOutputStream java.io.EOFException
>> 11/02/12 01:27:47 INFO hdfs.DFSClient: Abandoning block
>> blk_6275225343572661963_1859
>> 11/02/12 01:27:53 INFO hdfs.DFSClient: Exception in
>> createBlockOutputStream java.io.EOFException
>> 11/02/12 01:27:53 INFO hdfs.DFSClient: Abandoning block
>> blk_2673116090916206836_1859
>> 11/02/12 01:27:59 INFO hdfs.DFSClient: Exception in
>> createBlockOutputStream java.io.EOFException
>> 11/02/12 01:27:59 INFO hdfs.DFSClient: Abandoning block
>> blk_5414825878079983460_1859
>> 11/02/12 01:28:05 INFO hdfs.DFSClient: Exception in
>> createBlockOutputStream java.io.EOFException
>> 11/02/12 01:28:05 INFO hdfs.DFSClient: Abandoning block
>> blk_6043862611357349730_1859
>> 11/02/12 01:28:11 WARN hdfs.DFSClient: DataStreamer Exception:
>> java.io.IOException: Unable to create new block.
>>        at 
>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2845)
>>        at 
>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
>>        at 
>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
>>
>> 11/02/12 01:28:11 WARN hdfs.DFSClient: Error Recovery for block
>> blk_6043862611357349730_1859 bad datanode[0] nodes == null
>> 11/02/12 01:28:11 WARN hdfs.DFSClient: Could not get block locations.
>> Source file 
>> "/tmp/hive-cam/hive_2011-02-12_01-27-47_415_7165217842693560517/10000/log.CAT.2011021119"
>> - Aborting...
>> Failed with exception null
>> 11/02/12 01:28:11 ERROR exec.CopyTask: Failed with exception null
>> java.io.EOFException
>>        at java.io.DataInputStream.readByte(DataInputStream.java:250)
>>        at 
>> org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
>>        at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
>>        at org.apache.hadoop.io.Text.readString(Text.java:400)
>>        at 
>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2901)
>>        at 
>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2826)
>>        at 
>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
>>        at 
>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
>>
>
