Re: SequenceFiles, checkpoints, block size (Was: How to flush SequenceFile.Writer?)

2009-02-03 Thread Tom White
Hi Brian,

Writes to HDFS are not guaranteed to be flushed until the file is
closed. In practice, as each (64MB) block is finished it is flushed
and will be visible to other readers, which is what you were seeing.
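As an aside on the 64M figure that comes up below: as far as I know it is the
HDFS block size (dfs.block.size) rather than the dfs.checkpoint.* properties,
which control secondary namenode checkpointing. Block size is a client-side,
per-file setting, so something like the following sketch (property name as in
the 0.17/0.18-era configs, path made up) would change it:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockSizeSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Files created through this client use this block size instead of
        // the 64MB default; smaller blocks surface data to readers sooner,
        // but mean more blocks per file.
        conf.setLong("dfs.block.size", 16 * 1024 * 1024);
        FileSystem fs = FileSystem.get(conf);
        FSDataOutputStream out = fs.create(new Path("/tmp/block-size-sketch"));
        out.writeBytes("example\n");
        out.close();
      }
    }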

The addition of appends in HDFS changes this and adds a sync() method
to FSDataOutputStream. You can read about the semantics of the new
operations here:
https://issues.apache.org/jira/secure/attachment/12370562/Appends.doc.
Unfortunately, there are some problems with sync() that are still
being worked through
(https://issues.apache.org/jira/browse/HADOOP-4379). Also, even with
sync() working, the append() on SequenceFile does not do an implicit
sync() - it is not atomic. Furthermore, there is no way to get hold of
the FSDataOutputStream to call sync() yourself - see
https://issues.apache.org/jira/browse/HBASE-1155. (And don't get
confused by the sync() method on SequenceFile.Writer - it is for
another purpose entirely.)
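Purely to illustrate the new API: if you were writing through an
FSDataOutputStream you opened yourself (rather than the one hidden inside
SequenceFile.Writer), a sync() call would look roughly like this, assuming a
build where the method exists and the HADOOP-4379 problems don't bite (the
path here is made up):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SyncSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        FSDataOutputStream out = fs.create(new Path("/tmp/sync-sketch"));
        out.writeBytes("a record\n");
        out.sync();   // ask HDFS to make bytes written so far visible to readers
        // ... continue writing ...
        out.close();  // close() remains the only fully reliable flush today
      }
    }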

As Jason points out, the simplest way to achieve what you're trying to
do is to close the file and start a new one. If you start to get too
many small files, you can have another process merge the smaller files
in the background.
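A minimal sketch of that approach, just as illustration - the class name,
key/value types, directory layout and part naming below are all invented, and
it assumes the SequenceFile.createWriter(FileSystem, Configuration, Path,
Class, Class) form:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class RollingSequenceFileWriter {
      private final Configuration conf;
      private final FileSystem fs;
      private final Path liveDir;   // mapred input path: only finished parts land here
      private final Path stageDir;  // the part currently being written lives here
      private final long rollIntervalMs;
      private SequenceFile.Writer writer;
      private Path openPart;
      private long lastRoll;

      public RollingSequenceFileWriter(Configuration conf, Path liveDir, Path stageDir,
          long rollIntervalMs) throws IOException {
        this.conf = conf;
        this.fs = FileSystem.get(conf);
        this.liveDir = liveDir;
        this.stageDir = stageDir;
        this.rollIntervalMs = rollIntervalMs;
        roll();
      }

      // Close the current part (which makes it fully visible to readers), move
      // it into the live directory, and open a fresh part in the staging
      // directory.
      private void roll() throws IOException {
        if (writer != null) {
          writer.close();
          fs.rename(openPart, new Path(liveDir, openPart.getName()));
        }
        openPart = new Path(stageDir, "part-" + System.currentTimeMillis());
        writer = SequenceFile.createWriter(fs, conf, openPart, LongWritable.class, Text.class);
        lastRoll = System.currentTimeMillis();
      }

      public synchronized void append(LongWritable key, Text value) throws IOException {
        if (System.currentTimeMillis() - lastRoll >= rollIntervalMs) {
          roll();
        }
        writer.append(key, value);
      }

      public synchronized void close() throws IOException {
        writer.close();
        fs.rename(openPart, new Path(liveDir, openPart.getName()));
      }
    }

Pointing the mapred input path at the live directory means jobs only ever see
closed, complete parts, which also avoids the zero-length file problem Brian
describes.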

Tom

On Tue, Feb 3, 2009 at 3:57 AM, jason hadoop  wrote:
> If you have to do a time-based solution, for now, simply close the file and
> stage it, then open a new file.
> Your reads will have to deal with the fact that the file is in multiple parts.
> Warning: Datanodes get pokey (slow) if they have large numbers of blocks, and
> the quickest way to get there is to create a lot of small files.
>
> On Mon, Feb 2, 2009 at 9:54 AM, Brian Long  wrote:
>
>> Let me rephrase this problem... as stated below, when I start writing to a
>> SequenceFile from an HDFS client, nothing is visible in HDFS until I've
>> written 64M of data. This presents three problems: fsck reports the file
>> system as corrupt until the first block is finally written out, the presence
>> of the file (without any data) seems to blow up my mapred jobs that try to
>> make use of it under my input path, and finally, I want to basically flush
>> every 15 minutes or so, so I can run mapred over the latest data.
>> I don't see any programmatic way to force the file to flush in 0.17.2.
>> Additionally, "dfs.checkpoint.period" does not seem to be obeyed. Does that
>> not do what I think it does? What controls the 64M limit, anyway? Is it
>> "dfs.checkpoint.size" or "dfs.block.size"? Is the buffering happening on the
>> client, or on data nodes? Or in the namenode?
>>
>> It seems really bad that a SequenceFile, upon creation, is in an unusable
>> state from the perspective of a mapred job, and also leaves fsck in a
>> corrupt state. Surely I must be doing something wrong... but what? How can I
>> ensure that a SequenceFile is immediately usable (but empty) on creation,
>> and how can I make things flush on some regular time interval?
>>
>> Thanks,
>> Brian
>>
>>
>> On Thu, Jan 29, 2009 at 4:17 PM, Brian Long  wrote:
>>
>> > I have a SequenceFile.Writer that I obtained via SequenceFile.createWriter
>> > and write to using append(key, value). Because the writer volume is low,
>> > it's not uncommon for it to take over a day for my appends to finally be
>> > flushed to HDFS (e.g. the new file will sit at 0 bytes for over a day).
>> > Because I am running map/reduce tasks on this data multiple times a day, I
>> > want to "flush" the sequence file so the mapred jobs can pick it up when
>> > they run.
>> > What's the right way to do this? I'm assuming it's a fairly common use
>> > case. Also -- are writes to the sequence files atomic? (e.g. if I am
>> > actively appending to a sequence file, is it always safe to read from that
>> > same file in a mapred job?)
>> >
>> > To be clear, I want the flushing to be time-based (controlled explicitly by
>> > the app), not size-based. Will this create waste in HDFS somehow?
>> >
>> > Thanks,
>> > Brian
>> >
>> >
>>
>


Re: SequenceFiles, checkpoints, block size (Was: How to flush SequenceFile.Writer?)

2009-02-02 Thread jason hadoop
If you have to do a time-based solution, for now, simply close the file and
stage it, then open a new file.
Your reads will have to deal with the fact that the file is in multiple parts.
Warning: Datanodes get pokey (slow) if they have large numbers of blocks, and
the quickest way to get there is to create a lot of small files.
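If the number of small parts does get out of hand, a background pass can fold
them into one larger SequenceFile. A rough sketch only - it assumes all parts
share the same key/value classes (LongWritable/Text here), that
FileSystem.listStatus is available in your release, and that the directory
holds nothing but finished parts:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class MergeSmallParts {
      public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path partDir = new Path(args[0]);  // directory holding the small parts
        Path merged = new Path(args[1]);   // single larger output file

        SequenceFile.Writer writer =
            SequenceFile.createWriter(fs, conf, merged, LongWritable.class, Text.class);
        LongWritable key = new LongWritable();
        Text value = new Text();
        for (FileStatus part : fs.listStatus(partDir)) {
          // Copy every record from each small part into the merged file.
          SequenceFile.Reader reader = new SequenceFile.Reader(fs, part.getPath(), conf);
          while (reader.next(key, value)) {
            writer.append(key, value);
          }
          reader.close();
        }
        writer.close();
        // The small parts can be deleted once the merged file is safely in place.
      }
    }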

On Mon, Feb 2, 2009 at 9:54 AM, Brian Long  wrote:

> Let me rephrase this problem... as stated below, when I start writing to a
> SequenceFile from an HDFS client, nothing is visible in HDFS until I've
> written 64M of data. This presents three problems: fsck reports the file
> system as corrupt until the first block is finally written out, the presence
> of the file (without any data) seems to blow up my mapred jobs that try to
> make use of it under my input path, and finally, I want to basically flush
> every 15 minutes or so, so I can run mapred over the latest data.
> I don't see any programmatic way to force the file to flush in 0.17.2.
> Additionally, "dfs.checkpoint.period" does not seem to be obeyed. Does that
> not do what I think it does? What controls the 64M limit, anyway? Is it
> "dfs.checkpoint.size" or "dfs.block.size"? Is the buffering happening on the
> client, or on data nodes? Or in the namenode?
>
> It seems really bad that a SequenceFile, upon creation, is in an unusable
> state from the perspective of a mapred job, and also leaves fsck in a
> corrupt state. Surely I must be doing something wrong... but what? How can I
> ensure that a SequenceFile is immediately usable (but empty) on creation,
> and how can I make things flush on some regular time interval?
>
> Thanks,
> Brian
>
>
> On Thu, Jan 29, 2009 at 4:17 PM, Brian Long  wrote:
>
> > I have a SequenceFile.Writer that I obtained via SequenceFile.createWriter
> > and write to using append(key, value). Because the writer volume is low,
> > it's not uncommon for it to take over a day for my appends to finally be
> > flushed to HDFS (e.g. the new file will sit at 0 bytes for over a day).
> > Because I am running map/reduce tasks on this data multiple times a day, I
> > want to "flush" the sequence file so the mapred jobs can pick it up when
> > they run.
> > What's the right way to do this? I'm assuming it's a fairly common use
> > case. Also -- are writes to the sequence files atomic? (e.g. if I am
> > actively appending to a sequence file, is it always safe to read from that
> > same file in a mapred job?)
> >
> > To be clear, I want the flushing to be time-based (controlled explicitly by
> > the app), not size-based. Will this create waste in HDFS somehow?
> >
> > Thanks,
> > Brian
> >
> >
>


SequenceFiles, checkpoints, block size (Was: How to flush SequenceFile.Writer?)

2009-02-02 Thread Brian Long
Let me rephrase this problem... as stated below, when I start writing to a
SequenceFile from an HDFS client, nothing is visible in HDFS until I've
written 64M of data. This presents three problems: fsck reports the file
system as corrupt until the first block is finally written out, the presence
of the file (without any data) seems to blow up my mapred jobs that try to
make use of it under my input path, and finally, I want to basically flush
every 15 minutes or so, so I can run mapred over the latest data.
I don't see any programmatic way to force the file to flush in 0.17.2.
Additionally, "dfs.checkpoint.period" does not seem to be obeyed. Does that
not do what I think it does? What controls the 64M limit, anyway? Is it
"dfs.checkpoint.size" or "dfs.block.size"? Is the buffering happening on the
client, or on data nodes? Or in the namenode?

It seems really bad that a SequenceFile, upon creation, is in an unusable
state from the perspective of a mapred job, and also leaves fsck in a
corrupt state. Surely I must be doing something wrong... but what? How can I
ensure that a SequenceFile is immediately usable (but empty) on creation,
and how can I make things flush on some regular time interval?

Thanks,
Brian


On Thu, Jan 29, 2009 at 4:17 PM, Brian Long  wrote:

> I have a SequenceFile.Writer that I obtained via SequenceFile.createWriter
> and write to using append(key, value). Because the writer volume is low,
> it's not uncommon for it to take over a day for my appends to finally be
> flushed to HDFS (e.g. the new file will sit at 0 bytes for over a day).
> Because I am running map/reduce tasks on this data multiple times a day, I
> want to "flush" the sequence file so the mapred jobs can pick it up when
> they run.
> What's the right way to do this? I'm assuming it's a fairly common use
> case. Also -- are writes to the sequence files atomic? (e.g. if I am
> actively appending to a sequence file, is it always safe to read from that
> same file in a mapred job?)
>
> To be clear, I want the flushing to be time-based (controlled explicitly by
> the app), not size-based. Will this create waste in HDFS somehow?
>
> Thanks,
> Brian
>
>