It will work on an OutputStream that supports append.

http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(org.apache.avro.file.SeekableInput, java.io.OutputStream)

So it depends on how well HDFS implements FileSystem#append(), not on
any changes in Avro.

http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#append(org.apache.hadoop.fs.Path)
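
As a rough illustration of why appendTo takes a SeekableInput as well as an
OutputStream, here is a minimal Java sketch. It does not use Avro or Hadoop,
and the file layout in it (a 16-byte marker followed by length-prefixed
blocks) is a made-up toy, not Avro's actual container format; ToySyncAppend
and all of its method names are hypothetical. It only shows the idea discussed
further down in this thread: the appender has to read the marker from the
start of the existing file and repeat it after every new block, otherwise a
reader rejects the appended data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy container format (an assumption for illustration, NOT Avro's layout):
// [16-byte marker] then repeated [int length][payload][16-byte marker].
class ToySyncAppend {
    static final int SYNC_SIZE = 16;

    // Start a new file: write a random marker that every later block must repeat.
    static byte[] writeHeader(OutputStream out) throws IOException {
        byte[] sync = new byte[SYNC_SIZE];
        new SecureRandom().nextBytes(sync);
        out.write(sync);
        return sync;
    }

    // Write one block: payload length, payload bytes, then the given marker.
    static void writeBlock(OutputStream out, byte[] sync, String record) throws IOException {
        byte[] payload = record.getBytes(StandardCharsets.UTF_8);
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(payload.length);
        data.write(payload);
        data.write(sync);
        data.flush();
    }

    // Append: recover the marker from the start of the existing file, then
    // write the new block to a stream positioned at the end of that file.
    static void appendTo(byte[] existing, OutputStream appendStream, String record)
            throws IOException {
        byte[] sync = Arrays.copyOfRange(existing, 0, SYNC_SIZE);
        writeBlock(appendStream, sync, record);
    }

    // Read every record, checking each block's marker against the header's.
    static List<String> readAll(byte[] file) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
        byte[] sync = new byte[SYNC_SIZE];
        in.readFully(sync);
        List<String> records = new ArrayList<>();
        while (in.available() > 0) {
            byte[] payload = new byte[in.readInt()];
            in.readFully(payload);
            byte[] blockSync = new byte[SYNC_SIZE];
            in.readFully(blockSync);
            if (!Arrays.equals(sync, blockSync)) {
                throw new IOException("Invalid sync!");
            }
            records.add(new String(payload, StandardCharsets.UTF_8));
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        byte[] sync = writeHeader(file);
        writeBlock(file, sync, "first");
        // "Reopen" for append: read the marker from the existing bytes,
        // then write the new block at the end of the same stream.
        appendTo(file.toByteArray(), file, "second");
        System.out.println(readAll(file.toByteArray()));
    }
}
```

In this toy, `existing` plays the role of the SeekableInput and `appendStream`
the role of the stream that FileSystem#append() would return. A block written
with any other marker makes readAll fail with "Invalid sync!".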

I have no recent personal experience with append in HDFS.  Does anyone
else here?

Doug

On Tue, Feb 5, 2013 at 4:10 PM, Michael Malak <michaelma...@yahoo.com> wrote:
> My understanding is that appendTo(File) will append to a file on the local
> filesystem, but not to a file on HDFS.
>
> --- On Tue, 2/5/13, Doug Cutting <cutt...@apache.org> wrote:
>
>> From: Doug Cutting <cutt...@apache.org>
>> Subject: Re: Is it possible to append to an already existing avro file
>> To: user@avro.apache.org
>> Date: Tuesday, February 5, 2013, 5:08 PM
>> The Jira is:
>>
>> https://issues.apache.org/jira/browse/AVRO-1035
>>
>> It is possible to append to an existing Avro file:
>>
>> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>>
>> Should we close that issue as "fixed"?
>>
>> Doug
>>
>> On Fri, Feb 1, 2013 at 11:32 AM, Michael Malak <michaelma...@yahoo.com> wrote:
>> > Was a JIRA ticket ever created regarding appending to an existing Avro
>> > file on HDFS?
>> >
>> > What is the status of such a capability, a year out from when the issue
>> > below was raised?
>> >
>> > On Wed, 22 Feb 2012 10:57:48 +0100, "Vyacheslav Zholudev"
>> > <vyacheslav.zholu...@gmail.com> wrote:
>> >
>> >> Thanks for your reply, I suspected this.
>> >>
>> >> I will create a JIRA ticket.
>> >>
>> >> Vyacheslav
>> >>
>> >> On Feb 21, 2012, at 6:02 PM, Scott Carey wrote:
>> >>
>> >>>
>> >>> On 2/21/12 7:29 AM, "Vyacheslav Zholudev" <vyacheslav.zholu...@gmail.com>
>> >>> wrote:
>> >>>
>> >>>> Yep, I saw that method as well as the stackoverflow post. However, I'm
>> >>>> interested in how to append to a file on an arbitrary file system, not
>> >>>> only on the local one.
>> >>>>
>> >>>> I want to get an OutputStream based on the Path and the FileSystem
>> >>>> implementation and then pass it to the Avro methods for appending.
>> >>>>
>> >>>> Is that possible?
>> >>>
>> >>> It is not possible without modifying DataFileWriter. Please open a JIRA
>> >>> ticket.
>> >>>
>> >>> It could not simply append to an OutputStream, since it must either:
>> >>> * Seek to the start to validate that the schemas match and find the sync
>> >>>   marker, or
>> >>> * Trust that the schemas match and find the sync marker from the last
>> >>>   block
>> >>>
>> >>> DataFileWriter cannot refer to Hadoop classes such as FileSystem, but we
>> >>> could add something to the mapred module that takes a Path and
>> >>> FileSystem and returns something that implements an interface that
>> >>> DataFileWriter can append to.  This would be something that is both a
>> >>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/SeekableInput.html
>> >>> and an OutputStream, or has both an InputStream from the start of the
>> >>> existing file and an OutputStream at the end.
>> >>>
>> >>>> Thanks,
>> >>>> Vyacheslav
>> >>>>
>> >>>> On Feb 21, 2012, at 5:29 AM, Harsh J wrote:
>> >>>>
>> >>>>> Hi,
>> >>>>>
>> >>>>> Use the appendTo feature of the DataFileWriter. See
>> >>>>>
>> >>>>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>> >>>>>
>> >>>>> For a quick setup example, read also:
>> >>>>>
>> >>>>> http://stackoverflow.com/questions/8806689/can-you-append-data-to-an-existing-avro-data-file
>> >>>>>
>> >>>>> On Tue, Feb 21, 2012 at 3:15 AM, Vyacheslav Zholudev
>> >>>>> <vyacheslav.zholu...@gmail.com> wrote:
>> >>>>>> Hi,
>> >>>>>>
>> >>>>>> is it possible to append to an already existing avro file when it was
>> >>>>>> written and closed before?
>> >>>>>>
>> >>>>>> If I use
>> >>>>>> outputStream = fs.append(avroFilePath);
>> >>>>>>
>> >>>>>> then later on I get: java.io.IOException: Invalid sync!
>> >>>>>>
>> >>>>>> Probably because the schema is written twice and some other issues.
>> >>>>>>
>> >>>>>> If I use outputStream = fs.create(avroFilePath); then the avro file
>> >>>>>> gets overwritten.
>> >>>>>>
>> >>>>>> Thanks,
>> >>>>>> Vyacheslav
>> >>>>>
>> >>>>> --
>> >>>>> Harsh J
>> >>>>> Customer Ops. Engineer
>> >>>>> Cloudera | http://tiny.cloudera.com/about
>> >
>>
