The avro-mapred module includes FsInput, a SeekableInput implementation
that works with HDFS:

http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/FsInput.html

With this, your example can be made considerably smaller.
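
For anyone wondering why the append call needs a seekable view of the
file as well as an output stream: the writer first has to read the
existing header to recover the schema and the sync marker, and only then
can it add blocks at the end. Below is a minimal sketch of that idea in
plain Java. It does not use Avro or Hadoop. The ToyContainer class, its
file layout, and all of its method names are hypothetical and exist only
for illustration. It also shows how a naive append, which writes a fresh
header at the end of the file, produces an "Invalid sync!" style failure
on read.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy container (hypothetical, NOT Avro's real layout):
 *  header = [schema: UTF string][16-byte sync]; block = [payload: UTF string][16-byte sync]. */
class ToyContainer {
    static final int SYNC_SIZE = 16;

    /** Writes a header with a fresh random sync marker; append=false truncates the file. */
    static byte[] writeHeader(File f, String schema, boolean append) throws IOException {
        byte[] sync = new byte[SYNC_SIZE];
        new SecureRandom().nextBytes(sync);
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(f, append))) {
            out.writeUTF(schema);
            out.write(sync);
        }
        return sync;
    }

    /** Appends one block terminated by the given sync marker. */
    static void writeBlock(File f, String payload, byte[] sync) throws IOException {
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(f, true))) {
            out.writeUTF(payload);
            out.write(sync);
        }
    }

    /** Correct append: seek to the start, check the schema, reuse the existing sync marker. */
    static void appendTo(File f, String schema, String payload) throws IOException {
        byte[] sync = new byte[SYNC_SIZE];
        try (RandomAccessFile in = new RandomAccessFile(f, "r")) {
            in.seek(0);
            if (!schema.equals(in.readUTF())) throw new IOException("Schema mismatch");
            in.readFully(sync);
        }
        writeBlock(f, payload, sync);
    }

    /** Naive append: behaves like a brand-new writer pointed at the end of the file. */
    static void naiveAppend(File f, String schema, String payload) throws IOException {
        byte[] freshSync = writeHeader(f, schema, true); // second header, different marker
        writeBlock(f, payload, freshSync);
    }

    /** Reads every payload, verifying the sync marker after each block. */
    static List<String> readAll(File f) throws IOException {
        List<String> payloads = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new FileInputStream(f))) {
            in.readUTF(); // skip the schema
            byte[] sync = new byte[SYNC_SIZE];
            in.readFully(sync);
            byte[] seen = new byte[SYNC_SIZE];
            while (in.available() > 0) {
                payloads.add(in.readUTF());
                in.readFully(seen);
                if (!Arrays.equals(sync, seen)) throw new IOException("Invalid sync!");
            }
        }
        return payloads;
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("toy", ".bin");
        f.deleteOnExit();
        byte[] sync = writeHeader(f, "schema-v1", false);
        writeBlock(f, "first", sync);
        appendTo(f, "schema-v1", "second");
        System.out.println(readAll(f)); // prints [first, second]
        naiveAppend(f, "schema-v1", "third");
        try {
            readAll(f);
        } catch (IOException e) {
            System.out.println(e.getMessage()); // prints Invalid sync!
        }
    }
}
```

With the real libraries, the seekable half would be an FsInput and the
output half the stream returned by FileSystem#append(), both passed to
DataFileWriter#appendTo(SeekableInput, OutputStream).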

Doug



On Thu, Feb 7, 2013 at 8:28 AM, Harsh J <ha...@cloudera.com> wrote:
> I assume by non-trivial you meant the extra Seekable stuff I needed to
> wrap around the DFS output streams to let Avro take it as append-able?
> I don't think it's possible for Avro to carry it since Avro (core) does
> not reverse-depend on Hadoop. Should we document it somewhere though?
> Do you have any ideas on the best place to do that?
>
> On Thu, Feb 7, 2013 at 6:12 AM, Michael Malak <michaelma...@yahoo.com> wrote:
>> Thanks so much for the code -- it works great!
>>
>> Since it is a non-trivial amount of code required to achieve append, I 
>> suggest attaching that code to AVRO-1035, in the hopes that someone will 
>> come up with an interface that requires just one line of user code to 
>> achieve append.
>>
>> --- On Wed, 2/6/13, Harsh J <ha...@cloudera.com> wrote:
>>
>>> From: Harsh J <ha...@cloudera.com>
>>> Subject: Re: Is it possible to append to an already existing avro file
>>> To: user@avro.apache.org
>>> Date: Wednesday, February 6, 2013, 11:17 AM
>>> Hey Michael,
>>>
>>> It does implement the regular Java OutputStream interface,
>>> as seen in
>>> the API: 
>>> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FSDataOutputStream.html.
>>>
>>> Here's a sample program that works on Hadoop 2.x in my
>>> tests:
>>> https://gist.github.com/QwertyManiac/4724582
>>>
>>> On Wed, Feb 6, 2013 at 9:00 AM, Michael Malak <michaelma...@yahoo.com>
>>> wrote:
>>> > I don't believe a Hadoop FileSystem is a Java
>>> OutputStream?
>>> >
>>> > --- On Tue, 2/5/13, Doug Cutting <cutt...@apache.org>
>>> wrote:
>>> >
>>> >> From: Doug Cutting <cutt...@apache.org>
>>> >> Subject: Re: Is it possible to append to an already
>>> existing avro file
>>> >> To: user@avro.apache.org
>>> >> Date: Tuesday, February 5, 2013, 5:27 PM
>>> >> It will work on an OutputStream that
>>> >> supports append.
>>> >>
>>> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(org.apache.avro.file.SeekableInput,
>>> >> java.io.OutputStream)
>>> >>
>>> >> So it depends on how well HDFS implements
>>> >> FileSystem#append(), not on
>>> >> any changes in Avro.
>>> >>
>>> >> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#append(org.apache.hadoop.fs.Path)
>>> >>
>>> >> I have no recent personal experience with append
>>> in
>>> >> HDFS.  Does anyone
>>> >> else here?
>>> >>
>>> >> Doug
>>> >>
>>> >> On Tue, Feb 5, 2013 at 4:10 PM, Michael Malak
>>> <michaelma...@yahoo.com>
>>> >> wrote:
>>> >> > My understanding is that will append to a file
>>> on the
>>> >> local filesystem, but not to a file on HDFS.
>>> >> >
>>> >> > --- On Tue, 2/5/13, Doug Cutting <cutt...@apache.org>
>>> >> wrote:
>>> >> >
>>> >> >> From: Doug Cutting <cutt...@apache.org>
>>> >> >> Subject: Re: Is it possible to append to
>>> an already
>>> >> existing avro file
>>> >> >> To: user@avro.apache.org
>>> >> >> Date: Tuesday, February 5, 2013, 5:08 PM
>>> >> >> The Jira is:
>>> >> >>
>>> >> >> https://issues.apache.org/jira/browse/AVRO-1035
>>> >> >>
>>> >> >> It is possible to append to an existing
>>> Avro file:
>>> >> >>
>>> >> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>>> >> >>
>>> >> >> Should we close that issue as "fixed"?
>>> >> >>
>>> >> >> Doug
>>> >> >>
>>> >> >> On Fri, Feb 1, 2013 at 11:32 AM, Michael
>>> Malak
>>> >> <michaelma...@yahoo.com>
>>> >> >> wrote:
>>> >> >> > Was a JIRA ticket ever created
>>> regarding
>>> >> appending to
>>> >> >> an existing Avro file on HDFS?
>>> >> >> >
>>> >> >> > What is the status of such a
>>> capability, a
>>> >> year out
>>> >> >> from when the issue below was raised?
>>> >> >> >
>>> >> >> > On Wed, 22 Feb 2012 10:57:48 +0100,
>>> >> "Vyacheslav
>>> >> >> Zholudev" <vyacheslav.zholu...@gmail.com>
>>> >> >> wrote:
>>> >> >> >
>>> >> >> >> Thanks for your reply, I
>>> suspected this.
>>> >> >> >>
>>> >> >> >> I will create a JIRA ticket.
>>> >> >> >>
>>> >> >> >> Vyacheslav
>>> >> >> >>
>>> >> >> >> On Feb 21, 2012, at 6:02 PM,
>>> Scott Carey
>>> >> wrote:
>>> >> >> >>
>>> >> >> >>>
>>> >> >> >>> On 2/21/12 7:29 AM,
>>> "Vyacheslav
>>> >> Zholudev"
>>> >> >> <vyacheslav.zholu...@gmail.com>
>>> >> >> >>> wrote:
>>> >> >> >>>
>>> >> >> >>>> Yep, I saw that method as
>>> well as
>>> >> the
>>> >> >> stackoverflow post. However, I'm
>>> >> >> >>>> interested how to append
>>> to a file
>>> >> on the
>>> >> >> arbitrary file system, not
>>> >> >> >>>> only on the local one.
>>> >> >> >>>>
>>> >> >> >>>> I want to get an
>>> OutputStream
>>> >> based on the
>>> >> >> Path and the FileSystem
>>> >> >> >>>> implementation and then
>>> pass it
>>> >> for
>>> >> >> appending to avro methods.
>>> >> >> >>>>
>>> >> >> >>>> Is that possible?
>>> >> >> >>>
>>> >> >> >>> It is not possible without
>>> modifying
>>> >> >> DataFileWriter. Please open a JIRA
>>> >> >> >>> ticket.
>>> >> >> >>>
>>> >> >> >>> It could not simply append to
>>> an
>>> >> OutputStream,
>>> >> >> since it must either:
>>> >> >> >>> * Seek to the start to
>>> validate the
>>> >> schemas
>>> >> >> match and find the sync
>>> >> >> >>> marker, or
>>> >> >> >>> * Trust that the schemas
>>> match and
>>> >> find the
>>> >> >> sync marker from the last
>>> >> >> >>> block
>>> >> >> >>>
>>> >> >> >>> DataFileWriter cannot refer
>>> to Hadoop
>>> >> classes
>>> >> >> such as FileSystem, but we
>>> >> >> >>> could add something to the
>>> mapred
>>> >> module that
>>> >> >> takes a Path and
>>> >> >> >>> FileSystem and returns
>>> something that
>>> >> >> implements an interface that
>>> >> >> >>> DataFileWriter can append
>>> to.
>>> >> This would
>>> >> >> be something that is both a
>>> >> >> >>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/SeekableInput.html
>>> >> >> >>> and an OutputStream, or has
>>> both an
>>> >> InputStream
>>> >> >> from the start of the
>>> >> >> >>> existing file and an
>>> OutputStream at
>>> >> the end.
>>> >> >> >>>
>>> >> >> >>>> Thanks,
>>> >> >> >>>> Vyacheslav
>>> >> >> >>>>
>>> >> >> >>>> On Feb 21, 2012, at 5:29
>>> AM, Harsh
>>> >> J
>>> >> >> wrote:
>>> >> >> >>>>
>>> >> >> >>>>> Hi,
>>> >> >> >>>>>
>>> >> >> >>>>> Use the appendTo
>>> feature of
>>> >> the
>>> >> >> DataFileWriter. See
>>> >> >> >>>>>
>>> >> >> >>>>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>>> >> >> >>>>>
>>> >> >> >>>>> For a quick setup
>>> example,
>>> >> read also:
>>> >> >> >>>>>
>>> >> >> >>>>> http://stackoverflow.com/questions/8806689/can-you-append-data-to-an-existing-avro-data-file
>>> >> >> >>>>>
>>> >> >> >>>>> On Tue, Feb 21, 2012
>>> at 3:15
>>> >> AM,
>>> >> >> Vyacheslav Zholudev
>>> >> >> >>>>> <vyacheslav.zholu...@gmail.com>
>>> >> >> wrote:
>>> >> >> >>>>>> Hi,
>>> >> >> >>>>>>
>>> >> >> >>>>>> is it possible to
>>> append
>>> >> to an
>>> >> >> already existing avro file when it was
>>> >> >> >>>>>> written and
>>> closed
>>> >> before?
>>> >> >> >>>>>>
>>> >> >> >>>>>> If I use
>>> >> >> >>>>>> outputStream =
>>> >> >> fs.append(avroFilePath);
>>> >> >> >>>>>>
>>> >> >> >>>>>> then later on I
>>> get:
>>> >> >> java.io.IOException: Invalid sync!
>>> >> >> >>>>>>
>>> >> >> >>>>>> Probably because
>>> the
>>> >> schema is
>>> >> >> written twice and some other issues.
>>> >> >> >>>>>>
>>> >> >> >>>>>> If I use
>>> outputStream =
>>> >> >> fs.create(avroFilePath); then the avro
>>> file
>>> >> >> >>>>>> gets
>>> >> >> >>>>>> overwritten.
>>> >> >> >>>>>>
>>> >> >> >>>>>> Thanks,
>>> >> >> >>>>>> Vyacheslav
>>> >> >> >>>>>
>>> >> >> >>>>> --
>>> >> >> >>>>> Harsh J
>>> >> >> >>>>> Customer Ops.
>>> Engineer
>>> >> >> >>>>> Cloudera | http://tiny.cloudera.com/about
>>> >> >> >
>>> >> >>
>>> >>
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>
>
>
> --
> Harsh J
