Hey Michael,

FileSystem#append() returns an FSDataOutputStream, which does extend the regular java.io.OutputStream class, as seen in the API: http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FSDataOutputStream.html
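To illustrate why an append-capable OutputStream is all the write side needs, here is a toy sketch using only java.io. Everything in it is an assumption made for illustration: the container layout (magic bytes, a 16-byte sync marker, length-prefixed blocks) is invented and is not Avro's real file format, and a local FileOutputStream opened in append mode stands in for the stream that fs.append(path) returns. The point is only the flow: read the existing header to recover the sync marker, then write new blocks at the end without rewriting the header. With the real libraries the equivalent call would presumably be writer.appendTo(new FsInput(path, conf), fs.append(path)), with FsInput taken from avro-mapred; check that against your versions.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy container: MAGIC, 16-byte sync marker, then blocks of (int length, payload, sync). Not Avro's layout. */
final class ToyContainer {
    static final byte[] MAGIC = {'T', 'O', 'Y', 1};
    static final int SYNC_LEN = 16;

    /** Starts a new file: writes the header exactly once and returns the random sync marker. */
    static byte[] create(OutputStream out) throws IOException {
        byte[] sync = new byte[SYNC_LEN];
        new SecureRandom().nextBytes(sync);
        out.write(MAGIC);
        out.write(sync);
        return sync;
    }

    /** Reads and validates the header, returning the sync marker stored in it. */
    private static byte[] readHeader(DataInputStream in) throws IOException {
        byte[] magic = new byte[MAGIC.length];
        in.readFully(magic);
        if (!Arrays.equals(magic, MAGIC)) throw new IOException("Not a toy container");
        byte[] sync = new byte[SYNC_LEN];
        in.readFully(sync);
        return sync;
    }

    /** The "input from the start of the file" side of an append: recover the existing sync marker. */
    static byte[] readSync(File f) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(f))) {
            return readHeader(in);
        }
    }

    /** Writes one block to any OutputStream that is positioned at the end of the file. */
    static void writeBlock(OutputStream out, byte[] sync, String record) throws IOException {
        byte[] payload = record.getBytes(StandardCharsets.UTF_8);
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(payload.length);
        data.write(payload);
        data.write(sync);
        data.flush();
    }

    /** Reads every record; a block whose marker differs from the header's fails with "Invalid sync!". */
    static List<String> readAll(File f) throws IOException {
        List<String> records = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new FileInputStream(f))) {
            byte[] sync = readHeader(in);
            while (true) {
                int len;
                try { len = in.readInt(); } catch (EOFException eof) { return records; }
                byte[] payload = new byte[len];
                in.readFully(payload);
                byte[] marker = new byte[SYNC_LEN];
                in.readFully(marker);
                if (!Arrays.equals(marker, sync)) throw new IOException("Invalid sync!");
                records.add(new String(payload, StandardCharsets.UTF_8));
            }
        }
    }

    /** Create and close a file, reopen it in append mode, then read everything back. */
    static List<String> demo() {
        try {
            File f = File.createTempFile("toy", ".dat");
            f.deleteOnExit();
            try (OutputStream out = new FileOutputStream(f)) {
                byte[] sync = create(out);
                writeBlock(out, sync, "first");
            }
            byte[] existing = readSync(f); // header is read, never rewritten
            try (OutputStream out = new FileOutputStream(f, true)) { // append mode
                writeBlock(out, existing, "second");
                writeBlock(out, existing, "third");
            }
            return readAll(f);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [first, second, third]
    }
}
```

Passing a freshly generated marker to writeBlock instead of the one returned by readSync makes readAll fail with "Invalid sync!", which is the toy analogue of writing through a raw fs.append() stream with a brand-new writer.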
Here's a sample program that works on Hadoop 2.x in my tests: https://gist.github.com/QwertyManiac/4724582

On Wed, Feb 6, 2013 at 9:00 AM, Michael Malak <[email protected]> wrote:
> I don't believe a Hadoop FileSystem is a Java OutputStream?
>
> --- On Tue, 2/5/13, Doug Cutting <[email protected]> wrote:
>
>> From: Doug Cutting <[email protected]>
>> Subject: Re: Is it possible to append to an already existing avro file
>> To: [email protected]
>> Date: Tuesday, February 5, 2013, 5:27 PM
>> It will work on an OutputStream that supports append.
>>
>> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(org.apache.avro.file.SeekableInput, java.io.OutputStream)
>>
>> So it depends on how well HDFS implements FileSystem#append(), not on any changes in Avro.
>>
>> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#append(org.apache.hadoop.fs.Path)
>>
>> I have no recent personal experience with append in HDFS. Does anyone else here?
>>
>> Doug
>>
>> On Tue, Feb 5, 2013 at 4:10 PM, Michael Malak <[email protected]> wrote:
>> > My understanding is that will append to a file on the local filesystem, but not to a file on HDFS.
>> >
>> > --- On Tue, 2/5/13, Doug Cutting <[email protected]> wrote:
>> >
>> >> From: Doug Cutting <[email protected]>
>> >> Subject: Re: Is it possible to append to an already existing avro file
>> >> To: [email protected]
>> >> Date: Tuesday, February 5, 2013, 5:08 PM
>> >> The Jira is:
>> >>
>> >> https://issues.apache.org/jira/browse/AVRO-1035
>> >>
>> >> It is possible to append to an existing Avro file:
>> >>
>> >> http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>> >>
>> >> Should we close that issue as "fixed"?
>> >>
>> >> Doug
>> >>
>> >> On Fri, Feb 1, 2013 at 11:32 AM, Michael Malak <[email protected]> wrote:
>> >> > Was a JIRA ticket ever created regarding appending to an existing Avro file on HDFS?
>> >> >
>> >> > What is the status of such a capability, a year out from when the issue below was raised?
>> >> >
>> >> > On Wed, 22 Feb 2012 10:57:48 +0100, "Vyacheslav Zholudev" <[email protected]> wrote:
>> >> >
>> >> >> Thanks for your reply, I suspected this.
>> >> >>
>> >> >> I will create a JIRA ticket.
>> >> >>
>> >> >> Vyacheslav
>> >> >>
>> >> >> On Feb 21, 2012, at 6:02 PM, Scott Carey wrote:
>> >> >>
>> >> >>>
>> >> >>> On 2/21/12 7:29 AM, "Vyacheslav Zholudev" <[email protected]> wrote:
>> >> >>>
>> >> >>>> Yep, I saw that method as well as the stackoverflow post. However, I'm
>> >> >>>> interested how to append to a file on the arbitrary file system, not
>> >> >>>> only on the local one.
>> >> >>>>
>> >> >>>> I want to get an OutputStream based on the Path and the FileSystem
>> >> >>>> implementation and then pass it for appending to avro methods.
>> >> >>>>
>> >> >>>> Is that possible?
>> >> >>>
>> >> >>> It is not possible without modifying DataFileWriter. Please open a JIRA
>> >> >>> ticket.
>> >> >>>
>> >> >>> It could not simply append to an OutputStream, since it must either:
>> >> >>> * Seek to the start to validate the schemas match and find the sync
>> >> >>> marker, or
>> >> >>> * Trust that the schemas match and find the sync marker from the last
>> >> >>> block
>> >> >>>
>> >> >>> DataFileWriter cannot refer to Hadoop classes such as FileSystem, but we
>> >> >>> could add something to the mapred module that takes a Path and
>> >> >>> FileSystem and returns something that implemements an interface that
>> >> >>> DataFileWriter can append to. This would be something that is both a
>> >> >>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/SeekableInput.html
>> >> >>> and an OutputStream, or has both an InputStream from the start of the
>> >> >>> existing file and an OutputStream at the end.
>> >> >>>
>> >> >>>> Thanks,
>> >> >>>> Vyacheslav
>> >> >>>>
>> >> >>>> On Feb 21, 2012, at 5:29 AM, Harsh J wrote:
>> >> >>>>
>> >> >>>>> Hi,
>> >> >>>>>
>> >> >>>>> Use the appendTo feature of the DataFileWriter. See
>> >> >>>>>
>> >> >>>>> http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)
>> >> >>>>>
>> >> >>>>> For a quick setup example, read also:
>> >> >>>>>
>> >> >>>>> http://stackoverflow.com/questions/8806689/can-you-append-data-to-an-existing-avro-data-file
>> >> >>>>>
>> >> >>>>> On Tue, Feb 21, 2012 at 3:15 AM, Vyacheslav Zholudev
>> >> >>>>> <[email protected]> wrote:
>> >> >>>>>> Hi,
>> >> >>>>>>
>> >> >>>>>> is it possible to append to an already existing avro file when it was
>> >> >>>>>> written and closed before?
>> >> >>>>>>
>> >> >>>>>> If I use
>> >> >>>>>> outputStream = fs.append(avroFilePath);
>> >> >>>>>>
>> >> >>>>>> then later on I get: java.io.IOException: Invalid sync!
>> >> >>>>>>
>> >> >>>>>> Probably because the schema is written twice and some other issues.
>> >> >>>>>>
>> >> >>>>>> If I use outputStream = fs.create(avroFilePath); then the avro file
>> >> >>>>>> gets
>> >> >>>>>> overwritten.
>> >> >>>>>>
>> >> >>>>>> Thanks,
>> >> >>>>>> Vyacheslav
>> >> >>>>>
>> >> >>>>> --
>> >> >>>>> Harsh J
>> >> >>>>> Customer Ops. Engineer
>> >> >>>>> Cloudera | http://tiny.cloudera.com/about

--
Harsh J
