Re: Is it possible to append to an already existing avro file
Since the exception is thrown from java.io.FileInputStream#open, it's trying to append to a local file, not one in HDFS. You're passing 'new File(...)' to appendTo, when you should probably be passing 'new FsInput(...)'.

Doug

On Mon, Jul 8, 2013 at 9:29 AM, TrevniUser dipti.de...@cerner.com wrote:
I was following this thread for a problem I am facing while using SortedKeyValueFiles. Below is the piece of code that tries to obtain the appropriate writer based on whether I am appending or creating a new file:

    OutputStream dataOutputStream;
    if (!fileSystem.exists(dataFilePath)) {
        dataOutputStream = fileSystem.create(dataFilePath);
        mDataFileWriter = new DataFileWriter<GenericRecord>(datumWriter)
            .setSyncInterval(1 << 20)
            .create(mRecordSchema, dataOutputStream);
    } else {
        dataOutputStream = fileSystem.append(dataFilePath);
        mDataFileWriter = new DataFileWriter<GenericRecord>(datumWriter)
            .setSyncInterval(1 << 20)
            .appendTo(new File(options.getPath() + DATA_FILENAME));
    }

but it fails with this:

    java.io.FileNotFoundException: /CHANGELOG/data (No such file or directory)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:120)
        at org.apache.avro.file.SeekableFileInput.<init>(SeekableFileInput.java:29)
        at org.apache.avro.file.DataFileWriter.appendTo(DataFileWriter.java:149)
        at com.abc.kepler.datasink.hdfs.util.SortedKeyValueFile$Writer.<init>(SortedKeyValueFile.java:597)
        at com.abc.kepler.datasink.hdfs.util.ChangeLogUtil.getChangeLogWriter(ChangeLogUtil.java:84)
        at com.abc.kepler.datasink.hdfs.HDFSDataSinkChangeLog.append(HDFSDataSinkChangeLog.java:219)
        at com.abc.kepler.datasink.hdfs.HDFSDataSinkChangesTest.writeDataSingleEntityKeyDefaultLocation(HDFSDataSinkChangesTest.java:1036)
        at com.abc.kepler.datasink.hdfs.HDFSDataSinkChangesTest.javadocExampleTest(HDFSDataSinkChangesTest.java:645)

So, is the Avro writer not able to locate the file on HDFS? Could you please share some pointers on what could be leading to this?
-- View this message in context: http://apache-avro.679487.n3.nabble.com/Is-it-possible-to-append-to-an-already-existing-avro-file-tp3762049p4027785.html Sent from the Avro - Users mailing list archive at Nabble.com.
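Doug's diagnosis rests on one detail visible in the stack trace: `appendTo(File)` builds a `SeekableFileInput`, which opens the path with `java.io.FileInputStream`, so `/CHANGELOG/data` is looked up on the local disk even though the rest of the code talks to HDFS. The stdlib-only sketch below reproduces that failure mode; `LocalPathDemo` and `failsLocally` are hypothetical names. The HDFS fix itself is not shown because it needs the Avro and Hadoop jars; assuming a Hadoop `Configuration` named `conf` (hypothetical), it would pass something like `new FsInput(dataFilePath, conf)` together with the stream from `fileSystem.append(dataFilePath)` to `appendTo(SeekableInput, OutputStream)`.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;

class LocalPathDemo {
    /**
     * Opens a path the way a java.io.File-based reader does, i.e. on the LOCAL
     * filesystem. Returns true if opening failed with FileNotFoundException.
     */
    static boolean failsLocally(String path) {
        File file = new File(path); // resolved against the local disk, never HDFS
        try (FileInputStream in = new FileInputStream(file)) {
            return false; // the path happens to exist locally
        } catch (FileNotFoundException e) {
            return true;  // same exception type as in the stack trace
        } catch (IOException e) {
            return false; // opened fine; only close() failed
        }
    }

    public static void main(String[] args) {
        String hdfsStylePath = "/CHANGELOG/data"; // path taken from the stack trace
        System.out.println(hdfsStylePath + " missing on the local disk: "
                + failsLocally(hdfsStylePath));
    }
}
```

The point is only that the `File` overload never consults the Hadoop `FileSystem`; the path string itself is fine once it is handed to an HDFS-aware reader.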
Re: Is it possible to append to an already existing avro file
Thanks for replying. You are correct. I followed this example https://gist.github.com/QwertyManiac/4724582
Re: Is it possible to append to an already existing avro file
I assume by non-trivial you meant the extra Seekable stuff I needed to wrap around the DFS output streams to let Avro take it as append-able? I don't think it's possible for Avro to carry it, since Avro (core) does not reverse-depend on Hadoop. Should we document it somewhere, though? Do you have any ideas on the best place to do that?

On Thu, Feb 7, 2013 at 6:12 AM, Michael Malak michaelma...@yahoo.com wrote:
Thanks so much for the code -- it works great! Since it is a non-trivial amount of code required to achieve append, I suggest attaching that code to AVRO-1035, in the hopes that someone will come up with an interface that requires just one line of user code to achieve append.
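The "Seekable stuff" Harsh mentions is an adapter that gives Avro random access to the existing file. Avro's `org.apache.avro.file.SeekableInput` declares `seek(long)`, `tell()`, `length()` and `read(byte[], int, int)` and extends `Closeable`. The sketch below uses a local stand-in interface with that same shape so it compiles without Avro, and implements it over a local `RandomAccessFile`; `SeekableInputLike` and `RandomAccessSeekable` are hypothetical names. An HDFS version would delegate to `FSDataInputStream#seek`/`getPos` and take the length from `FileSystem#getFileStatus(path).getLen()`, which is roughly the job `FsInput` in avro-mapred does.

```java
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

class SeekableAdapterDemo {
    /** Hypothetical stand-in mirroring the method shape of Avro's SeekableInput. */
    interface SeekableInputLike extends Closeable {
        void seek(long p) throws IOException;
        long tell() throws IOException;
        long length() throws IOException;
        int read(byte[] b, int off, int len) throws IOException;
    }

    /** Adapter over a local file; an HDFS adapter would wrap FSDataInputStream instead. */
    static final class RandomAccessSeekable implements SeekableInputLike {
        private final RandomAccessFile raf;

        RandomAccessSeekable(File file) throws IOException {
            this.raf = new RandomAccessFile(file, "r"); // read-only random access
        }

        @Override public void seek(long p) throws IOException { raf.seek(p); }
        @Override public long tell() throws IOException { return raf.getFilePointer(); }
        @Override public long length() throws IOException { return raf.length(); }
        @Override public int read(byte[] b, int off, int len) throws IOException {
            return raf.read(b, off, len);
        }
        @Override public void close() throws IOException { raf.close(); }
    }
}
```

Random access is what lets the writer jump to the header (or the tail) of the existing file before any new data is written through the separate output stream.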
Re: Is it possible to append to an already existing avro file
I confess to being a user of, rather than a developer of, open source, but perhaps you could elaborate on what "depends on" means and what the consequences are? Isn't it -- or couldn't it be made -- a run-time binding, so that only those who try to use the HDFS append functionality would be required to also include the HDFS Jars in their classpath? Or is the issue more of a bookkeeping one, whereby every update to HDFS will require an Avro regression test? Now that Hive supports Avro as of the Jan. 11 release of Hive 0.10, the use case of ingesting data into Avro on HDFS is only going to get more popular, and appending is very handy for ingesting, especially for live real-time or near-real-time data. So it seems to me that if the inconveniences are minor or can be worked around, then Avro should perhaps indeed depend on HDFS.

--- On Thu, 2/7/13, Harsh J ha...@cloudera.com wrote:
From: Harsh J ha...@cloudera.com Subject: Re: Is it possible to append to an already existing avro file To: user@avro.apache.org Date: Thursday, February 7, 2013, 9:28 AM
I assume by non-trivial you meant the extra Seekable stuff I needed to wrap around the DFS output streams to let Avro take it as append-able? I don't think it's possible for Avro to carry it, since Avro (core) does not reverse-depend on Hadoop. Should we document it somewhere, though? Do you have any ideas on the best place to do that?
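One common way to get the run-time binding Michael describes is to probe for an optional class before using it, so only callers of the HDFS-specific path need the Hadoop jars. This is a general Java pattern offered as an assumption, not what Avro does; Avro instead keeps `FsInput` in the separate avro-mapred module. A minimal sketch; `OptionalDependency` and `requireHadoopForHdfsAppend` are hypothetical names:

```java
class OptionalDependency {
    /** Returns true if the named class can be loaded, without running its static initializers. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, OptionalDependency.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Hypothetical guard a library could call before entering Hadoop-specific code. */
    static void requireHadoopForHdfsAppend() {
        if (!isOnClasspath("org.apache.hadoop.fs.FileSystem")) {
            throw new UnsupportedOperationException(
                    "HDFS append needs the Hadoop jars on the classpath");
        }
    }

    public static void main(String[] args) {
        System.out.println("Hadoop available: "
                + isOnClasspath("org.apache.hadoop.fs.FileSystem"));
    }
}
```

The check alone is not enough: any code that names Hadoop types directly would still have to live in classes that are only loaded after the check passes, since the JVM resolves those references when such a class is loaded and linked.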
Re: Is it possible to append to an already existing avro file
The avro-mapred module includes a Seekable implementation that works with HDFS called FsInput: http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/FsInput.html With this, your example can be made considerably smaller.

Doug

On Thu, Feb 7, 2013 at 8:28 AM, Harsh J ha...@cloudera.com wrote:
I assume by non-trivial you meant the extra Seekable stuff I needed to wrap around the DFS output streams to let Avro take it as append-able? I don't think it's possible for Avro to carry it, since Avro (core) does not reverse-depend on Hadoop. Should we document it somewhere, though? Do you have any ideas on the best place to do that?
Re: Is it possible to append to an already existing avro file
Thanks so much for the code -- it works great! Since it is a non-trivial amount of code required to achieve append, I suggest attaching that code to AVRO-1035, in the hopes that someone will come up with an interface that requires just one line of user code to achieve append.

--- On Wed, 2/6/13, Harsh J ha...@cloudera.com wrote:
From: Harsh J ha...@cloudera.com Subject: Re: Is it possible to append to an already existing avro file To: user@avro.apache.org Date: Wednesday, February 6, 2013, 11:17 AM
Hey Michael, It does implement the regular Java OutputStream interface, as seen in the API: http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FSDataOutputStream.html. Here's a sample program that works on Hadoop 2.x in my tests: https://gist.github.com/QwertyManiac/4724582

On Wed, Feb 6, 2013 at 9:00 AM, Michael Malak michaelma...@yahoo.com wrote:
I don't believe a Hadoop FileSystem is a Java OutputStream?
Re: Is it possible to append to an already existing avro file
It will work on an OutputStream that supports append. http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(org.apache.avro.file.SeekableInput, java.io.OutputStream) So it depends on how well HDFS implements FileSystem#append(), not on any changes in Avro. http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#append(org.apache.hadoop.fs.Path) I have no recent personal experience with append in HDFS. Does anyone else here?

Doug

On Tue, Feb 5, 2013 at 4:10 PM, Michael Malak michaelma...@yahoo.com wrote:
My understanding is that it will append to a file on the local filesystem, but not to a file on HDFS.

--- On Tue, 2/5/13, Doug Cutting cutt...@apache.org wrote:
From: Doug Cutting cutt...@apache.org Subject: Re: Is it possible to append to an already existing avro file To: user@avro.apache.org Date: Tuesday, February 5, 2013, 5:08 PM
The Jira is: https://issues.apache.org/jira/browse/AVRO-1035 It is possible to append to an existing Avro file: http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File) Should we close that issue as fixed?
Doug

On Fri, Feb 1, 2013 at 11:32 AM, Michael Malak michaelma...@yahoo.com wrote:
Was a JIRA ticket ever created regarding appending to an existing Avro file on HDFS? What is the status of such a capability, a year out from when the issue below was raised?

On Wed, 22 Feb 2012 10:57:48 +0100, Vyacheslav Zholudev vyacheslav.zholu...@gmail.com wrote:
Thanks for your reply, I suspected this. I will create a JIRA ticket.
Vyacheslav

On Feb 21, 2012, at 6:02 PM, Scott Carey wrote:
On 2/21/12 7:29 AM, Vyacheslav Zholudev vyacheslav.zholu...@gmail.com wrote:
Yep, I saw that method as well as the stackoverflow post. However, I'm interested in how to append to a file on an arbitrary file system, not only on the local one. I want to get an OutputStream based on the Path and the FileSystem implementation and then pass it for appending to avro methods. Is that possible?

It is not possible without modifying DataFileWriter. Please open a JIRA ticket. It could not simply append to an OutputStream, since it must either:
* Seek to the start to validate the schemas match and find the sync marker, or
* Trust that the schemas match and find the sync marker from the last block
DataFileWriter cannot refer to Hadoop classes such as FileSystem, but we could add something to the mapred module that takes a Path and FileSystem and returns something that implements an interface that DataFileWriter can append to. This would be something that is both a http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/SeekableInput.html and an OutputStream, or has both an InputStream from the start of the existing file and an OutputStream at the end.

Thanks, Vyacheslav

On Feb 21, 2012, at 5:29 AM, Harsh J wrote:
Hi, Use the appendTo feature of the DataFileWriter. See http://avro.apache.org/docs/1.6.2/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File) For a quick setup example, read also: http://stackoverflow.com/questions/8806689/can-you-append-data-to-an-existing-avro-data-file

On Tue, Feb 21, 2012 at 3:15 AM, Vyacheslav Zholudev vyacheslav.zholu...@gmail.com wrote:
Hi, is it possible to append to an already existing avro file when it was written and closed before? If I use outputStream = fs.append(avroFilePath); then later on I get: java.io.IOException: Invalid sync! Probably because the schema is written twice and some other issues. If I use outputStream = fs.create(avroFilePath); then the avro file gets overwritten. Thanks, Vyacheslav

-- Harsh J Customer Ops. Engineer Cloudera | http://tiny.cloudera.com/about
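Scott's second option works because of the container-file layout in the Avro specification: the header ends with a 16-byte sync marker, and every data block is followed by that same marker, so the last 16 bytes of a complete file are the marker. The sketch below reads it with a seekable local reader. It checks only the 4-byte magic (`O`, `b`, `j`, 1) and does not validate the schema, which is the "trust" half of that option. `TrailingSync` and `readTrailingSync` are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;

class TrailingSync {
    static final byte[] MAGIC = {'O', 'b', 'j', 1}; // Avro container-file magic
    static final int SYNC_SIZE = 16;                 // sync markers are 16 bytes

    /** Reads the sync marker from the tail of a complete Avro container file. */
    static byte[] readTrailingSync(File file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            if (raf.length() < MAGIC.length + SYNC_SIZE) {
                throw new IOException("too short to be an Avro container file");
            }
            byte[] magic = new byte[MAGIC.length];
            raf.readFully(magic); // option 1 would keep parsing the header (and schema) here
            if (!Arrays.equals(magic, MAGIC)) {
                throw new IOException("not an Avro container file");
            }
            byte[] sync = new byte[SYNC_SIZE];
            raf.seek(raf.length() - SYNC_SIZE); // option 2: jump straight to the last block's marker
            raf.readFully(sync);
            return sync;
        }
    }
}
```

This only holds for a file that was closed cleanly; a file truncated mid-block would not end with the marker, which is one reason the header-based option is the safer default.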
Re: Is it possible to append to an already existing avro file
I don't believe a Hadoop FileSystem is a Java OutputStream?

--- On Tue, 2/5/13, Doug Cutting cutt...@apache.org wrote:
From: Doug Cutting cutt...@apache.org Subject: Re: Is it possible to append to an already existing avro file To: user@avro.apache.org Date: Tuesday, February 5, 2013, 5:27 PM
It will work on an OutputStream that supports append. http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(org.apache.avro.file.SeekableInput, java.io.OutputStream) So it depends on how well HDFS implements FileSystem#append(), not on any changes in Avro. http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#append(org.apache.hadoop.fs.Path) I have no recent personal experience with append in HDFS. Does anyone else here?
Doug
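The distinction Doug draws ("an OutputStream that supports append") and the overwrite Vyacheslav describes with `fs.create` have a direct local analogue in the `append` flag of `java.io.FileOutputStream`; the HDFS counterparts are `FileSystem#append(Path)` and `FileSystem#create(Path)`. A minimal stdlib sketch; `AppendVsCreate` and its methods are hypothetical names:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class AppendVsCreate {
    /** Stream positioned after the existing bytes (local analogue of FileSystem#append). */
    static OutputStream openForAppend(File file) throws IOException {
        return new FileOutputStream(file, true);
    }

    /** Stream that truncates the file first (local analogue of FileSystem#create). */
    static OutputStream openForCreate(File file) throws IOException {
        return new FileOutputStream(file, false);
    }

    /** Writes text in the chosen mode and returns the resulting file contents. */
    static String write(File file, String text, boolean append) throws IOException {
        try (OutputStream out = append ? openForAppend(file) : openForCreate(file)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return new String(Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8);
    }
}
```

An append-positioned stream is only half of what `appendTo(SeekableInput, OutputStream)` needs: the writer must also avoid writing a second header and must reuse the existing sync marker, which is what the seekable input over the same file is for.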
Is it possible to append to an already existing avro file
Hi, is it possible to append to an already existing avro file when it was written and closed before? If I use outputStream = fs.append(avroFilePath); then later on I get: java.io.IOException: Invalid sync! Probably because the schema is written twice and some other issues. If I use outputStream = fs.create(avroFilePath); then the avro file gets overwritten. Thanks, Vyacheslav
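A toy model may help explain why `fs.append` followed by a fresh writer breaks the file. It is deliberately NOT the real Avro encoding (Avro uses variable-length integers and a schema-bearing metadata map); it keeps only the parts that matter here: a header carrying a random 16-byte sync marker, and blocks that each end with that marker. Appending a second header, which is effectively what calling `create` on a stream from `fs.append` does, leaves a reader that trusts the first header unable to parse the rest. Writing only new blocks that reuse the marker from the file's last 16 bytes keeps it readable. All names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

/** Toy container (NOT the real Avro encoding): header = magic + sync; block = length + bytes + sync. */
class ToyContainer {
    static final byte[] MAGIC = {'O', 'b', 'j', 1};
    static final int SYNC_SIZE = 16;

    static byte[] newSync(Random rnd) {
        byte[] sync = new byte[SYNC_SIZE];
        rnd.nextBytes(sync); // a fresh random marker, as every newly created file gets
        return sync;
    }

    static void writeHeader(DataOutputStream out, byte[] sync) throws IOException {
        out.write(MAGIC);
        out.write(sync);
    }

    static void writeBlock(DataOutputStream out, String record, byte[] sync) throws IOException {
        byte[] data = record.getBytes(StandardCharsets.UTF_8);
        out.writeInt(data.length);
        out.write(data);
        out.write(sync); // every block ends with the file's sync marker
    }

    /** Like create(): a header plus one block, both using a new marker. */
    static byte[] create(String record, Random rnd) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        byte[] sync = newSync(rnd);
        writeHeader(out, sync);
        writeBlock(out, record, sync);
        out.flush();
        return buf.toByteArray();
    }

    /** Wrong: create() on an append stream adds a second header and a second marker. */
    static byte[] appendWithNewHeader(byte[] existing, String record, Random rnd) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        buf.write(existing);
        buf.write(create(record, rnd));
        return buf.toByteArray();
    }

    /** Right: reuse the marker from the last 16 bytes and write only new blocks. */
    static byte[] appendReusingSync(byte[] existing, String record) throws IOException {
        byte[] sync = Arrays.copyOfRange(existing, existing.length - SYNC_SIZE, existing.length);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        buf.write(existing);
        DataOutputStream out = new DataOutputStream(buf);
        writeBlock(out, record, sync);
        out.flush();
        return buf.toByteArray();
    }

    /** Reads every record, checking each block's trailing marker against the header's. */
    static List<String> read(byte[] file) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
        byte[] magic = new byte[MAGIC.length];
        in.readFully(magic);
        if (!Arrays.equals(magic, MAGIC)) throw new IOException("bad magic");
        byte[] sync = new byte[SYNC_SIZE];
        in.readFully(sync);
        byte[] seen = new byte[SYNC_SIZE];
        List<String> records = new ArrayList<>();
        while (in.available() > 0) {
            int len = in.readInt();
            if (len < 0 || len > in.available()) throw new IOException("bad block length " + len);
            byte[] data = new byte[len];
            in.readFully(data);
            in.readFully(seen);
            if (!Arrays.equals(seen, sync)) throw new IOException("Invalid sync!");
            records.add(new String(data, StandardCharsets.UTF_8));
        }
        return records;
    }
}
```

In this toy the second header surfaces as a bad block length rather than a sync mismatch; the exact error the real reader reports depends on Avro's encoding, but the cause, a header written into the middle of the file, is the same one Vyacheslav suspected.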