Re: AVRO Append HDFS using saveAsNewAPIHadoopFile
Yes, it does, but from what I have seen it is a record-by-record append. Please see the link below: https://gist.github.com/QwertyManiac/4724582 This is very slow because of the Avro append. I am thinking of something like what we normally do for text files, where we buffer the data up to a certain size and then flush the buffer.

On Mon, Jan 9, 2017 at 3:17 PM, Jörn Franke <jornfra...@gmail.com> wrote:
> Avro itself supports it, but I am not sure if this functionality is
> available through the Spark API. Just out of curiosity, if your use case
> is only to write to HDFS, then you might simply use Flume.
>
> On 9 Jan 2017, at 09:58, awkysam <contactsanto...@gmail.com> wrote:
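The buffer-and-flush idea above can be sketched in plain Python. This is only an illustration of the pattern, not an actual Avro or HDFS API: the `flush_fn` callback stands in for a real bulk append to an Avro container file (for example via Avro's `DataFileWriter` opened in append mode), and all names here are hypothetical.

```python
# Sketch of the buffer-then-flush pattern described above.
# flush_fn is a stand-in for a real bulk append to an Avro container
# file in HDFS; all names here are illustrative, not from the thread.

class BufferedAppender:
    def __init__(self, flush_fn, max_records=1000):
        self.flush_fn = flush_fn          # called once per full buffer
        self.max_records = max_records    # flush threshold
        self.buffer = []

    def append(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.max_records:
            self.flush()

    def flush(self):
        if self.buffer:
            self.flush_fn(self.buffer)    # one bulk write instead of N
            self.buffer = []

    def close(self):
        self.flush()                      # do not lose the tail records

# Usage: 2500 records with a threshold of 1000 should trigger
# two full flushes plus one final partial flush on close().
batches = []
appender = BufferedAppender(batches.append, max_records=1000)
for i in range(2500):
    appender.append({"id": i})
appender.close()
```

The point is that the expensive append happens once per batch rather than once per record, which is the same amortization trick normally used for text output.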
Re: AVRO Append HDFS using saveAsNewAPIHadoopFile
Avro itself supports it, but I am not sure if this functionality is available through the Spark API. Just out of curiosity, if your use case is only to write to HDFS, then you might simply use Flume.

> On 9 Jan 2017, at 09:58, awkysam <contactsanto...@gmail.com> wrote:
AVRO Append HDFS using saveAsNewAPIHadoopFile
Currently, for our project, we are collecting data and pushing it into Kafka with messages in Avro format. We need to push this data into HDFS; we are using Spark Streaming, and in HDFS the data is also stored in Avro format. We partition the data per day, so when we write data into HDFS we need to append to the same file. Currently we are using GenericRecordWriter, and we will be using saveAsNewAPIHadoopFile for writing into HDFS. Is there a way to append data to a file in HDFS in Avro format using saveAsNewAPIHadoopFile?

Thanks,
Santosh B

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/AVRO-Append-HDFS-using-saveAsNewAPIHadoopFile-tp28292.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
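Since appending to an existing HDFS file from Spark is awkward, a common workaround for the per-day layout described above is to write each streaming micro-batch as a *new* file under the day's directory instead of appending to one file. The sketch below only computes such a partitioned path; the base URL, directory layout, and naming scheme are illustrative assumptions, not anything prescribed by Spark or the thread.

```python
# Sketch of per-day partitioned output paths. Rather than appending to
# a single file per day, each micro-batch writes a fresh part file under
# the day's directory; names and layout here are purely illustrative.
from datetime import datetime, timezone

def partition_path(base, event_time, batch_id):
    """Build e.g. <base>/dt=2017-01-09/part-00042.avro."""
    day = event_time.strftime("%Y-%m-%d")
    return f"{base}/dt={day}/part-{batch_id:05d}.avro"

p = partition_path("hdfs://nn/data/events",
                   datetime(2017, 1, 9, 9, 58, tzinfo=timezone.utc),
                   42)
# p == "hdfs://nn/data/events/dt=2017-01-09/part-00042.avro"
```

Downstream readers then treat the whole `dt=...` directory as one logical daily file, which sidesteps the append problem entirely.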