Greetings,

I'm trying to convert some very large CSV files to Avro format. To that end, 
I wrote a csvtoavro converter using the Avro C API, v1.7.5.

The essence of the program is this:

// initialize line counter
lineno = 0;

// make a schema first
avro_schema_from_json_length (...);

// make a generic class from schema
iface = avro_generic_class_from_schema( schema );

// get the record size and verify that it is 109
avro_schema_record_size (schema);

// get a generic value
avro_generic_value_new (iface, &tuple);

// make me an output file
outfp = fopen ( outputfile, "wb" );

// make me a filewriter (should_close = 0, so the caller closes outfp)
avro_file_writer_create_fp (outfp, outputfile, 0, schema, &db);

// now for the code to emit the data

while (...)
{
    avro_value_reset (&tuple);

    // get the CSV record into the tuple
    ...

    // write that tuple
    avro_file_writer_append_value (db, &tuple);

    lineno ++;

    // flush the file
    avro_file_writer_flush (db);
}

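// close the file writer (outfp stays open because should_close was 0)
avro_file_writer_close (db);

// other cleanup: release the value before the interface it came from
avro_value_decref (&tuple);
avro_value_iface_decref (iface);
avro_schema_decref (schema);

// close output file
fflush (outfp);
fclose (outfp);

I then read the file back using a modified version of avrocat.c that looks 
like this: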

wschema = avro_file_reader_get_writer_schema(reader);
iface = avro_generic_class_from_schema(wschema);
avro_generic_value_new(iface, &value);

int rval;
lineno = 0;

while ((rval = avro_file_reader_read_value(reader, &value)) == 0) {
    lineno ++;
    avro_value_reset(&value);
}

// If it was not an EOF that caused it to fail,
// print the error.
if (rval != EOF)
{
    fprintf(stderr, "Error: %s\n", avro_strerror());
}
else
{
    printf ( "%s %lld\n", filename, lineno );
}

For many files, nothing is missing from the .avro output. However, quite 
often I get files where several dozen rows are missing: the count printed by 
the reader is lower than the writer's final lineno.
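One way to narrow down whether rows are lost on the write side or the read 
side is to count the records in the .avro file without going through the Avro 
C reader at all. Per the Avro specification, an object container file is a 
header (the magic bytes "Obj" followed by 0x01, a metadata map, and a 16-byte 
sync marker) followed by data blocks, each of which starts with an object 
count and a byte size (both zig-zag varint longs) and ends with the same sync 
marker. Summing the block counts gives the number of records actually stored, 
whether or not the block data is compressed, since only the headers are read. 
This is a sketch: count_container_records, read_long, skip and cursor_t are 
hypothetical helpers, not part of the Avro C API, and it assumes the whole 
file has already been loaded into memory.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cursor over an in-memory copy of an Avro container file. */
typedef struct {
    const unsigned char *buf;
    size_t len;
    size_t pos;
} cursor_t;

/* Decode one zig-zag varint "long" as defined by the Avro spec.
 * Returns 0 on success, -1 if the input ends early or the varint is too long. */
static int read_long(cursor_t *c, int64_t *out)
{
    uint64_t n = 0;
    int shift = 0;
    for (;;) {
        if (c->pos >= c->len || shift > 63)
            return -1;
        unsigned char b = c->buf[c->pos++];
        n |= (uint64_t)(b & 0x7f) << shift;
        if (!(b & 0x80))
            break;
        shift += 7;
    }
    *out = (int64_t)(n >> 1) ^ -(int64_t)(n & 1);
    return 0;
}

/* Advance the cursor by n bytes; -1 if that would run past the end. */
static int skip(cursor_t *c, int64_t n)
{
    if (n < 0 || (uint64_t)n > c->len - c->pos)
        return -1;
    c->pos += (size_t)n;
    return 0;
}

/* Sum the object counts of all data blocks without decoding any records.
 * Returns the total, or -1 if the buffer is not a well-formed container file. */
static int64_t count_container_records(const unsigned char *buf, size_t len)
{
    cursor_t c = { buf, len, 0 };
    int64_t n, size, total = 0;
    unsigned char sync[16];

    /* Header magic: 'O' 'b' 'j' 0x01 */
    if (len < 4 || memcmp(buf, "Obj\x01", 4) != 0)
        return -1;
    c.pos = 4;

    /* Header metadata: map<bytes>, encoded as blocks ending with a 0 count. */
    for (;;) {
        if (read_long(&c, &n) != 0) return -1;
        if (n == 0) break;
        if (n < 0) {                /* negative count: a byte size follows */
            if (n == INT64_MIN) return -1;
            n = -n;
            if (read_long(&c, &size) != 0) return -1;
        }
        for (int64_t i = 0; i < n; i++) {
            /* each entry: length-prefixed key, then length-prefixed value */
            for (int k = 0; k < 2; k++) {
                if (read_long(&c, &size) != 0 || skip(&c, size) != 0) return -1;
            }
        }
    }

    /* Header sync marker, kept so each block's trailing marker can be checked. */
    if (c.len - c.pos < 16) return -1;
    memcpy(sync, c.buf + c.pos, 16);
    c.pos += 16;

    /* Data blocks: object count, byte size, (possibly compressed) data, sync. */
    while (c.pos < c.len) {
        if (read_long(&c, &n) != 0 || n < 0) return -1;
        if (read_long(&c, &size) != 0 || skip(&c, size) != 0) return -1;
        if (c.len - c.pos < 16 || memcmp(c.buf + c.pos, sync, 16) != 0) return -1;
        c.pos += 16;
        total += n;
    }
    return total;
}
```

If the total matches the writer's lineno, the rows are in the file and are 
being lost on the read side; if it is lower, they never made it into the file 
(note that the writer loop above increments lineno whether or not 
avro_file_writer_append_value succeeded). For files too large to load at once, 
the same walk could be done over a FILE * with fread and fseeko instead of a 
memory buffer.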

I'm certain that I'm doing something wrong, and something very basic. Any help 
debugging would be most appreciated.

Thanks,

-amrith
