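The Avro type for ClientIP in the schema below is a union of "string" and "null", i.e. a nullable string. With the generic API a union value is just the value of whichever branch applies, so there is no separate union object to build: your conversion should most likely return a String, or null when the column has no value. When reading, Avro (in recent versions) will return a CharSequence that may be null; when writing, it will accept null or any CharSequence (such as String or Utf8).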
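As a rough sketch of what the conversion for that column could return: the helper name toNullableString and the choice to treat "-" or an empty column as "no value" are my assumptions, not something taken from your code or from the Avro API. A plain HashMap stands in for the GenericRecord here so the snippet runs without Avro on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

class NullableFieldSketch {

    // Hypothetical helper: maps a raw TSV column to the value for a
    // ["string", "null"] field. Returns null when the column is missing,
    // otherwise the trimmed text.
    // Assumption: "-" and the empty string mean "no value".
    static CharSequence toNullableString(String raw) {
        if (raw == null) {
            return null;
        }
        String trimmed = raw.trim();
        if (trimmed.isEmpty() || trimmed.equals("-")) {
            return null;
        }
        return trimmed;
    }

    public static void main(String[] args) {
        // Stand-in for a GenericRecord; HashMap permits null values.
        Map<String, Object> record = new HashMap<>();

        record.put("ClientIP", toNullableString("10.0.0.1"));
        System.out.println("with value: " + record.get("ClientIP"));

        record.put("ClientIP", toNullableString("-"));
        System.out.println("missing:    " + record.get("ClientIP"));
    }
}
```

With a real GenericRecord the result would be passed the same way, record.put("ClientIP", value), with either the String or null as the value.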
On 7/25/11 12:11 PM, "felix gao" <[email protected]> wrote:

> ignore the previous one. accidentally hit send before complete the message.
>
>> I am trying to produce some avro file based on a TSV file. We had an
>> original schema which is defined like
>> { "type": "record",
>>   "name": "accessLog",
>>   "namespace": "avro_access_log",
>>   "fields": [
>>     {"name": "SquidIP" , "type": "string" },
>>     {"name": "Timestamp" , "type": "long" },
>>     {"name": "Hostname", "type": "string" },
>>   ]
>> }
>>
>> now that we have added additional fields, I would like to change my new
>> schema to
>>
>> { "type": "record",
>>   "name": "accessLog",
>>   "namespace": "avro_access_log",
>>   "fields": [
>>     {"name": "SquidIP" , "type": "string" },
>>     {"name": "Timestamp" , "type": "long" },
>>     {"name": "Hostname", "type": "string" },
>>     {"name": "ClientIP", "type": ["string", "null"] }
>>   ]
>> }
>>
>> if i understand correctly the last field should be type Union, and below is
>> my code that generate the record. What I would like to know is how to
>> return the correct Union type when I call Object value =
>> ConvertFieldToType(getColumnType(col), v, col); the convertFieldToType
>> simply convert a string to a long if the type is long. what should be the
>> corrected value to return for getColumnType for ClientIP field in my example?
>>
>> public static Object generateDatumBasedOnSchema(Schema schema, String
>> line, Map<String, Integer> badConversions){
>>     GenericRecord record = new GenericData.Record(schema);
>>     int fieldLength = schema.getFields().size();
>>     int col =0;
>>     String[] fields = line.trim().split("\t");
>>     while(col < fieldLength){
>>         try{
>>             String name = getColumnName(col);
>>             String v = "-";
>>             try{
>>                 v = fields[col];
>>             }catch(ArrayIndexOutOfBoundsException e){
>>                 if (alertedAIOOBE < 5){
>>                     System.err.println("index "+col+" is not in fields");
>>                 }
>>                 alertedAIOOBE++;
>>                 return null;
>>             }
>>             Object value = ConvertFieldToType(getColumnType(col), v,
>> col);
>>             record.put(name, value);
>>             col++;
>>         }catch(NullPointerException npe){ //this is threw when there is
>> no matching name for the column which indicates our schema is older than the
>> data.
>>             System.err.println("Schema: "+schema.toString()+" does not
>> match line "+line);
>>             return null;
>>         }
>>         catch(RuntimeException re){
>>             System.err.println("Unknown option at "+col);
>>             return null;
>>         }
>>         catch(Exception e){
>>             e.printStackTrace();
>>             return null;
>>         }
>>
>>     }
>>     return record;
>> }
>
> Thanks,
>
> Felix
>
