[GitHub] carbondata pull request #2671: [CARBONDATA-2876]AVRO datatype support throug...

2018-08-31 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/2671


---


[GitHub] carbondata pull request #2671: [CARBONDATA-2876]AVRO datatype support throug...

2018-08-30 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2671#discussion_r214244301
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/AvroCarbonWriter.java ---
@@ -213,6 +181,124 @@ private Object avroFieldToObject(Schema.Field 
avroField, Object fieldValue) {
 }
 out = new ArrayObject(arrayChildObjects);
 break;
+  case UNION:
+// Union type will be internally stored as Struct
+// Fill data object only if fieldvalue is instance of datatype
+// For other field datatypes, fill value as Null
+List unionFields = avroField.schema().getTypes();
+int notNullUnionFieldsCount = 0;
+for (Schema unionField : unionFields) {
+  if (!unionField.getType().equals(Schema.Type.NULL)) {
+notNullUnionFieldsCount++;
+  }
+}
+Object[] values = new Object[notNullUnionFieldsCount];
+int j = 0;
+for (Schema unionField : unionFields) {
+  if (!unionField.getType().equals(Schema.Type.NULL) && 
checkFieldValueType(
+  unionField.getType(), fieldValue)) {
+values[j] = avroFieldToObjectForUnionType(unionField, 
fieldValue, avroField);
+break;
+  }
+  if (notNullUnionFieldsCount != 1) {
+j++;
+  }
+}
--- End diff --

Modify the code as below
int j = 0;
for (Schema unionField : unionFields) {
  if (unionField.getType().equals(Schema.Type.NULL) ) {
 continue;
  }
   if (checkFieldValueType(unionField.getType(), fieldValue)) {
values[j] = avroFieldToObjectForUnionType(unionField, 
fieldValue, avroField);
break;
  }
  j++;
}


---


[GitHub] carbondata pull request #2671: [CARBONDATA-2876]AVRO datatype support throug...

2018-08-30 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2671#discussion_r214024187
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/AvroCarbonWriter.java ---
@@ -310,6 +503,31 @@ private static Field prepareFields(Schema.Field 
avroField) {
 } else {
   return null;
 }
+  case UNION:
+int i = 0;
+// Get union types and store as Struct
+ArrayList unionFields = new ArrayList<>();
+for (Schema avroSubField : avroField.schema().getTypes()) {
+  StructField unionField = prepareSubFields(avroField.name() + 
i++, avroSubField);
--- End diff --

check for NULL schema here


---


[GitHub] carbondata pull request #2671: [CARBONDATA-2876]AVRO datatype support throug...

2018-08-30 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2671#discussion_r214023293
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/AvroCarbonWriter.java ---
@@ -213,6 +181,126 @@ private Object avroFieldToObject(Schema.Field 
avroField, Object fieldValue) {
 }
 out = new ArrayObject(arrayChildObjects);
 break;
+  case UNION:
+// Union type will be internally stored as Struct
+// Fill data object only if fieldvalue is instance of datatype
+// For other field datatypes, fill value as Null
+List unionFields = avroField.schema().getTypes();
+int notNullUnionFieldsCount = 0;
+for (Schema unionField : unionFields) {
+  if (!unionField.getType().equals(Schema.Type.NULL)) {
+notNullUnionFieldsCount++;
+  }
+}
+Object[] values = new Object[notNullUnionFieldsCount];
+int j = 0;
+for (Schema unionField : unionFields) {
+  if (!unionField.getType().equals(Schema.Type.NULL)) {
+if (checkFieldValueType(unionField.getType(), fieldValue)) {
+  values[j] = avroFieldToObjectForUnionType(unionField, 
fieldValue, avroField);
+} else {
+  values[j] = null;
+}
+j++;
+  }
+}
+out = new StructObject(values);
+break;
+  default:
+out = avroPrimitiveFieldToObject(type, logicalType, fieldValue);
+}
+return out;
+  }
+
+  /**
+   * For Union type, fill data if Schema.Type is instance of fieldValue
+   * and return result
+   *
+   * @param type
+   * @param fieldValue
+   * @return
+   */
+  private boolean checkFieldValueType(Schema.Type type, Object fieldValue) 
{
+switch (type) {
+  case INT:
+return (fieldValue instanceof Integer);
+  case BOOLEAN:
+return (fieldValue instanceof Boolean);
+  case LONG:
+return (fieldValue instanceof Long);
+  case DOUBLE:
+return (fieldValue instanceof Double);
+  case STRING:
+return (fieldValue instanceof Utf8 || fieldValue instanceof 
String);
+  case FLOAT:
+return (fieldValue instanceof Float);
+  case RECORD:
+return (fieldValue instanceof GenericData.Record);
+  case ARRAY:
+return (fieldValue instanceof GenericData.Array || fieldValue 
instanceof ArrayList);
+  case BYTES:
+return (fieldValue instanceof ByteBuffer);
+  case MAP:
+return (fieldValue instanceof HashMap);
+  case ENUM:
+return (fieldValue instanceof GenericData.EnumSymbol);
+  default:
+return false;
+}
+  }
+
+  private Object avroPrimitiveFieldToObject(Schema.Type type, LogicalType 
logicalType,
+  Object fieldValue) {
+Object out;
+switch (type) {
+  case INT:
+if (logicalType != null) {
+  if (logicalType instanceof LogicalTypes.Date) {
+int dateIntValue = (int) fieldValue;
+out = dateIntValue * 
DateDirectDictionaryGenerator.MILLIS_PER_DAY;
+  } else {
+LOGGER.warn("Actual type: INT, Logical Type: " + 
logicalType.getName());
+out = fieldValue;
+  }
+} else {
+  out = fieldValue;
+}
+break;
+  case BOOLEAN:
+  case LONG:
+if (logicalType != null && !(logicalType instanceof 
LogicalTypes.TimestampMillis)) {
+  if (logicalType instanceof LogicalTypes.TimestampMicros) {
+long dateIntValue = (long) fieldValue;
+out = dateIntValue / 1000L;
+  } else {
+LOGGER.warn("Actual type: INT, Logical Type: " + 
logicalType.getName());
+out = fieldValue;
+  }
+} else {
+  out = fieldValue;
+}
+break;
+  case DOUBLE:
+  case STRING:
+  case ENUM:
+out = fieldValue;
+break;
+  case FLOAT:
+// direct conversion will change precision. So parse from string.
+// also carbon internally needs float as double
+out = Double.parseDouble(fieldValue.toString());
+break;
+  case BYTES:
+// DECIMAL type is defined in Avro as a BYTE type with the 
logicalType property
+// set to "decimal" and a specified precision and scale
+if (logicalType instanceof LogicalTypes.Decimal) {
+  out = new BigDecimal(new String(((ByteBuffer) 
fieldValue).array(),
+

[GitHub] carbondata pull request #2671: [CARBONDATA-2876]AVRO datatype support throug...

2018-08-30 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2671#discussion_r214024597
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/AvroCarbonWriter.java ---
@@ -310,6 +503,31 @@ private static Field prepareFields(Schema.Field 
avroField) {
 } else {
   return null;
 }
+  case UNION:
+int i = 0;
+// Get union types and store as Struct
+ArrayList unionFields = new ArrayList<>();
+for (Schema avroSubField : avroField.schema().getTypes()) {
+  StructField unionField = prepareSubFields(avroField.name() + 
i++, avroSubField);
+  if (unionField != null) {
+unionFields.add(unionField);
+  }
+}
+if (unionFields.size() != 0) {
--- End diff --

replace 'unionFields.size()' with 'unionFields.isEmpty()'


---


[GitHub] carbondata pull request #2671: [CARBONDATA-2876]AVRO datatype support throug...

2018-08-30 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2671#discussion_r214022747
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/AvroCarbonWriter.java ---
@@ -213,6 +181,126 @@ private Object avroFieldToObject(Schema.Field 
avroField, Object fieldValue) {
 }
 out = new ArrayObject(arrayChildObjects);
 break;
+  case UNION:
+// Union type will be internally stored as Struct
+// Fill data object only if fieldvalue is instance of datatype
+// For other field datatypes, fill value as Null
+List unionFields = avroField.schema().getTypes();
+int notNullUnionFieldsCount = 0;
+for (Schema unionField : unionFields) {
+  if (!unionField.getType().equals(Schema.Type.NULL)) {
+notNullUnionFieldsCount++;
+  }
+}
+Object[] values = new Object[notNullUnionFieldsCount];
+int j = 0;
+for (Schema unionField : unionFields) {
+  if (!unionField.getType().equals(Schema.Type.NULL)) {
+if (checkFieldValueType(unionField.getType(), fieldValue)) {
+  values[j] = avroFieldToObjectForUnionType(unionField, 
fieldValue, avroField);
+} else {
+  values[j] = null;
+}
--- End diff --

1. Remove else block
2. Combine above 2 if conditions into 1 using && operator
3. break the loop once if check is success


---


[GitHub] carbondata pull request #2671: [CARBONDATA-2876]AVRO datatype support throug...

2018-08-30 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2671#discussion_r213914935
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/AvroCarbonWriter.java ---
@@ -213,6 +218,225 @@ private Object avroFieldToObject(Schema.Field 
avroField, Object fieldValue) {
 }
 out = new ArrayObject(arrayChildObjects);
 break;
+  case BYTES:
+// DECIMAL type is defined in Avro as a BYTE type with the 
logicalType property
+// set to "decimal" and a specified precision and scale
+if (logicalType instanceof LogicalTypes.Decimal) {
+  out = new BigDecimal(new String(((ByteBuffer) 
fieldValue).array(),
+  CarbonCommonConstants.DEFAULT_CHARSET_CLASS));
+  ;
--- End diff --

Remove this semi-colon


---


[GitHub] carbondata pull request #2671: [CARBONDATA-2876]AVRO datatype support throug...

2018-08-30 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2671#discussion_r213916136
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/AvroCarbonWriter.java ---
@@ -213,6 +218,225 @@ private Object avroFieldToObject(Schema.Field 
avroField, Object fieldValue) {
 }
 out = new ArrayObject(arrayChildObjects);
 break;
+  case BYTES:
+// DECIMAL type is defined in Avro as a BYTE type with the 
logicalType property
+// set to "decimal" and a specified precision and scale
+if (logicalType instanceof LogicalTypes.Decimal) {
+  out = new BigDecimal(new String(((ByteBuffer) 
fieldValue).array(),
+  CarbonCommonConstants.DEFAULT_CHARSET_CLASS));
+  ;
+} else {
+  out = fieldValue;
+}
+break;
+  case UNION:
+List fieldsUnion = avroField.schema().getTypes();
+int countIfNotNull = 0;
+for (Schema unionField : fieldsUnion) {
+  if (!unionField.getType().equals(Schema.Type.NULL)) {
+countIfNotNull++;
+  }
+}
+Object[] values;
+values = new Object[countIfNotNull];
--- End diff --

1. Rename countIfNotNull to notNullUnionFieldsCount
2. merge above 2 lines 'Object[] values = new Object[countIfNotNull];'
3. Check union behavior for only NULL type and handle if any special 
handling is required


---


[GitHub] carbondata pull request #2671: [CARBONDATA-2876]AVRO datatype support throug...

2018-08-30 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2671#discussion_r213915304
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/AvroCarbonWriter.java ---
@@ -213,6 +218,225 @@ private Object avroFieldToObject(Schema.Field 
avroField, Object fieldValue) {
 }
 out = new ArrayObject(arrayChildObjects);
 break;
+  case BYTES:
+// DECIMAL type is defined in Avro as a BYTE type with the 
logicalType property
+// set to "decimal" and a specified precision and scale
+if (logicalType instanceof LogicalTypes.Decimal) {
+  out = new BigDecimal(new String(((ByteBuffer) 
fieldValue).array(),
+  CarbonCommonConstants.DEFAULT_CHARSET_CLASS));
+  ;
+} else {
+  out = fieldValue;
--- End diff --

else case handling is not required as Binary data type is not 
supported...so write a proper comment to say that only decimal logical type is 
support for avro Byte data type. Once binary data type is supported we can add 
the else block


---


[GitHub] carbondata pull request #2671: [CARBONDATA-2876]AVRO datatype support throug...

2018-08-30 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2671#discussion_r213922678
  
--- Diff: 
store/sdk/src/main/java/org/apache/carbondata/sdk/file/AvroCarbonWriter.java ---
@@ -213,6 +218,225 @@ private Object avroFieldToObject(Schema.Field 
avroField, Object fieldValue) {
 }
 out = new ArrayObject(arrayChildObjects);
 break;
+  case BYTES:
+// DECIMAL type is defined in Avro as a BYTE type with the 
logicalType property
+// set to "decimal" and a specified precision and scale
+if (logicalType instanceof LogicalTypes.Decimal) {
+  out = new BigDecimal(new String(((ByteBuffer) 
fieldValue).array(),
+  CarbonCommonConstants.DEFAULT_CHARSET_CLASS));
+  ;
+} else {
+  out = fieldValue;
+}
+break;
+  case UNION:
+List fieldsUnion = avroField.schema().getTypes();
+int countIfNotNull = 0;
+for (Schema unionField : fieldsUnion) {
+  if (!unionField.getType().equals(Schema.Type.NULL)) {
+countIfNotNull++;
+  }
+}
+Object[] values;
+values = new Object[countIfNotNull];
+int j = 0;
+for (Schema unionField : fieldsUnion) {
+  if (!unionField.getType().equals(Schema.Type.NULL)) {
+values[j] = avroFieldToObjectForUnionType(unionField, 
fieldValue, avroField);
--- End diff --

1. Try to reuse the code as much as possible. You can extract the 
primitives types computation in a separate method and override only complex 
types as different methods for union and rest of the data types
2. Write a function for union type to identify the schema for which the 
computation need to be done. Do not call the computation for all the union 
types as at one time only one type of value will exists


---