[GitHub] [incubator-hudi] cdmikechen commented on issue #770: remove com.databricks:spark-avro to build spark avro schema by itself

2020-01-12 Thread GitBox
cdmikechen commented on issue #770: remove com.databricks:spark-avro to build 
spark avro schema by itself
URL: https://github.com/apache/incubator-hudi/pull/770#issuecomment-573493381
 
 
   Closing this PR now. Some problems have been fixed in PR 
https://github.com/apache/incubator-hudi/pull/1005. The remaining timestamp 
type problem will be discussed further in other JIRA issues.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] cdmikechen commented on issue #770: remove com.databricks:spark-avro to build spark avro schema by itself

2019-09-18 Thread GitBox
cdmikechen commented on issue #770: remove com.databricks:spark-avro to build 
spark avro schema by itself
URL: https://github.com/apache/incubator-hudi/pull/770#issuecomment-532921049
 
 
   I will continue to discuss this issue on JIRA later. 
   
   The version I'm running in production now is Hudi 0.4.8 with this PR 
applied. If there are new changes, I can also run some experiments in my test 
environment.
   
   




[GitHub] [incubator-hudi] cdmikechen commented on issue #770: remove com.databricks:spark-avro to build spark avro schema by itself

2019-09-18 Thread GitBox
cdmikechen commented on issue #770: remove com.databricks:spark-avro to build 
spark avro schema by itself
URL: https://github.com/apache/incubator-hudi/pull/770#issuecomment-532919914
 
 
   @vinothchandar @umehrot2 
   I've tried changing the avro version to 1.8.2 in the hudi pom.xml before. 
Spark 2.2 and 2.3 don't pick up the avro 1.8.2 jars from the hoodie jar; they 
load avro 1.7.7 first, and I still hit the same error (missing logical type 
class, and so on).
   Maybe you can try the options below in the shell to test avro 1.8.2
   ```
   --conf spark.driver.userClassPathFirst=true 
   --conf spark.executor.userClassPathFirst=true
   ```
   Or do a relocation like the hive dependencies in spark-bundle 
   ```
   <relocation>
 <pattern>org.apache.avro</pattern>
 <shadedPattern>com.apache.hudi.org.apache.avro</shadedPattern>
   </relocation>
   ```
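   A relocation like the one sketched above would sit inside the 
maven-shade-plugin configuration; a hedged sketch (the plugin coordinates are 
standard maven-shade usage, not quoted from this thread):
   ```xml
   <!-- Hypothetical maven-shade-plugin fragment; only the relocation itself
        comes from the comment above. -->
   <plugin>
     <groupId>org.apache.maven.plugins</groupId>
     <artifactId>maven-shade-plugin</artifactId>
     <configuration>
       <relocations>
         <relocation>
           <pattern>org.apache.avro</pattern>
           <shadedPattern>com.apache.hudi.org.apache.avro</shadedPattern>
         </relocation>
       </relocations>
     </configuration>
   </plugin>
   ```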
   
   




[GitHub] [incubator-hudi] cdmikechen commented on issue #770: remove com.databricks:spark-avro to build spark avro schema by itself

2019-09-17 Thread GitBox
cdmikechen commented on issue #770: remove com.databricks:spark-avro to build 
spark avro schema by itself
URL: https://github.com/apache/incubator-hudi/pull/770#issuecomment-532471718
 
 
   @vinothchandar 
   The avro 1.7.7 jar under Spark can be directly replaced with 1.8.2. I have 
tested some of the code and confirmed that directly replacing the jar is a 
feasible approach. In most cases, the avro 1.8.2 API is compatible with avro 
1.7.7.
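   A quick way to check which avro the classloader actually resolves is a 
hypothetical helper like the one below (not part of Hudi; `LogicalTypes` was 
introduced in avro 1.8, so failing to load it means the 1.7.x jar won):
   ```java
   // Hypothetical diagnostic: org.apache.avro.LogicalTypes exists only in
   // avro >= 1.8, so a ClassNotFoundException means 1.7.x was resolved first.
   public class AvroVersionCheck {
       public static boolean hasLogicalTypes() {
           try {
               Class.forName("org.apache.avro.LogicalTypes");
               return true;
           } catch (ClassNotFoundException e) {
               return false;
           }
       }

       public static void main(String[] args) {
           System.out.println(hasLogicalTypes()
               ? "avro >= 1.8 resolved (logical types available)"
               : "avro 1.7.x resolved first: logical type classes are missing");
       }
   }
   ```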




[GitHub] [incubator-hudi] cdmikechen commented on issue #770: remove com.databricks:spark-avro to build spark avro schema by itself

2019-09-17 Thread GitBox
cdmikechen commented on issue #770: remove com.databricks:spark-avro to build 
spark avro schema by itself
URL: https://github.com/apache/incubator-hudi/pull/770#issuecomment-532470332
 
 
   @umehrot2 
   In addition to the decimal problem, I also fixed a timestamp conversion 
problem. 
   On a Spark dataset, this PR gets the right result, but there are still some 
problems in Hive and Spark SQL. Hive 2.3 does not correctly identify the 
logical type in the parquet-avro file, so the timestamp type may be cast to 
long in Hive 2.3.
   I modified some of Hive's source in 
`org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableTimestampObjectInspector`
 to solve this problem.
   ```
   package org.apache.hadoop.hive.serde2.objectinspector.primitive;

   import java.sql.Timestamp;

   import org.apache.hadoop.hive.serde2.io.TimestampWritable;
   import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
   import org.apache.hadoop.io.LongWritable;

   public class WritableTimestampObjectInspector extends
       AbstractPrimitiveWritableObjectInspector implements
       SettableTimestampObjectInspector {

     public WritableTimestampObjectInspector() {
       super(TypeInfoFactory.timestampTypeInfo);
     }

     @Override
     public TimestampWritable getPrimitiveWritableObject(Object o) {
       // Patch: wrap a raw long (epoch millis) back into a TimestampWritable.
       if (o instanceof LongWritable) {
         return (TimestampWritable) PrimitiveObjectInspectorFactory
             .writableTimestampObjectInspector
             .create(new Timestamp(((LongWritable) o).get()));
       }
       return o == null ? null : (TimestampWritable) o;
     }

     @Override
     public Timestamp getPrimitiveJavaObject(Object o) {
       if (o instanceof LongWritable) {
         return new Timestamp(((LongWritable) o).get());
       }
       return o == null ? null : ((TimestampWritable) o).getTimestamp();
     }

     @Override
     public Object copyObject(Object o) {
       if (o instanceof LongWritable) {
         return new TimestampWritable(new Timestamp(((LongWritable) o).get()));
       }
       return o == null ? null : new TimestampWritable((TimestampWritable) o);
     }

     @Override
     public Object set(Object o, byte[] bytes, int offset) {
       if (o instanceof LongWritable) {
         o = PrimitiveObjectInspectorFactory.writableTimestampObjectInspector
             .create(new Timestamp(((LongWritable) o).get()));
       } else {
         ((TimestampWritable) o).set(bytes, offset);
       }
       return o;
     }

     @Override
     public Object set(Object o, Timestamp t) {
       if (t == null) {
         return null;
       }
       if (o instanceof LongWritable) {
         o = PrimitiveObjectInspectorFactory
             .writableTimestampObjectInspector.create(t);
       } else {
         ((TimestampWritable) o).set(t);
       }
       return o;
     }

     @Override
     public Object set(Object o, TimestampWritable t) {
       if (t == null) {
         return null;
       }
       if (o instanceof LongWritable) {
         o = PrimitiveObjectInspectorFactory.writableTimestampObjectInspector
             .create(new Timestamp(((LongWritable) o).get()));
       } else {
         ((TimestampWritable) o).set(t);
       }
       return o;
     }

     @Override
     public Object create(byte[] bytes, int offset) {
       return new TimestampWritable(bytes, offset);
     }

     @Override
     public Object create(Timestamp t) {
       return new TimestampWritable(t);
     }
   }
   ```
   I'm looking for a solution that doesn't need to modify the Hive source code. 
Let me know if you have any good ideas.
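   The core of the patch is just re-wrapping the raw epoch-millis long that 
Hive 2.3 reads from the parquet-avro file; a minimal standalone sketch of that 
conversion (the class and method names here are mine, not Hive's):
   ```java
   import java.sql.Timestamp;

   // Minimal sketch of the conversion the patched inspector performs: a
   // timestamp-millis logical type read back as a raw long is re-wrapped
   // into java.sql.Timestamp.
   public class TimestampFromLong {
       public static Timestamp fromEpochMillis(long millis) {
           return new Timestamp(millis);
       }

       public static void main(String[] args) {
           // 1568678400000L is an arbitrary example epoch value.
           System.out.println(fromEpochMillis(1568678400000L).getTime());
       }
   }
   ```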




[GitHub] [incubator-hudi] cdmikechen commented on issue #770: remove com.databricks:spark-avro to build spark avro schema by itself

2019-09-16 Thread GitBox
cdmikechen commented on issue #770: remove com.databricks:spark-avro to build 
spark avro schema by itself
URL: https://github.com/apache/incubator-hudi/pull/770#issuecomment-532039500
 
 
   @umehrot2 It is based on the 0.4.8 version. You also need to upgrade avro to 
1.8.2 or higher (which supports logical types), and parquet to 1.8.2 or higher. 
   My current project is using hoodie 0.4.8. There are still some problems to 
be worked out in my project, so I haven't made the improvements yet. By 
October, I will rebase the PR code onto version 0.5.0.
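   For reference, the corresponding dependency bumps might look like the sketch 
below in a pom (versions per the comment above; the exact coordinates depend on 
your build):
   ```xml
   <!-- Sketch: avro/parquet versions as suggested in the comment above -->
   <dependency>
     <groupId>org.apache.avro</groupId>
     <artifactId>avro</artifactId>
     <version>1.8.2</version>
   </dependency>
   <dependency>
     <groupId>org.apache.parquet</groupId>
     <artifactId>parquet-avro</artifactId>
     <version>1.8.2</version>
   </dependency>
   ```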

