GitHub user vinodkc opened a pull request:
https://github.com/apache/spark/pull/19779
[SPARK-17920][SPARK-19580][SPARK-19878][SQL] Support writing to Hive table which uses Avro schema url 'avro.schema.url'
## What changes were proposed in this pull request?
Support writing to a Hive table that uses an Avro schema URL (the 'avro.schema.url' table property).
For example:
create external table avro_in (a string) stored as avro location '/avro-in/' tblproperties ('avro.schema.url'='/avro-schema/avro.avsc');
create external table avro_out (a string) stored as avro location '/avro-out/' tblproperties ('avro.schema.url'='/avro-schema/avro.avsc');
insert overwrite table avro_out select * from avro_in;  -- fails with java.lang.NullPointerException
WARN AvroSerDe: Encountered exception determining schema. Returning signal schema to indicate problem
java.lang.NullPointerException
at org.apache.hadoop.fs.FileSystem.getDefaultUri(FileSystem.java:182)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:174)
## Changes proposed in this fix
Currently, a `null` value is passed to the serializer in place of a configuration, which causes a NullPointerException during the insert operation. This fix passes the Hadoop configuration object instead.
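The failure mode can be modelled with a small, self-contained Java sketch. It does not use the real Hadoop or Hive classes: `Conf`, `getDefaultUri` and `SketchSerializer` are hypothetical stand-ins, and the `hdfs://namenode:8020` address is an invented example value. The sketch only shows why resolving 'avro.schema.url' against a `null` configuration ends in the "signal schema" warning, while passing a real configuration lets the schema path be qualified.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class AvroSchemaUrlSketch {

    /** Hypothetical stand-in for a Hadoop Configuration: a plain key/value map. */
    static final class Conf {
        private final Map<String, String> values = new HashMap<>();

        Conf set(String key, String value) {
            values.put(key, value);
            return this;
        }

        String get(String key, String defaultValue) {
            return values.getOrDefault(key, defaultValue);
        }
    }

    /** Mimics the failing frame: looking up the default file system dereferences the configuration. */
    static String getDefaultUri(Conf conf) {
        // With conf == null this call throws NullPointerException, as in the reported stack trace.
        return conf.get("fs.defaultFS", "file://");
    }

    /** Hypothetical serializer whose initialize() resolves 'avro.schema.url' against the file system. */
    static final class SketchSerializer {
        static final String SIGNAL_SCHEMA = "<signal schema>";
        String schema;

        void initialize(Conf conf, Properties tableProps) {
            String url = tableProps.getProperty("avro.schema.url");
            try {
                // A path such as '/avro-schema/avro.avsc' is qualified with the default file system URI.
                schema = getDefaultUri(conf) + url;
            } catch (RuntimeException e) {
                // Like the WARN in the report: swallow the error and fall back to a signal schema.
                schema = SIGNAL_SCHEMA;
            }
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("avro.schema.url", "/avro-schema/avro.avsc");

        // Before the fix: no configuration is handed to the serializer.
        SketchSerializer before = new SketchSerializer();
        before.initialize(null, props);
        System.out.println("null conf   -> " + before.schema);

        // After the fix: a real configuration is passed through.
        SketchSerializer after = new SketchSerializer();
        after.initialize(new Conf().set("fs.defaultFS", "hdfs://namenode:8020"), props);
        System.out.println("passed conf -> " + after.schema);
    }
}
```

Running `main` prints `<signal schema>` for the `null` case and `hdfs://namenode:8020/avro-schema/avro.avsc` once a configuration is supplied, which mirrors the before/after behaviour described in this fix.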
## How was this patch tested?
Added a new test case in `VersionsSuite`.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/vinodkc/spark br_Fix_SPARK-17920
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19779.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19779
----
commit 034b2466d073c008b71eae072ee98353df56cbf2
Author: vinodkc
Date: 2017-11-18T07:52:59Z
pass hadoopConfiguration to Serializer
----