[jira] [Commented] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table

2013-08-06 Thread Alexey Zotov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13730501#comment-13730501
 ] 

Alexey Zotov commented on HIVE-3442:


I found a better approach. We need to set _avro.schema.url_ to 
_hdfs:///some/path/schema.json_ in both _TBLPROPERTIES_ and _SERDEPROPERTIES_. 
It works well with NameNode HA mode.
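
A minimal sketch of what that setup might look like, reusing the table name and the SerDe and format classes from the issue description below. The schema path is the placeholder from this comment, and the LOCATION path is hypothetical. Because _hdfs:///..._ names no specific NameNode host, it should resolve against the cluster's default filesystem, which is presumably why it keeps working in HA mode.

```sql
-- Sketch only: the same avro.schema.url is set in both SERDEPROPERTIES and TBLPROPERTIES.
-- 'hdfs:///some/path/ml_items_as_avro' is a hypothetical data location.
CREATE EXTERNAL TABLE ml_items_as_avro
  ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  WITH SERDEPROPERTIES (
    'avro.schema.url'='hdfs:///some/path/schema.json')
  STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  LOCATION 'hdfs:///some/path/ml_items_as_avro'
  TBLPROPERTIES (
    'avro.schema.url'='hdfs:///some/path/schema.json');
```

Running _describe ml_items_as_avro_ afterwards should list the columns from the schema file rather than the error columns shown in the issue description.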

> AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating 
> external table
> ---
>
> Key: HIVE-3442
> URL: https://issues.apache.org/jira/browse/HIVE-3442
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.10.0
> Reporter: Zhenxiao Luo
> Assignee: Zhenxiao Luo
> Fix For: 0.10.0
>
>
> After creating a table and loading data into it, I can verify that the table 
> was created successfully and contains the data:
> DROP TABLE IF EXISTS ml_items;
> CREATE TABLE ml_items(id INT,
>   title STRING,
>   release_date STRING,
>   video_release_date STRING,
>   imdb_url STRING,
>   unknown_genre TINYINT,
>   action TINYINT,
>   adventure TINYINT,
>   animation TINYINT,
>   children TINYINT,
>   comedy TINYINT,
>   crime TINYINT,
>   documentary TINYINT,
>   drama TINYINT,
>   fantasy TINYINT,
>   film_noir TINYINT,
>   horror TINYINT,
>   musical TINYINT,
>   mystery TINYINT,
>   romance TINYINT,
>   sci_fi TINYINT,
>   thriller TINYINT,
>   war TINYINT,
>   western TINYINT)
>   ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'
>   STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH '../data/files/avro_items' INTO TABLE ml_items;
> select * from ml_items ORDER BY id ASC;
> However, the following CREATE EXTERNAL TABLE statement with AvroSerDe does not work:
> DROP TABLE IF EXISTS ml_items_as_avro;
> CREATE EXTERNAL TABLE ml_items_as_avro
>   ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
>   WITH SERDEPROPERTIES (
> 'schema.url'='${system:test.src.data.dir}/files/avro_items_schema.avsc')
>   STORED as INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
>   OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
>   LOCATION 'file:${system:test.tmp.dir}/hive-ml-items';
> describe ml_items_as_avro;
> INSERT OVERWRITE TABLE ml_items_as_avro
>   SELECT id, title,
>     imdb_url, unknown_genre, action, adventure, animation, children, comedy,
>     crime, documentary, drama, fantasy, film_noir, horror, musical, mystery,
>     romance, sci_fi, thriller, war, western
>   FROM ml_items;
> ml_items_as_avro is not created with the expected schema, as shown in the 
> "describe ml_items_as_avro" output below:
> PREHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro
> PREHOOK: type: DROPTABLE
> POSTHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro
> POSTHOOK: type: DROPTABLE
> PREHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro
>   ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
>   WITH SERDEPROPERTIES (
> 'schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc')
>   STORED as INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
>   OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
>   LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items'
> PREHOOK: type: CREATETABLE
> POSTHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro
>   ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
>   WITH SERDEPROPERTIES (
> 'schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc')
>   STORED as INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
>   OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
>   LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items'
> POSTHOOK: type: CREATETABLE
> POSTHOOK: Output: default@ml_items_as_avro
> PREHOOK: query: describe ml_items_as_avro
> PREHOOK: type: DESCTABLE
> POSTHOOK: query: describe ml_items_as_avro
> POSTHOOK: type: DESCTABLE
> error_error_error_error_error_error_error   string  from deserializer
> cannot_determine_schema string  from deserializer
> check   string  from deserializer
> schema  string  from deserializer
> url string  from deserializer
> and string  from deserializer
> literal string  from deserial

[jira] [Commented] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table

2013-07-31 Thread Alexey Zotov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724946#comment-13724946
 ] 

Alexey Zotov commented on HIVE-3442:


Yep, I can add this info to the 
https://cwiki.apache.org/confluence/display/Hive/AvroSerDe page. But the proposed 
approach has a defect: if the DataNode (some_datanode_address:50075) is down, you 
won't be able to query data from Hive. I'm working on improving 
this approach.



[jira] [Commented] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table

2013-07-29 Thread Swarnim Kulkarni (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13722688#comment-13722688
 ] 

Swarnim Kulkarni commented on HIVE-3442:


[~jakobhoman] Would this be specific to AvroSerDe? Wouldn't it apply to 
anything that requires users to specify a path on HDFS (LOCATION for an 
EXTERNAL TABLE, for example)?


[jira] [Commented] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table

2013-07-29 Thread Jakob Homan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13722670#comment-13722670
 ] 

Jakob Homan commented on HIVE-3442:
---

The best place for this info would be in the Hive wiki, which is what passes 
for official project documentation: 
https://cwiki.apache.org/confluence/display/Hive/AvroSerDe  Please ping the 
user list if you don't have write access to the page.  Thanks for finding this 
out and sharing.


[jira] [Commented] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table

2013-07-29 Thread Swarnim Kulkarni (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13722643#comment-13722643
 ] 

Swarnim Kulkarni commented on HIVE-3442:


[~azotcsit] This seems like useful information. Would you mind posting 
about it to the Hive users group for a larger audience? I am sure it will be 
much appreciated. Thanks!


[jira] [Commented] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table

2013-07-29 Thread Alexey Zotov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13722238#comment-13722238
 ] 

Alexey Zotov commented on HIVE-3442:


Please remove my previous comment.

I had a problem with NameNode HA mode (_namenode1_address_ and 
_namenode2_address_). At first I specified the following URL as _avro.schema.url_:
{noformat}
http://some_datanode_address:50075/streamFile/path/to/file/schema.json?nnaddr=namenode1_address:8020
{noformat}
But I couldn't get data from Hive when _namenode1_address_ was the standby 
NameNode, so I had to change the link manually. 

After some time, I found how to fix it. I'm posting it here in the hope that it 
helps someone:
{noformat}
http://some_datanode_address:50075/streamFile/path/to/file/schema.json?nnaddr=nameservice1:8020
{noformat}
So, for NameNode HA mode you can specify the nameservice instead of the active 
NameNode.
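
As a rough sketch, the nameservice-based URL could be applied to an existing table as follows. The table name _ml_items_as_avro_ is borrowed from the issue description, _nameservice1_ is the logical nameservice name from the URL above and will differ per cluster, and setting the property through TBLPROPERTIES is an assumption made for illustration.

```sql
-- Sketch only: point avro.schema.url at the HA nameservice instead of one NameNode.
-- some_datanode_address and nameservice1 are placeholders for cluster-specific values.
ALTER TABLE ml_items_as_avro SET TBLPROPERTIES (
  'avro.schema.url'='http://some_datanode_address:50075/streamFile/path/to/file/schema.json?nnaddr=nameservice1:8020');
```

The URL still goes through the HTTP port of a single DataNode, so schema lookups depend on that DataNode being reachable.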


[jira] [Commented] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table

2013-07-29 Thread Alexey Zotov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13722235#comment-13722235
 ] 

Alexey Zotov commented on HIVE-3442:


I have a problem with HA mode for NameNodes. When I specified the following url 
as _avro.schema.url_:
{noformat}

For NameNode HA mode you can specify the nameservice instead of the active 
NameNode as _avro.schema.url_:
{noformat}
http://some_datanode_address:50075/streamFile/path/to/file/schema.json?nnaddr=nameservice1:8020
{noformat}


[jira] [Commented] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table

2012-09-06 Thread Jakob Homan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450217#comment-13450217
 ] 

Jakob Homan commented on HIVE-3442:
---

Sounds good.  

> AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating 
> external table
> ---
>
> Key: HIVE-3442
> URL: https://issues.apache.org/jira/browse/HIVE-3442
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.10.0
>Reporter: Zhenxiao Luo
>Assignee: Zhenxiao Luo
> Fix For: 0.10.0
>
>
> After creating a table and load data into it, I could check that the table is 
> created successfully, and data is inside:
> DROP TABLE IF EXISTS ml_items;
> CREATE TABLE ml_items(id INT,
>   title STRING,
>   release_date STRING,
>   video_release_date STRING,
>   imdb_url STRING,
>   unknown_genre TINYINT,
>   action TINYINT,
>   adventure TINYINT,
>   animation TINYINT,
>   children TINYINT,
>   comedy TINYINT,
>   crime TINYINT,
>   documentary TINYINT,
>   drama TINYINT,
>   fantasy TINYINT,
>   film_noir TINYINT,
>   horror TINYINT,
>   musical TINYINT,
>   mystery TINYINT,
>   romance TINYINT,
>   sci_fi TINYINT,
>   thriller TINYINT,
>   war TINYINT,
>   western TINYINT)
>   ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'
>   STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH '../data/files/avro_items' INTO TABLE ml_items;
> select * from ml_items ORDER BY id ASC;
> However, the following CREATE EXTERNAL TABLE with AvroSerDe does not work:
> DROP TABLE IF EXISTS ml_items_as_avro;
> CREATE EXTERNAL TABLE ml_items_as_avro
>   ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
>   WITH SERDEPROPERTIES (
> 'schema.url'='${system:test.src.data.dir}/files/avro_items_schema.avsc')
>   STORED as INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
>   OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
>   LOCATION 'file:${system:test.tmp.dir}/hive-ml-items';
> describe ml_items_as_avro;
> INSERT OVERWRITE TABLE ml_items_as_avro
>   SELECT id, title,
>     imdb_url, unknown_genre, action, adventure, animation, children, comedy, crime,
>     documentary, drama, fantasy, film_noir, horror, musical, mystery, romance,
>     sci_fi, thriller, war, western
>   FROM ml_items;
> ml_items_as_avro is not created with the expected schema, as shown in the 
> "describe ml_items_as_avro" output below:
> PREHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro
> PREHOOK: type: DROPTABLE
> POSTHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro
> POSTHOOK: type: DROPTABLE
> PREHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro
>   ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
>   WITH SERDEPROPERTIES (
> 'schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc')
>   STORED as INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
>   OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
>   LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items'
> PREHOOK: type: CREATETABLE
> POSTHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro
>   ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
>   WITH SERDEPROPERTIES (
> 'schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc')
>   STORED as INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
>   OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
>   LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items'
> POSTHOOK: type: CREATETABLE
> POSTHOOK: Output: default@ml_items_as_avro
> PREHOOK: query: describe ml_items_as_avro
> PREHOOK: type: DESCTABLE
> POSTHOOK: query: describe ml_items_as_avro
> POSTHOOK: type: DESCTABLE
> error_error_error_error_error_error_error   string  from deserializer
> cannot_determine_schema string  from deserializer
> check   string  from deserializer
> schema  string  from deserializer
> url string  from deserializer
> and string  from deserializer
> literal string  from deserializer
> FAILED: SemanticException [Error 10044]: Line 3:23 Cannot insert into target 
> table because column number/types are different 'ml_items_as_avro': Table 
> insclause-0 has 7 columns, but query has 22 columns.

[jira] [Commented] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table

2012-09-06 Thread Zhenxiao Luo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450215#comment-13450215
 ] 

Zhenxiao Luo commented on HIVE-3442:


@Jakob:

Thanks a lot. I got it working with the following valid URL:

DROP TABLE IF EXISTS ml_items_as_avro;
CREATE EXTERNAL TABLE ml_items_as_avro
  ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  WITH SERDEPROPERTIES (
    'avro.schema.url'='file:${system:test.src.data.dir}/files/avro_items_schema.avsc')
  STORED as INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  LOCATION 'file:${system:test.tmp.dir}/hive-ml-items';

describe ml_items_as_avro;

INSERT OVERWRITE TABLE ml_items_as_avro
  SELECT id, title,
    imdb_url, unknown_genre, action, adventure, animation, children, comedy, crime,
    documentary, drama, fantasy, film_noir, horror, musical, mystery, romance,
    sci_fi, thriller, war, western
  FROM ml_items;

How about I resolve this as Not A Bug?


[jira] [Commented] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table

2012-09-06 Thread Jakob Homan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450209#comment-13450209
 ] 

Jakob Homan commented on HIVE-3442:
---

> 'avro.schema.literal'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc'

Is this a valid URL? Is it accessible from the metastore?
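
To illustrate the distinction behind the question above: avro.schema.literal takes the schema JSON itself, whereas avro.schema.url takes a URL that points at a schema file, so a bare filesystem path fits neither property. The following is only a minimal sketch. The table name, record name, the two fields, and the LOCATION are hypothetical and much smaller than the real avro_items schema; putting the property in SERDEPROPERTIES simply mirrors the statements in this thread.

```sql
-- Hypothetical example: the schema is passed inline as JSON text,
-- not as a path to an .avsc file.
CREATE EXTERNAL TABLE ml_items_literal_demo
  ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  WITH SERDEPROPERTIES (
    'avro.schema.literal'='{"type":"record","name":"ml_item_demo","fields":[{"name":"id","type":"int"},{"name":"title","type":"string"}]}')
  STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  -- Hypothetical location; any directory the cluster can reach would do.
  LOCATION 'file:/tmp/ml_items_literal_demo';
```

If the schema lives in a file instead, avro.schema.url with an explicit scheme (for example file: or hdfs:) would be the property to use.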


[jira] [Commented] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table

2012-09-06 Thread Zhenxiao Luo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450206#comment-13450206
 ] 

Zhenxiao Luo commented on HIVE-3442:


I also tried avro.schema.literal, but it does not seem to work either:

PREHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro
PREHOOK: type: DROPTABLE
POSTHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro
POSTHOOK: type: DROPTABLE
PREHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro
  ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  WITH SERDEPROPERTIES (
    'avro.schema.literal'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc')
  STORED as INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items'
PREHOOK: type: CREATETABLE
POSTHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro
  ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  WITH SERDEPROPERTIES (
    'avro.schema.literal'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc')
  STORED as INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items'
POSTHOOK: type: CREATETABLE
POSTHOOK: Output: default@ml_items_as_avro
PREHOOK: query: describe ml_items_as_avro
PREHOOK: type: DESCTABLE
POSTHOOK: query: describe ml_items_as_avro
POSTHOOK: type: DESCTABLE
error_error_error_error_error_error_error   string  from deserializer
cannot_determine_schema string  from deserializer
check   string  from deserializer
schema  string  from deserializer
url string  from deserializer
and string  from deserializer
literal string  from deserializer
FAILED: SemanticException [Error 10044]: Line 3:23 Cannot insert into target 
table because column number/types are different 'ml_items_as_avro': Table 
insclause-0 has 7 columns, but query has 22 columns.

@Jakob:

I will trace the code to see what is wrong. Any comments are appreciated.


[jira] [Commented] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table

2012-09-06 Thread Zhenxiao Luo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450203#comment-13450203
 ] 

Zhenxiao Luo commented on HIVE-3442:


@Jakob:

Thanks a lot. I tried avro.schema.url, but it still does not seem to work:

PREHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro
PREHOOK: type: DROPTABLE
POSTHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro
POSTHOOK: type: DROPTABLE
PREHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro
  ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  WITH SERDEPROPERTIES (
    'avro.schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc')
  STORED as INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items'
PREHOOK: type: CREATETABLE
POSTHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro
  ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  WITH SERDEPROPERTIES (
    'avro.schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc')
  STORED as INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items'
POSTHOOK: type: CREATETABLE
POSTHOOK: Output: default@ml_items_as_avro
PREHOOK: query: describe ml_items_as_avro
PREHOOK: type: DESCTABLE
POSTHOOK: query: describe ml_items_as_avro
POSTHOOK: type: DESCTABLE
error_error_error_error_error_error_error   string  from deserializer
cannot_determine_schema string  from deserializer
check   string  from deserializer
schema  string  from deserializer
url string  from deserializer
and string  from deserializer
literal string  from deserializer
FAILED: SemanticException [Error 10044]: Line 3:23 Cannot insert into target 
table because column number/types are different 'ml_items_as_avro': Table 
insclause-0 has 7 columns, but query has 22 columns.



[jira] [Commented] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table

2012-09-06 Thread Jakob Homan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450202#comment-13450202
 ] 

Jakob Homan commented on HIVE-3442:
---

Updated the wiki.


[jira] [Commented] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table

2012-09-06 Thread Jakob Homan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450198#comment-13450198
 ] 

Jakob Homan commented on HIVE-3442:
---

The docs are out of date (my fault). schema.url and schema.literal were renamed 
to avro.schema.url and avro.schema.literal during the move to Apache, to make 
them more specific to Avro. Try with those. I'll update the wiki.
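
For a table that was already created with the old key, one option is to set the renamed key in place instead of dropping and recreating the table. This is only a sketch under stated assumptions: the schema location is a hypothetical placeholder, the URL is assumed to need an explicit scheme such as file: or hdfs:, and whether the SerDe picks up the change without recreating the table has not been verified in this thread.

```sql
-- Hypothetical path; replace with a URL the metastore can actually reach.
ALTER TABLE ml_items_as_avro SET SERDEPROPERTIES (
  'avro.schema.url'='file:///path/to/avro_items_schema.avsc');

-- If the schema resolves, this is expected to list the schema's columns
-- rather than the error placeholder columns.
DESCRIBE ml_items_as_avro;
```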


[jira] [Commented] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table

2012-09-06 Thread Zhenxiao Luo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450191#comment-13450191
 ] 

Zhenxiao Luo commented on HIVE-3442:


CC'd Jakob, in case there is an AvroSerDe usage error on my side. Jakob's 
comments and suggestions are always welcome.

> AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating 
> external table
> ---
>
> Key: HIVE-3442
> URL: https://issues.apache.org/jira/browse/HIVE-3442
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.10.0
> Reporter: Zhenxiao Luo
> Assignee: Zhenxiao Luo
> Fix For: 0.10.0
>
>
> After creating a table and load data into it, I could check that the table is 
> created successfully, and data is inside:
> DROP TABLE IF EXISTS ml_items;
> CREATE TABLE ml_items(id INT,
>   title STRING,
>   release_date STRING,
>   video_release_date STRING,
>   imdb_url STRING,
>   unknown_genre TINYINT,
>   action TINYINT,
>   adventure TINYINT,
>   animation TINYINT,
>   children TINYINT,
>   comedy TINYINT,
>   crime TINYINT,
>   documentary TINYINT,
>   drama TINYINT,
>   fantasy TINYINT,
>   film_noir TINYINT,
>   horror TINYINT,
>   musical TINYINT,
>   mystery TINYINT,
>   romance TINYINT,
>   sci_fi TINYINT,
>   thriller TINYINT,
>   war TINYINT,
>   western TINYINT)
>   ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'
>   STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH '../data/files/avro_items' INTO TABLE ml_items;
> select * from ml_items ORDER BY id ASC;
> While, the following create external table with AvroSerDe is not working:
> DROP TABLE IF EXISTS ml_items_as_avro;
> CREATE EXTERNAL TABLE ml_items_as_avro
>   ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
>   WITH SERDEPROPERTIES (
> 'schema.url'='${system:test.src.data.dir}/files/avro_items_schema.avsc')
>   STORED as INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
>   OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
>   LOCATION 'file:${system:test.tmp.dir}/hive-ml-items';
> describe ml_items_as_avro;
> INSERT OVERWRITE TABLE ml_items_as_avro
>   SELECT id, title,
> imdb_url, unknown_genre, action, adventure, animation, children, comedy, 
> crime,
> documentary, drama, fantasy, film_noir, horror, musical, mystery, romance,
> sci_fi, thriller, war, western
>   FROM ml_items;
> ml_items_as_avro is not created with the expected schema, as the
> "describe ml_items_as_avro" output below shows:
> PREHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro
> PREHOOK: type: DROPTABLE
> POSTHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro
> POSTHOOK: type: DROPTABLE
> PREHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro
>   ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
>   WITH SERDEPROPERTIES (
> 'schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc')
>   STORED as INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
>   OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
>   LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items'
> PREHOOK: type: CREATETABLE
> POSTHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro
>   ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
>   WITH SERDEPROPERTIES (
> 'schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc')
>   STORED as INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
>   OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
>   LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items'
> POSTHOOK: type: CREATETABLE
> POSTHOOK: Output: default@ml_items_as_avro
> PREHOOK: query: describe ml_items_as_avro
> PREHOOK: type: DESCTABLE
> POSTHOOK: query: describe ml_items_as_avro
> POSTHOOK: type: DESCTABLE
> error_error_error_error_error_error_error   string  from deserializer
> cannot_determine_schema string  from deserializer
> check   string  from deserializer
> schema  string  from deserializer
> url string  from deserializer
> and string  from deserializer
> literal string  from deserializer
> FAILED: SemanticException [Error 10044]: Line 3:23 Cannot insert into target