[jira] [Updated] (HIVE-19580) Hive 2.3.2 with ORC files & stored on S3 are case sensitive on EMR

2019-02-20, from Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HIVE-19580:
--
Summary: Hive 2.3.2 with ORC files & stored on S3 are case sensitive on EMR 
 (was: Hive 2.3.2 with ORC files stored on S3 are case sensitive)

> Hive 2.3.2 with ORC files & stored on S3 are case sensitive on EMR
> --
>
> Key: HIVE-19580
> URL: https://issues.apache.org/jira/browse/HIVE-19580
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.2
> Environment: EMR s3:// connector
> Spark 2.3 but also true for lower versions
> Hive 2.3.2
>Reporter: Arthur Baudry
>Priority: Major
> Fix For: 2.3.2
>
>
> The original file is CSV:
> COL1,COL2
> 1,2
> ORC files are created with Spark 2.3:
> scala> val df = spark.read.option("header","true").csv("/user/hadoop/file")
> scala> df.printSchema
>  root
>  |-- COL1: string (nullable = true)
>  |-- COL2: string (nullable = true)
> scala> df.write.orc("s3://bucket/prefix")
> In Hive:
> hive> CREATE EXTERNAL TABLE test_orc(COL1 STRING, COL2 STRING) STORED AS ORC
> LOCATION 's3://bucket/prefix';
> hive> SELECT * FROM test_orc;
>  OK
>  NULL NULL
> *Every field is NULL. However, if the fields are generated in lower case in
> the Spark schema, everything works.*
> The reason I'm raising this bug is that we have customers using Hive 2.3.2
> to read files we generate through Spark, and our entire code base addresses
> fields in upper case, which is incompatible with their Hive instance.
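A Spark-side workaround follows directly from that observation: lower-case every
column name before writing, so the field names recorded in the ORC footers match
the lower-cased column names Hive keeps in its metastore. A minimal spark-shell
sketch, reusing the placeholder paths from the repro above (not from the ticket
itself):

scala> val df = spark.read.option("header","true").csv("/user/hadoop/file")
scala> // toDF with a full list of names renames every column in one call;
scala> // `lowered` is an illustrative name, not part of the original repro
scala> val lowered = df.toDF(df.columns.map(_.toLowerCase): _*)
scala> lowered.write.orc("s3://bucket/prefix")

Hive identifiers are case-insensitive (the metastore stores them lower-cased), so
the existing CREATE EXTERNAL TABLE test_orc(COL1 STRING, COL2 STRING) statement
should then resolve both columns.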





[jira] [Updated] (HIVE-19580) Hive 2.3.2 with ORC files stored on S3 are case sensitive

2019-02-19, from Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HIVE-19580:
--
Environment: 
EMR s3:// connector

Spark 2.3 but also true for lower versions

Hive 2.3.2

  was:
AWS S3 to store files

Spark 2.3 but also true for lower versions

Hive 2.3.2


> Hive 2.3.2 with ORC files stored on S3 are case sensitive
> -
>
> Key: HIVE-19580
> URL: https://issues.apache.org/jira/browse/HIVE-19580
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.2
> Environment: EMR s3:// connector
> Spark 2.3 but also true for lower versions
> Hive 2.3.2
>Reporter: Arthur Baudry
>Priority: Major
> Fix For: 2.3.2
>
>
> The original file is CSV:
> COL1,COL2
> 1,2
> ORC files are created with Spark 2.3:
> scala> val df = spark.read.option("header","true").csv("/user/hadoop/file")
> scala> df.printSchema
>  root
>  |-- COL1: string (nullable = true)
>  |-- COL2: string (nullable = true)
> scala> df.write.orc("s3://bucket/prefix")
> In Hive:
> hive> CREATE EXTERNAL TABLE test_orc(COL1 STRING, COL2 STRING) STORED AS ORC
> LOCATION 's3://bucket/prefix';
> hive> SELECT * FROM test_orc;
>  OK
>  NULL NULL
> *Every field is NULL. However, if the fields are generated in lower case in
> the Spark schema, everything works.*
> The reason I'm raising this bug is that we have customers using Hive 2.3.2
> to read files we generate through Spark, and our entire code base addresses
> fields in upper case, which is incompatible with their Hive instance.
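An alternative, Hive-side angle: the ORC reader exposes
orc.force.positional.evolution, which maps reader columns to file columns by
position instead of by name, and so sidesteps the case comparison entirely.
Whether the EMR build of Hive 2.3.2 honours the setting is an assumption here,
so treat this as a sketch rather than a verified fix:

hive> -- assumes this build's ORC reader honours the flag
hive> SET orc.force.positional.evolution=true;
hive> SELECT * FROM test_orc;

If the flag is honoured, the query should return 1 and 2 instead of NULL NULL,
since the table's column order matches the file's.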





[jira] [Updated] (HIVE-19580) Hive 2.3.2 with ORC files stored on S3 are case sensitive

2018-05-17, from Arthur Baudry (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arthur Baudry updated HIVE-19580:
-
Description: 
The original file is CSV:

COL1,COL2
1,2

ORC files are created with Spark 2.3:

scala> val df = spark.read.option("header","true").csv("/user/hadoop/file")

scala> df.printSchema
 root
 |-- COL1: string (nullable = true)
 |-- COL2: string (nullable = true)

scala> df.write.orc("s3://bucket/prefix")

In Hive:

hive> CREATE EXTERNAL TABLE test_orc(COL1 STRING, COL2 STRING) STORED AS ORC
LOCATION 's3://bucket/prefix';

hive> SELECT * FROM test_orc;
 OK
 NULL NULL

*Every field is NULL. However, if the fields are generated in lower case in the
Spark schema, everything works.*

The reason I'm raising this bug is that we have customers using Hive 2.3.2 to
read files we generate through Spark, and our entire code base addresses fields
in upper case, which is incompatible with their Hive instance.

  was:
The original file is CSV:

COL1,COL2
1,2

ORC files are created with Spark 2.3:

scala> val df = spark.read.option("header","true").csv("/user/hadoop/file")

scala> df.printSchema
root
|-- COL1: string (nullable = true)
|-- COL2: string (nullable = true)

scala> df.write.orc("s3://bucket/prefix")

In Hive:

hive> CREATE EXTERNAL TABLE test_orc(COL1 STRING, COL2 STRING) STORED AS ORC
LOCATION 's3://bucket/prefix';

hive> SELECT * FROM test_orc;
OK
NULL NULL

*Every field is NULL. However, if the fields are generated in lower case in the
Spark schema, everything works.*

The reason I'm raising this bug is that we have customers using Hive 2.3.2 to
read files we generate through Spark, and our entire code base addresses fields
in upper case, which is incompatible with their Hive instance.


> Hive 2.3.2 with ORC files stored on S3 are case sensitive
> -
>
> Key: HIVE-19580
> URL: https://issues.apache.org/jira/browse/HIVE-19580
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.2
> Environment: AWS S3 to store files
> Spark 2.3 but also true for lower versions
> Hive 2.3.2
>Reporter: Arthur Baudry
>Priority: Major
> Fix For: 2.3.2
>
>
> The original file is CSV:
> COL1,COL2
> 1,2
> ORC files are created with Spark 2.3:
> scala> val df = spark.read.option("header","true").csv("/user/hadoop/file")
> scala> df.printSchema
>  root
>  |-- COL1: string (nullable = true)
>  |-- COL2: string (nullable = true)
> scala> df.write.orc("s3://bucket/prefix")
> In Hive:
> hive> CREATE EXTERNAL TABLE test_orc(COL1 STRING, COL2 STRING) STORED AS ORC
> LOCATION 's3://bucket/prefix';
> hive> SELECT * FROM test_orc;
>  OK
>  NULL NULL
> *Every field is NULL. However, if the fields are generated in lower case in
> the Spark schema, everything works.*
> The reason I'm raising this bug is that we have customers using Hive 2.3.2
> to read files we generate through Spark, and our entire code base addresses
> fields in upper case, which is incompatible with their Hive instance.
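A quick check that localises the problem to name matching rather than to the S3
connector or to the files themselves: read the ORC data back with Spark and
inspect the schema. A sketch using the same placeholder path as the repro:

scala> // if Spark shows COL1/COL2 in upper case and returns the rows,
scala> // the files are intact and the mismatch is in Hive's column lookup
scala> spark.read.orc("s3://bucket/prefix").printSchema
scala> spark.read.orc("s3://bucket/prefix").show()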



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)