[jira] [Updated] (HIVE-18834) Lzo files not getting split in hive jobs on hive2.1.0、hive2.2.0

2018-03-01 Thread Saijin Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saijin Huang updated HIVE-18834:

Description: 
Following the guide on the wiki, I want to run select count(*) from lzotest;

Environment:

Hive 2.1.0, Hive 2.2.0, Hadoop 2.7.3

Hive on MR, Hive on Spark

We have one .lzo file and its corresponding .index file. We are creating an 
external table lzotest on the directory containing these files. The LZO file 
is 2.2G. When we run Hive queries, only 2 mappers are spawned: one map 
corresponds to the LZO file and the other to the index file. So the LZO file 
is not being split, and the result of count(*) is incorrect: it is greater 
than the true value.
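
As a rough illustration of the symptom described above, the following Python 
sketch models a planner that creates one map task per file, without splitting 
and without skipping index files. It is a toy model under stated assumptions, 
not the actual split logic of Hive or hadoop-lzo; the file names and both 
helper functions are hypothetical.

```python
def one_map_per_file(paths):
    """Toy planner: every file in the directory becomes exactly one map task."""
    return list(paths)


def data_files_only(paths):
    """Drop LZO index files so that only real data files are read as input."""
    return [p for p in paths if not p.endswith(".index")]


# Hypothetical contents of the external table's directory.
table_dir = ["part-0000.lzo", "part-0000.lzo.index"]

maps = one_map_per_file(table_dir)
print(len(maps))  # 2: the index file is planned as if it were data

print(data_files_only(table_dir))  # only the .lzo file remains
```

In this model the extra map over the index file is what would add spurious 
rows to count(*). Whether that is the real cause of the inflated count here is 
an assumption, not something the report confirms.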

Hive on MR and Hive on Spark both return the incorrect value.

However, Hive on Tez returns the correct value. In that case the job gets 27 
maps and the LZO file is split successfully.

 

--

We built a new environment based on hive-1.1.0-cdh5.4.2, and there everything 
works fine.

--

We have checked the configuration and found no problems.

So we are confused about why Hive on Tez works fine while Hive on MR and Hive 
on Spark cannot get the correct value.

The splitting is not happening. What could be the issue? Any suggestions for a 
quick workaround?

  was:
Following the guide on the wiki, I want to run select count(*) from lzotest;



> Lzo files not getting split in hive jobs on hive2.1.0、hive2.2.0
> ---
>
> Key: HIVE-18834
> URL: https://issues.apache.org/jira/browse/HIVE-18834
> Project: Hive
> Issue Type: Bug
> Affects Versions: 2.1.0, 2.2.0
> Reporter: Saijin Huang
> Assignee: Saijin Huang
> Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

