[ 
https://issues.apache.org/jira/browse/HIVE-12718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15859141#comment-15859141
 ] 

Charles Bernard commented on HIVE-12718:
----------------------------------------

We are experiencing the same issue running CDH 5.8.0.

In our case the wrong line (not the last one) is skipped, and forcing a 
single mapper does not help.

> skip.footer.line.count misbehaves on larger text files
> ------------------------------------------------------
>
>                 Key: HIVE-12718
>                 URL: https://issues.apache.org/jira/browse/HIVE-12718
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>         Environment: The bug was discovered and reproduced on a Cloudera 
> Hadoop 5.4 distribution running on CentOS 6.4.
>            Reporter: Gergely Nagy
>            Priority: Minor
>
> We noticed that when working on a table backed by a larger (large enough to 
> require splitting) text file, the {{skip.footer.line.count}} property of the 
> table misbehaves: the footer is not being ignored.
> To reproduce, follow these steps:
> 1) Create a large file: {{for i in $(seq 1 100); do cat 
> /usr/share/dict/words; done >large.txt}}
> 2) Upload it to HDFS (e.g., as {{/tmp/words}})
> 3) Create an external table with {{skip.footer.line.count}} set: 
> {quote}
> CREATE EXTERNAL TABLE ext_words (word STRING)
>   ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
>   LINES TERMINATED BY '\n'
>   STORED AS TEXTFILE LOCATION '/tmp/words'
>   tblproperties("skip.header.line.count"="1", "skip.footer.line.count"="1");
> {quote}
> 4) Count the number of times the last line (in this example, I assume that to 
> be {{ZZZ}}) appears: {{SELECT COUNT( * ) FROM ext_words WHERE word = 'ZZZ';}}
> 5) Observe that it returns 100 instead of 99.
> Investigation showed that this happens when more than one mapper is used 
> for the job. If we increase the split size to force a single mapper, the 
> problem does not occur.
> There may be other related issues as well, such as the wrong line being 
> skipped, but we have not reproduced those yet.
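
For reference, the expected count in step 5 can be checked locally without a cluster. The following is only a minimal sketch of the arithmetic, not Hive's reader logic: the helper visible_rows and the four-entry word list are hypothetical stand-ins (the real input is /usr/share/dict/words, whose last line the description assumes to be ZZZ).

```python
def visible_rows(lines, header=1, footer=1):
    """Rows a reader should expose after dropping header and footer lines."""
    end = len(lines) - footer if footer else len(lines)
    return lines[header:end]


# Hypothetical stand-in for /usr/share/dict/words; the last entry plays
# the role of 'ZZZ' from the issue description.
words = ["aardvark", "hive", "mapper", "ZZZ"]

# Same shape as concatenating the word list 100 times (step 1).
large = words * 100

# Correct behaviour: one header line and one footer line are dropped.
expected = visible_rows(large).count("ZZZ")

# Behaviour reported in the issue: header dropped, footer left in place.
unskipped = large[1:].count("ZZZ")

print(expected, unskipped)
```

With the footer honoured, the final ZZZ is dropped and the count is one less than the number of copies. A count equal to the number of copies means the footer line was still read.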



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
