[ https://issues.apache.org/jira/browse/HIVE-12718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15859141#comment-15859141 ]
Charles Bernard commented on HIVE-12718:
----------------------------------------

We are experiencing the same issue on CDH 5.8.0. In our case, the wrong line (not the last one) is being skipped. Forcing a single mapper does not help.

> skip.footer.line.count misbehaves on larger text files
> ------------------------------------------------------
>
>                 Key: HIVE-12718
>                 URL: https://issues.apache.org/jira/browse/HIVE-12718
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>         Environment: The bug was discovered and reproduced on a Cloudera Hadoop 5.4 distribution running on CentOS 6.4.
>            Reporter: Gergely Nagy
>            Priority: Minor
>
> We noticed that when working on a table backed by a larger text file (large enough to require splitting), the {{skip.footer.line.count}} property of the table misbehaves: the footer is not ignored.
> To reproduce, follow these steps:
> 1) Create a large file: {{for i in $(seq 1 100); do cat /usr/share/dict/words; done >large.txt}}
> 2) Upload it to HDFS (e.g., as {{/tmp/words}}).
> 3) Create an external table with {{skip.footer.line.count}} set:
> {quote}
> CREATE EXTERNAL TABLE ext_words (word STRING)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
> LINES TERMINATED BY '\n'
> STORED AS TEXTFILE LOCATION '/tmp/words'
> tblproperties("skip.header.line.count"="1", "skip.footer.line.count"="1");
> {quote}
> 4) Count the number of times the last line (in this example, I assume it to be {{ZZZ}}) appears: {{SELECT COUNT( * ) FROM ext_words WHERE word = 'ZZZ';}}
> 5) Observe that it returns 100 instead of 99.
> Investigation showed that this happens when more than one mapper is used for the job. If we increase the split size to force a single mapper, the problem does not occur.
> There may be other related issues as well, such as the wrong line being skipped, but we have not reproduced those yet.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
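For the HIVE-12718 reproduction steps, the count that step 5 should return can be cross-checked locally by emulating the header and footer skipping on the whole, unsplit file. This is a minimal sketch, not part of the original report. It assumes a tiny hypothetical stand-in dictionary (`words.txt`, whose last line is `ZZZ`) in place of `/usr/share/dict/words`, and it uses only `sed`, `grep` and `seq`.

```shell
#!/bin/sh
# Sketch: emulate skip.header.line.count=1 and skip.footer.line.count=1 on a
# single, unsplit file to get the count step 5 is expected to return.
set -e
cd "$(mktemp -d)"

# Hypothetical stand-in for /usr/share/dict/words; its last line is ZZZ.
printf 'aaa\nmmm\nZZZ\n' > words.txt

# Same construction as step 1: concatenate the dictionary 100 times.
for i in $(seq 1 100); do cat words.txt; done > large.txt

# '1d' drops the single header line and '$d' drops the single footer line;
# grep -cx counts lines that consist exactly of ZZZ.
sed '1d;$d' large.txt | grep -cx 'ZZZ'
```

This prints 99: `ZZZ` ends each of the 100 copies, and only the final occurrence is the footer. A Hive query returning 100 on the same data therefore means the footer line was not skipped.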