GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/21377
[SPARK-24325] Tests for Hadoop's LineReader
## What changes were proposed in this pull request?
The tests cover basic functionality of [Hadoop
LinesReader](https://github.com/apache/spark/blob/8d79113b812a91073d2c24a3a9ad94cc3b90b24a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFileLinesReader.scala#L42).
In particular, the added tests check:
- A split slices a line or delimiter
- A split slices two consecutive lines and covers the delimiter between
them
- Two splits slice a line and there are no duplicates
- Internal buffer size (`io.file.buffer.size`) is less than line length
- Constraint on maximum line length -
`mapreduce.input.linerecordreader.line.maxlength`
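
To illustrate the split-related cases above, here is a minimal, self-contained sketch of the convention that Hadoop-style line readers generally follow: a split that does not start at offset 0 discards everything up to and including the first delimiter it sees, and every split keeps reading as long as the next line starts at or before the split's end. This is not the code under test. `read_split` is a hypothetical helper written for this example, and it assumes a single-byte `\n` delimiter and an in-memory byte string instead of a file.

```python
def read_split(data: bytes, start: int, length: int) -> list[bytes]:
    """Return the lines owned by the split [start, start + length).

    Hypothetical model of split-aware line reading; assumes b"\n" delimiters.
    """
    delim = b"\n"
    end = start + length
    pos = start

    if start != 0:
        # The line containing (or starting at) `start` belongs to the
        # previous split, so skip past the first delimiter.
        idx = data.find(delim, pos)
        if idx == -1:
            return []  # no further line starts inside this split
        pos = idx + len(delim)

    lines = []
    # A line is owned by this split if it starts at or before `end`,
    # even when it finishes beyond the split boundary.
    while pos <= end and pos < len(data):
        idx = data.find(delim, pos)
        if idx == -1:
            lines.append(data[pos:])  # last line without trailing delimiter
            pos = len(data)
        else:
            lines.append(data[pos:idx])
            pos = idx + len(delim)
    return lines


data = b"line1\nline2\nline3"
# Cut the input at every interior offset: mid-line, on the delimiter,
# and right after it. Two adjacent splits should always yield each line once.
for cut in range(1, len(data)):
    first = read_split(data, 0, cut)
    second = read_split(data, cut, len(data) - cut)
    assert first + second == [b"line1", b"line2", b"line3"]
```

Under these assumptions, cutting the input anywhere produces neither lost nor duplicated lines, which is the property the split tests in this pull request check against the real reader.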
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/MaxGekk/spark-1 line-reader-tests
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21377.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21377
----
commit 31ac9ca4e992f234df63d09c2934919f24fe20d4
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-05-20T16:55:04Z
Tests for Hadoop's LineReader
----