Amir Bar-Or created SPARK-26280:
-----------------------------------

             Summary: Spark will read entire CSV file even when limit is used
                 Key: SPARK-26280
                 URL: https://issues.apache.org/jira/browse/SPARK-26280
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.3.1
            Reporter: Amir Bar-Or


When you read a CSV file as below, the parser still wastes time and reads the entire
file:

var lineDF1 = spark.read
 .format("com.databricks.spark.csv")
 .option("header", "true")        // read the header row
 .option("mode", "DROPMALFORMED") // drop malformed records
 .option("delimiter", ",")
 .option("inferSchema", "false")
 .schema(line_schema)
 .load(i_lineitem)
 .limit(10)                       // limit does not prevent a full file scan

 

Even though a LocalLimit is created, this does not stop the FileScan and the
parser from parsing the entire file. Is it possible to push the limit down and
stop the parsing?
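A minimal sketch (assuming a hypothetical path and schema in place of i_lineitem and line_schema) showing how to reproduce the observation and inspect the physical plan: the limit appears as a CollectLimit/LocalLimit above the FileScan node, while the scan itself still covers the whole file.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

object CsvLimitCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("csv-limit-check")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical schema and path standing in for line_schema / i_lineitem.
    val lineSchema = new StructType()
      .add("l_orderkey", LongType)
      .add("l_comment", StringType)

    val limited = spark.read
      .format("csv")
      .option("header", "true")
      .option("mode", "DROPMALFORMED")
      .schema(lineSchema)
      .load("/path/to/lineitem.csv")
      .limit(10)

    // Print the parsed, analyzed, optimized, and physical plans.
    // The physical plan shows the limit sitting above the FileScan,
    // i.e. it is not pushed into the CSV data source.
    limited.explain(true)

    spark.stop()
  }
}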


