[ https://issues.apache.org/jira/browse/SPARK-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960908#comment-13960908 ]

Xusen Yin commented on SPARK-1415:
----------------------------------

Hi Matei, I just looked through the Hadoop APIs. I found that the new Hadoop 
API deprecates minSplits; instead, it uses minSplitSize and maxSplitSize to 
control the splitting. minSplits is negatively correlated with maxSplitSize 
(a smaller maximum split size yields more splits), so I think we have two 
ways to fix the issue:

1. We simply provide a new API that takes maxSplitSize directly, say, 
wholeTextFiles(path: String, maxSplitSize: Long);

2. We write a delegation that computes maxSplitSize from minSplits (easy to 
write, following the old Hadoop API's behavior), and keep the API 
wholeTextFiles(path: String, minSplits: Int); see the sketch at the end of 
this comment.

I also think we could provide both APIs simultaneously. What do you think?
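A rough sketch of option 2, just to make the delegation concrete: translate 
minSplits into the largest maxSplitSize that still produces at least that many 
splits, given the total input size. The object and method names here 
(SplitSizeDelegation, minSplitsToMaxSplitSize, totalInputSize) are hypothetical 
and not existing Spark or Hadoop identifiers; the actual hook would be a 
CombineFileInputFormat subclass calling its protected setMaxSplitSize.

{code}
// Sketch only: compute a maxSplitSize equivalent to a requested minSplits.
object SplitSizeDelegation {

  /** Largest split size that still cuts `totalInputSize` bytes into at least
    * `minSplits` splits. Integer (floor) division keeps the result small
    * enough that ceil(totalInputSize / maxSplitSize) >= minSplits. */
  def minSplitsToMaxSplitSize(totalInputSize: Long, minSplits: Int): Long = {
    require(minSplits > 0, "minSplits must be positive")
    math.max(1L, totalInputSize / minSplits)
  }

  def main(args: Array[String]): Unit = {
    // e.g. 1 GB of input and at least 8 splits => splits of at most 128 MB
    val maxSplitSize = minSplitsToMaxSplitSize(1L << 30, 8)
    println(s"maxSplitSize = $maxSplitSize bytes") // prints 134217728
  }
}
{code}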

> Add a minSplits parameter to wholeTextFiles
> -------------------------------------------
>
>                 Key: SPARK-1415
>                 URL: https://issues.apache.org/jira/browse/SPARK-1415
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Matei Zaharia
>            Assignee: Xusen Yin
>              Labels: Starter
>
> This probably requires adding one to newAPIHadoopFile too.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
