This source is meant to be used with a shared file system such as HDFS or NFS,
where both the driver and the workers can see the same folders. There's no
support in Spark for working with files that are local to individual workers.
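To illustrate the point above, here is a conceptual sketch (plain Python, not Spark code; `plan_tasks` and the dict-based "filesystems" are invented for illustration): the driver enumerates paths and ships them to workers as tasks, so files that exist only on a worker's local disk are invisible to the planning step.

```python
# Conceptual sketch: why Spark's file sources need a shared filesystem.
# The driver lists files under a path and creates one task per file;
# a file visible only on a worker never makes it into the plan.

def plan_tasks(driver_fs, path_prefix):
    """Driver-side: list files under a prefix and create one task per file."""
    return [p for p in sorted(driver_fs) if p.startswith(path_prefix)]

shared_fs = {"/data/a.csv": "1,2", "/data/b.csv": "3,4"}  # e.g. HDFS/NFS

# With a shared filesystem, the driver sees every file:
print(plan_tasks(shared_fs, "/data/"))  # ['/data/a.csv', '/data/b.csv']

# If the data lives only on workers' local disks, the driver's listing
# is empty, so no tasks are created and the query returns no rows:
print(plan_tasks({}, "/data/"))  # []
```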
Matei
> On Sep 8, 2016, at 2:23 AM, Jacek Laskowski wrote:
>
> Hi Steve,
Hi Steve,
Thank you for the more source-oriented answer. It helped, but didn't explain
the reason for such eagerness. The file(s) might not be on the driver
but on the executors only, where the Spark jobs run. I don't see why
Spark should check the file(s), regardless of the glob pattern being used.
You see my w
Fail-fast generally means that you find problems sooner rather than later;
here, potentially, your code runs but simply returns empty data without
any obvious cue as to what is wrong.
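The trade-off Steve describes can be sketched in a few lines (hypothetical helpers, not the Spark API): the eager variant raises at definition time when the path is misspelled, while the lazy variant silently yields an empty result.

```python
# Sketch of the fail-fast argument: validate the path when the source
# is *defined*, instead of letting a typo turn into an empty stream.
import os

def open_source_eager(path):
    """Fail fast: refuse to create the source if the path is missing."""
    if not os.path.exists(path):
        raise FileNotFoundError(f"Path does not exist: {path}")
    return sorted(os.listdir(path))

def open_source_lazy(path):
    """Defer the check: a misspelled path just yields an empty result."""
    return sorted(os.listdir(path)) if os.path.exists(path) else []

# A typo in the path: the lazy variant silently returns no data,
# while the eager one raises the moment the source is created.
assert open_source_lazy("/no/such/dirctory") == []
try:
    open_source_eager("/no/such/dirctory")
except FileNotFoundError as e:
    print(e)  # Path does not exist: /no/such/dirctory
```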
As is always good in OSS, follow those stack trace links to see what they say:
// Check whether
On Thu, Sep 8, 2016 at 9:03 AM, Fred Reiss wrote:
> I suppose the type-inference-time check for the presence of the input
> directory could be moved to the FileStreamSource's initialization. But if
> the directory isn't there when the source is being created, it probably
> won't be there when the
The input directory does need to be visible from the driver process, since
FileStreamSource does its polling from the driver. FileStreamSource creates
a Dataset for each microbatch.
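A minimal sketch of that driver-side polling loop (the class name and structure are assumed for illustration; this is not FileStreamSource itself): each poll lists the input directory, diffs it against the files already seen, and the new files form the next micro-batch. The polling happens wherever this object lives, which is why the directory must be visible from the driver process.

```python
# Sketch of driver-side polling: list the directory, diff against what
# has been seen, and treat the new files as the next micro-batch.
import os
import tempfile

class DirectoryPoller:
    def __init__(self, path):
        self.path = path
        self.seen = set()

    def next_batch(self):
        """Return files that appeared since the last poll (one micro-batch)."""
        current = set(os.listdir(self.path))
        new = sorted(current - self.seen)
        self.seen |= current
        return new

with tempfile.TemporaryDirectory() as d:
    poller = DirectoryPoller(d)
    open(os.path.join(d, "part-0000"), "w").close()
    print(poller.next_batch())  # ['part-0000']
    print(poller.next_batch())  # [] -- already seen, nothing new
```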
I suppose the type-inference-time check for the presence of the input
directory could be moved to the FileStreamSource's initialization. But if
the directory isn't there when the source is being created, it probably
won't be there when the
Hi,
I'm wondering what the rationale is for checking the path option
eagerly in FileStreamSource? My thinking is that until start is called,
no processing is going on, and the processing is supposed to happen on
the executors (not the driver), where the path is available.
I could (and perhaps should) use dfs but IMHO