Re: relative path in DataFrameWriter and DataStreamWriter

2025-01-28 Thread Rozov, Vlad
Please review https://github.com/apache/spark/pull/49654
> Your best bet is to make relative path in driver to be resolved to absolute path and pass over to executor with that resolved path.
Right, this is exactly what I was going to implement and how it is done for DataFrameWriter in the Dat

Re: relative path in DataFrameWriter and DataStreamWriter

2025-01-16 Thread Jungtaek Lim
Your best bet is to make relative path in driver to be resolved to absolute path and pass over to executor with that resolved path. This needs some discussion whether we want to do that, but this is at least technically correct.
On Fri, Jan 17, 2025 at 1:54 PM Jungtaek Lim wrote:
> Examples are
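
The suggestion above (resolve the relative path on the driver, then hand the resolved path to the executors) can be sketched in plain Python. This is only an illustration under stated assumptions, not Spark's implementation and not the code in the pull request: the helper name resolve_on_driver and its cwd parameter are hypothetical, POSIX-style paths are assumed, and a path that carries a URI scheme (for example s3a://) is treated as already fully qualified.

```python
import os
from typing import Optional
from urllib.parse import urlparse


def resolve_on_driver(path: str, cwd: Optional[str] = None) -> str:
    """Hypothetical helper: make a relative path absolute on the driver.

    Paths with a URI scheme (e.g. s3a://, hdfs://) are returned unchanged.
    Absolute paths stay absolute. Relative paths are resolved against
    `cwd`, which defaults to the current working directory of the
    calling (driver) process.
    """
    if urlparse(path).scheme:
        # Already qualified with a scheme; nothing to resolve.
        return path
    base = cwd if cwd is not None else os.getcwd()
    # os.path.join ignores `base` when `path` is already absolute.
    return os.path.normpath(os.path.join(base, path))


if __name__ == "__main__":
    # The resolved string is what would be passed on to the executors,
    # so they no longer depend on their own working directories.
    print(resolve_on_driver("test.parquet"))
```

With this approach every process sees the same absolute location string. Whether that location is actually shared between machines (NFS, HDFS, object storage) is a separate question that path resolution alone does not settle.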

Re: relative path in DataFrameWriter and DataStreamWriter

2025-01-16 Thread Jungtaek Lim
Examples are assuming you are running them in the single node cluster. If you feel like it's causing confusion, this is something we need to fix, e.g. put disclaimer that the example is based on the assumption it is running with a single node cluster.
>> > More problematic thing is to use the loca

Re: relative path in DataFrameWriter and DataStreamWriter

2025-01-16 Thread Rozov, Vlad
> More problematic thing is to use the local filesystem for the path which is interpreted by distributed machines.
It depends. Nowadays distributed systems mostly use cloud (S3, GFS, etc) or HDFS, but NFS and other locally mounted FS can still be in use and should be supported.
> this actua

Re: relative path in DataFrameWriter and DataStreamWriter

2025-01-15 Thread Jungtaek Lim
> I do understand that using relative path is not the best option especially in the distributed systems
More problematic thing is to use the local filesystem for the path which is interpreted by distributed machines. Yes, using relative paths is also problematic since it depends on the working dir

Re: relative path in DataFrameWriter and DataStreamWriter

2025-01-15 Thread Rozov, Vlad
Resending...
> On Jan 9, 2025, at 1:57 PM, Rozov, Vlad wrote:
>
> Hi,
>
> I see a difference in how "path" is handled in DataFrameWriter.save(path) and DataStreamWriter.start(path) while using relative path (for example "test.parquet") to write to parquet files (possibly applies to other

relative path in DataFrameWriter and DataStreamWriter

2025-01-09 Thread Rozov, Vlad
Hi,
I see a difference in how "path" is handled in DataFrameWriter.save(path) and DataStreamWriter.start(path) while using relative path (for example "test.parquet") to write to parquet files (possibly applies to other file formats as well). In case of DataFrameWriter path is relative to the cu