Well, I do not really need to do it while another job is editing them. I just need to get the names of the folders when I read them through textFile("path/to/dir/*/*/*.js").
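Something like the following is what I have in mind; a minimal, untested sketch using Hadoop's FileSystem API, assuming the example path above (globStatus accepts the same wildcard syntax that textFile does):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    val conf = new Configuration() // inside a Spark app, sc.hadoopConfiguration works here
    val fs = FileSystem.get(conf)

    // globStatus expands the same wildcard pattern that textFile reads
    val matches = fs.globStatus(new Path("path/to/dir/*/*/*.js"))

    // the parent of each matched file is one of the folders that was read
    val folders = matches.map(_.getPath.getParent).distinct
    folders.foreach(println)

Each Path in folders could then be passed to fs.rename to mark the folder as processed.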
Using native Hadoop libraries, can I do something like fs.copy("/my/path/*/*", "new/path/")? (A sketch follows below the quoted thread.)

Narek Galstyan
Նարեկ Գալստյան

On 27 October 2015 at 19:13, Deenar Toraskar <deenar.toras...@gmail.com> wrote:

> This won't work, as you can never guarantee which files were read by Spark
> if some other process is writing files to the same location. It would be
> far less work to move files matching your pattern to a staging location and
> then load them using sc.textFile. You should find HDFS file system calls
> that are equivalent to the normal file system ones if command-line tools
> like distcp or mv don't meet your needs.
>
> On 27 Oct 2015 1:49 p.m., "Նարեկ Գալստեան" <ngalsty...@gmail.com> wrote:
>
>> Dear Spark users,
>>
>> I am reading a set of JSON files to compile them to the Parquet data
>> format. I would like to mark the folders in some way after having read
>> their contents so that I do not read them again (e.g. I could change the
>> name of the folder).
>>
>> I use the .textFile("path/to/dir/*/*/*.js") technique to automatically
>> detect the files. I cannot, however, use the same notation to rename them.
>>
>> Could you suggest how I can get the names of these folders so that I can
>> rename them using native Hadoop libraries?
>>
>> I am using Apache Spark 1.4.1.
>>
>> I look forward to hearing suggestions!
>>
>> Yours,
>>
>> Narek
>>
>> Նարեկ Գալստյան
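On the fs.copy question above: as far as I know, the Hadoop FileSystem API has no glob-aware copy, but expanding the glob with globStatus first and copying each match with FileUtil.copy achieves the same effect. A rough, untested sketch using the example paths from the message above:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

    val conf = new Configuration()
    val fs = FileSystem.get(conf)

    // expand the wildcard into concrete paths first
    val matches = fs.globStatus(new Path("/my/path/*/*"))
    val target = new Path("new/path/")

    matches.foreach { status =>
      // deleteSource = false copies; true turns this into a move, which
      // matches the staging-location suggestion in the quoted reply
      FileUtil.copy(fs, status.getPath, fs, target, false, conf)
    }

When source and target live on the same HDFS instance, calling fs.rename(src, dst) per matched path is cheaper, since a rename only rewrites metadata rather than moving data.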