Hi Shraddha, what is interesting to me is that people do not even have the courtesy to write their names when they request help from user groups :)
your solution is spot on; there is another option available in Spark SQL for this as well.

Regards,
Gourav Sengupta

On Thu, Jan 9, 2020 at 1:19 PM Shraddha Shah <shah.shraddha...@gmail.com> wrote:

> Unless I am reading this wrong, this can be achieved with aws sync?
>
> aws s3 sync s3://my-bucket/ingestion/source1/y=2019/m=12/d=12 s3://my-bucket/ingestion/processed/*src_category=other*/y=2019/m=12/d=12
>
> Thanks,
> -Shraddha
>
> On Thu, Jan 9, 2020 at 7:05 AM Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
>
>> why s3a?
>>
>> On Thu, Jan 9, 2020 at 2:20 AM anbutech <anbutec...@outlook.com> wrote:
>>
>>> Hello,
>>>
>>> version = spark 2.4.3
>>>
>>> I have 3 different sources of JSON log data which have the same schema
>>> (same column order) in the raw data. I want to add one new column,
>>> "src_category", to each of the 3 sources to distinguish the source
>>> category, and merge all 3 sources into a single dataframe to read the
>>> JSON data for processing. What is the best way to handle this case?
>>>
>>> df = spark.read.json(merged_3sourcesraw_data)
>>>
>>> Input:
>>>
>>> s3a://my-bucket/ingestion/source1/y=2019/m=12/d=12/logs1.json
>>> s3a://my-bucket/ingestion/source2/y=2019/m=12/d=12/logs1.json
>>> s3a://my-bucket/ingestion/source3/y=2019/m=12/d=12/logs1.json
>>>
>>> Output:
>>>
>>> s3a://my-bucket/ingestion/processed/y=2019/m=12/d=12/src_category=other
>>> s3a://my-bucket/ingestion/processed/y=2019/m=12/d=12/src_category=windows-new
>>> s3a://my-bucket/ingestion/processed/y=2019/m=12/d=12/src_category=windows
>>>
>>> Thanks
>>>
>>> --
>>> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org