Hi,
I want to join two files from HDFS using the Spark shell. Both files are tab-separated, and I want to join them on the second column. I tried the code below, but it does not give any output:
val ny_daily = sc.parallelize(List("hdfs://localhost:8020/user/user/NYstock/NYSE_daily"))
val ny_daily_split = ny_daily.map(line => line.split('\t'))
val enKeyValuePair = ny_daily_split.map(line => (line(0).substring(0, 5), line(3).toInt))

val ny_dividend = sc.parallelize(List("hdfs://localhost:8020/user/user/NYstock/NYSE_dividends"))
val ny_dividend_split = ny_dividend.map(line => line.split('\t'))
val enKeyValuePair1 = ny_dividend_split.map(line => (line(0).substring(0, 4), line(3).toInt))

enKeyValuePair1.join(enKeyValuePair)
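For reference, a minimal sketch of how the same pipeline might look with the likely fixes applied. This is an assumption-laden sketch, not a confirmed answer: it assumes the files should be read with sc.textFile (sc.parallelize(List(path)) builds an RDD whose single element is the path string itself, not the file contents), that the join key is the second tab-separated column, cols(1), rather than a substring of the first, and that the value of interest is the fourth column; toDouble is used instead of toInt in case the values are fractional.

```scala
// Sketch, in the Spark shell (sc is the provided SparkContext).
// Assumptions: read files with sc.textFile, key on the second
// tab-separated column cols(1), take the fourth column cols(3)
// as the value (toDouble in case it is fractional).
val daily = sc.textFile("hdfs://localhost:8020/user/user/NYstock/NYSE_daily")
  .map(_.split('\t'))
  .map(cols => (cols(1), cols(3).toDouble))      // key on the second column

val dividends = sc.textFile("hdfs://localhost:8020/user/user/NYstock/NYSE_dividends")
  .map(_.split('\t'))
  .map(cols => (cols(1), cols(3).toDouble))      // same key on both sides

// join is lazy: no work happens until an action such as collect(),
// take(), or saveAsTextFile() is called.
val joined = dividends.join(daily)               // RDD[(String, (Double, Double))]
joined.take(10).foreach(println)
```

Note also that in the snippet above the two keys are built with different substring lengths (0, 5 versus 0, 4), so even after loading the files correctly those keys could never match.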
I have not been able to find any information on how to join files on a particular column.
Please suggest.
Regards,
Chhaya Vishwakarma