Hi,
I want to join two files from HDFS using the Spark shell. Both files are tab-separated, and I want to join them on the second column. I tried the code below, but it does not give any output:
val ny_daily = sc.parallelize(List("hdfs://localhost:8020/user/user/NYstock/NYSE_daily"))
val ny_daily_split = ny_daily.map(line => line.split('\t'))
val enKeyValuePair = ny_daily_split.map(line => (line(0).substring(0, 5), line(3).toInt))

val ny_dividend = sc.parallelize(List("hdfs://localhost:8020/user/user/NYstock/NYSE_dividends"))
val ny_dividend_split = ny_dividend.map(line => line.split('\t'))
val enKeyValuePair1 = ny_dividend_split.map(line => (line(0).substring(0, 4), line(3).toInt))

enKeyValuePair1.join(enKeyValuePair)
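For reference, a minimal sketch of how the same pipeline might look with the likely fixes applied. This is an assumption-laden sketch, not a confirmed answer: it assumes the files should be read with sc.textFile (sc.parallelize(List(path)) builds an RDD whose single element is the path string itself, not the file contents), that the join key is the second tab-separated column, cols(1), rather than a substring of the first, and that the value of interest is the fourth column; toDouble is used instead of toInt in case the values are fractional.

```scala
// Sketch, in the Spark shell (sc is the provided SparkContext).
// Assumptions: read files with sc.textFile, key on the second
// tab-separated column cols(1), take the fourth column cols(3)
// as the value (toDouble in case it is fractional).
val daily = sc.textFile("hdfs://localhost:8020/user/user/NYstock/NYSE_daily")
  .map(_.split('\t'))
  .map(cols => (cols(1), cols(3).toDouble))      // key on the second column

val dividends = sc.textFile("hdfs://localhost:8020/user/user/NYstock/NYSE_dividends")
  .map(_.split('\t'))
  .map(cols => (cols(1), cols(3).toDouble))      // same key on both sides

// join is lazy: no work happens until an action such as collect(),
// take(), or saveAsTextFile() is called.
val joined = dividends.join(daily)               // RDD[(String, (Double, Double))]
joined.take(10).foreach(println)
```

Note also that in the snippet above the two keys are built with different substring lengths (0, 5 versus 0, 4), so even after loading the files correctly those keys could never match.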
I have not been able to find any information on how to join files on a particular column.
Please suggest.
Regards,
Chhaya Vishwakarma