Re: Pyspark SQL Join Failure

2015-12-20 Thread Chris Fregly
How does Spark SQL/DataFrame know that train_users_2.csv has a field named "id" or anything else domain-specific? Is there a header? If so, does sc.textFile() know about this header? I'd suggest using the Databricks spark-csv package for reading CSV data. There is an option in there to
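
A minimal sketch of the header question raised above, not code from the thread. Reading a file as plain text lines yields strings only, so any column name such as "id" has to come from parsing the first line explicitly. The helper names `split_header` and `load_with_spark_csv` and the sample rows are hypothetical. The second function uses the spark-csv reader's `header` and `inferSchema` options and assumes a `sqlContext` from a session started with the spark-csv package available.

```python
import csv


def split_header(lines):
    """Parse raw CSV lines by hand, treating the first line as the header.

    This is roughly the work left to the caller when a file is read
    as plain text: the lines themselves carry no column names.
    """
    rows = list(csv.reader(lines))
    header, data = rows[0], rows[1:]
    return header, [dict(zip(header, row)) for row in data]


def load_with_spark_csv(sqlContext, path):
    """Read a CSV file via spark-csv, taking column names from the first line.

    Assumption: the job was launched with the spark-csv package on the
    classpath (for example through the --packages flag).
    """
    return (sqlContext.read
            .format("com.databricks.spark.csv")
            .options(header="true", inferSchema="true")
            .load(path))


# Hypothetical sample rows standing in for the start of a CSV file.
sample = ["id,age", "u1,34", "u2,27"]
header, records = split_header(sample)
print(header)            # column names recovered from the first line
print(records[0]["id"])  # fields are addressable by name only after parsing
```

With a running Spark session, `load_with_spark_csv(sqlContext, "train_users_2.csv")` would return a DataFrame whose columns are named after the file's header line, provided the file actually has one.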

Pyspark SQL Join Failure

2015-12-19 Thread Weiwei Zhang
Hi all, I got this error when I tried to use the 'join' function to left outer join two data frames in pyspark 1.4.1. Please kindly point out the places where I made mistakes. Thank you.

Traceback (most recent call last):
  File "/Users/wz/PycharmProjects/PysparkTraining/Airbnb/src/driver.py",