Dear all,
I am trying to connect a remote Windows machine to a standalone Spark
cluster (a single VM running Ubuntu Server with 8 cores and 64 GB RAM).
Both client and server run the Spark 2.0 distribution prebuilt for Hadoop 2.6,
alongside Hadoop 2.7.
I have the following settings on the cluster:
export
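The list of settings is cut off after `export`, so the poster's actual configuration is unknown. For reference, a standalone master that needs to accept a remote client is usually configured in `conf/spark-env.sh` with exports along these lines; all values below are illustrative assumptions, not the poster's settings:

```shell
# conf/spark-env.sh on the standalone master.
# Illustrative values only -- the original message is truncated.
export SPARK_MASTER_HOST=0.0.0.0     # bind address reachable by the remote Windows client
export SPARK_MASTER_PORT=7077        # default standalone master port
export SPARK_WORKER_CORES=8          # match the VM's 8 cores
export SPARK_WORKER_MEMORY=56g       # leave headroom out of 64 GB for the OS
```

For a remote client to connect, the master's host/port must match the `spark://host:7077` URL used on the Windows side, and the firewall must allow the master and driver ports.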
Dear list,
I am trying to calculate both the sum and the count of the same column:
user_id_books_clicks =
(sqlContext.read.parquet('hdfs:///projects/kaggle-expedia/input/train.parquet')
.groupby('user_id')
.agg({'is_booking':'count',
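The snippet is cut off inside the `agg` dict, but note that the dict form of `agg` cannot express two aggregations on the same column: a Python dict keeps only one value per key, so a second `'is_booking'` entry silently overwrites the first. A quick plain-Python illustration:

```python
# A dict literal with a repeated key keeps only the last value, so
# {'is_booking': 'count', 'is_booking': 'sum'} can never request both
# aggregations -- the 'count' entry is silently discarded.
aggs = {'is_booking': 'count', 'is_booking': 'sum'}
print(aggs)  # only one entry survives: {'is_booking': 'sum'}
```

With PySpark, the usual workaround is to pass column expressions instead of a dict, e.g. `.agg(F.count('is_booking'), F.sum('is_booking'))` with `from pyspark.sql import functions as F`; each aggregate then appears as its own output column.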
Dear all,
The page
https://spark.apache.org/community.html
says: "If you'd like your meetup added, email user@spark.apache.org."
So here I am emailing: could someone please add three new groups to the page?
Moscow : http://www.meetup.com/Apache-Spark-in-Moscow/
Slovenia (Ljubljana) : http://www.meetup.com/Apache-Spark-Ljubljana-Meetup/