Can't connect to remote spark standalone cluster: getting WARN TaskSchedulerImpl: Initial job has not accepted any resources

2016-08-16 Thread Andrew Vykhodtsev
Dear all, I am trying to connect a remote Windows machine to a standalone Spark cluster (a single VM running Ubuntu Server with 8 cores and 64GB RAM). Both client and server have Spark 2.0 prebuilt for Hadoop 2.6, and Hadoop 2.7. I have the following settings on the cluster: export
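One frequent cause of this warning with a remote client is that executors on the cluster cannot connect back to the driver on the client machine, or that the job asks for more memory or cores than the worker offers. This is an assumption, since the post is truncated. The Python sketch below only assembles a `spark-submit` command line. `spark.driver.host`, `spark.executor.memory` and `spark.cores.max` are real Spark settings, but the host name, IP address, values and `my_app.py` are hypothetical placeholders.

```python
import shlex

# Hypothetical master URL; 7077 is the default port of a standalone master.
MASTER_URL = "spark://spark-vm.example:7077"

settings = {
    # Hypothetical address of the Windows client; executors must reach it.
    "spark.driver.host": "192.0.2.10",
    # Keep requests below what the worker advertises (8 cores, 64GB here).
    "spark.executor.memory": "4g",
    "spark.cores.max": "4",
}


def spark_submit_args(master, conf, app):
    """Build a spark-submit argument list with one --conf per setting."""
    args = ["spark-submit", "--master", master]
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


args = spark_submit_args(MASTER_URL, settings, "my_app.py")
print(shlex.join(args))
```

Setting `spark.driver.host` to an address the VM can route to, and keeping resource requests within the worker's capacity, are reasonable first checks; a firewall on the client side can also block the connections executors make back to the driver.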

pyspark.GroupedData.agg works incorrectly when one column is aggregated twice?

2016-05-27 Thread Andrew Vykhodtsev
Dear list, I am trying to calculate sum and count on the same column: user_id_books_clicks = (sqlContext.read.parquet('hdfs:///projects/kaggle-expedia/input/train.parquet') .groupby('user_id') .agg({'is_booking':'count',
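One likely explanation, assuming the truncated dict goes on to repeat the `'is_booking'` key with a second function such as `'sum'`: a Python dict literal keeps only the last value for a duplicated key, so `agg` never receives the first aggregation. The plain-Python sketch below shows the collapse. The column name comes from the post; the `'sum'` entry is an assumption.

```python
# Assumption: the truncated dict in the post repeats the 'is_booking' key.
# A dict literal keeps only the LAST value for a duplicated key, so the
# 'count' entry is silently overwritten before agg() ever receives it.
agg_spec = {'is_booking': 'count', 'is_booking': 'sum'}

print(agg_spec)
print(len(agg_spec))

# A list of (column, function) pairs, by contrast, keeps both requests.
pairs = [('is_booking', 'count'), ('is_booking', 'sum')]
print(len(pairs))
```

To get both aggregates in PySpark, pass column expressions instead of a dict, e.g. `from pyspark.sql import functions as F` and then `.agg(F.count('is_booking').alias('n_clicks'), F.sum('is_booking').alias('n_bookings'))`; the alias names here are hypothetical.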

Adding meetup groups to Community page - Moscow, Slovenia, Zagreb

2015-07-17 Thread Andrew Vykhodtsev
Dear all, the page https://spark.apache.org/community.html says: "If you'd like your meetup added, email user@spark.apache.org." So here I am emailing: could someone please add three new groups to the page? Moscow: http://www.meetup.com/Apache-Spark-in-Moscow/ Slovenija (Ljubljana)

Please add two groups to Community page

2015-07-16 Thread Andrew Vykhodtsev
Moscow: http://www.meetup.com/Apache-Spark-in-Moscow/ Slovenija (Ljubljana): http://www.meetup.com/Apache-Spark-Ljubljana-Meetup/