Re: Accessing Hive Tables in Spark

2018-04-12 Thread Liang-Chi Hsieh
Seems like Spark can't access hive-site.xml under cluster mode. One solution is to add the config `spark.yarn.dist.files=/path/to/hive-site.xml` to your spark-defaults.conf. And don't forget to call `enableHiveSupport()` on `SparkSession`. Tushar Singhal wrote > Hi Everyone, > > I was accessing
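A minimal sketch of the two steps suggested in the reply, assuming a PySpark job submitted to YARN in cluster mode. The helper names `submit_args` and `hive_session` and the application file `app.py` are hypothetical; `/path/to/hive-site.xml` is the placeholder path from the message, passed here with `--conf` instead of being written to spark-defaults.conf.

```python
def submit_args(hive_site_xml, app_file):
    """Build a spark-submit command that ships hive-site.xml with the job."""
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        # Same setting the reply suggests adding to spark-defaults.conf.
        "--conf", "spark.yarn.dist.files=" + hive_site_xml,
        app_file,
    ]


def hive_session(app_name):
    """Create a SparkSession with Hive support enabled."""
    # Imported lazily so the helper above works without pyspark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        .enableHiveSupport()  # without this, Hive tables are not visible
        .getOrCreate()
    )


if __name__ == "__main__":
    # Hypothetical application file name, for illustration only.
    print(" ".join(submit_args("/path/to/hive-site.xml", "app.py")))
```

Inside the application, `hive_session("my-app")` would then be called once at startup, before any Hive table is queried.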

Live Stream Code Reviews :)

2018-04-12 Thread Holden Karau
Hi Y'all, If you're interested in learning more about how the development process in Apache Spark works, I've been doing a weekly live-streamed code review most Fridays at 11am. This week's will be on twitch/youtube ( https://www.twitch.tv/holdenkarau / https://www.youtube.com/watch?v=vGVSa9KnD80 ). I

Re: Live Stream Code Reviews :)

2018-04-12 Thread Holden Karau
Ah yes, really good point: 11am Pacific :) On Thu, Apr 12, 2018 at 1:01 PM, Marco Mistroni wrote: > PST I believe, like last time > Works out 9 pm BST & 10 pm CET if I'm correct > > On Thu, Apr 12, 2018, 8:47 PM Matteo Olivi wrote: > >> Hi, >> 11 am in which timezone? >> >> On Thu, Apr 12, 2018,

Sorting on a streaming dataframe

2018-04-12 Thread Hemant Bhanawat
Hi Guys, Why is sorting on streaming dataframes not supported (unless it is in complete mode)? My downstream needs me to sort the streaming dataframe. Hemant

Re: Sorting on a streaming dataframe

2018-04-12 Thread Reynold Xin
Can you describe your use case more? On Thu, Apr 12, 2018 at 11:12 PM Hemant Bhanawat wrote: > Hi Guys, > > Why is sorting on streaming dataframes not supported (unless it is in complete > mode)? My downstream needs me to sort the streaming dataframe. > > Hemant >

Re: Sorting on a streaming dataframe

2018-04-12 Thread Hemant Bhanawat
Well, we want to assign snapshot ids (incrementing counters) to the incoming records. For that, we are zipping the streaming RDDs with that counter using a modified version of ZippedWithIndexRDD. We are ok if the records in the streaming dataframe get counters in random order but the counter shoul
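A rough illustration of the idea of incrementing snapshot ids that carry over from one batch to the next. This is plain Python, not the poster's modified ZippedWithIndexRDD, and the class name `SnapshotCounter` is hypothetical. Because the message is cut off, the exact requirement on the counter is unknown; the sketch assumes only that ids never repeat and keep increasing across batches.

```python
class SnapshotCounter:
    """Hands out consecutive ids that keep increasing across batches."""

    def __init__(self, start=0):
        self.next_id = start

    def assign(self, batch):
        """Pair each record of one micro-batch with the next free id."""
        records = list(batch)
        first = self.next_id
        # Reserve a contiguous id range for this batch before moving on.
        self.next_id += len(records)
        return [(first + i, rec) for i, rec in enumerate(records)]


counter = SnapshotCounter()
print(counter.assign(["a", "b", "c"]))  # [(0, 'a'), (1, 'b'), (2, 'c')]
print(counter.assign(["d", "e"]))       # [(3, 'd'), (4, 'e')]
```

On an actual RDD, the per-batch part would roughly correspond to `zipWithIndex()` plus an offset carried over from the previous batch; the order in which records within a batch receive ids is not significant here.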