Hi Tariq,

Could you briefly describe the operation you need to perform on each row? I can try to help with that. In general, if you need grouping or aggregation across records, you can use window operations instead of collecting the data.
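
If you do end up collecting, one way to avoid depending on column position is to read each field by name. The snippet below is only a sketch under assumptions: it uses a `SparkSession` (Spark 2.x or later; on 1.6 a `SQLContext` plays the same role), and the columns `id` and `name` and the sample data are hypothetical stand-ins for your Avro-derived DataFrame.

```scala
import org.apache.spark.sql.{Row, SparkSession}

object RowByName {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("row-by-name")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the DataFrame built from the Kafka/Avro records.
    val df = Seq((1L, "alice"), (2L, "bob")).toDF("id", "name")

    // Selecting the columns explicitly pins their order before collecting.
    val rows: Array[Row] = df.select("id", "name").collect()

    rows.foreach { row =>
      // Look fields up by name, so the code does not care about position.
      val id = row.getAs[Long]("id")
      val name = row.getAs[String]("name")
      // fieldIndex gives the position of a named column if an index is needed.
      val nameIdx = row.fieldIndex("name")
      println(s"id=$id name=$name (name is at index $nameIdx)")
    }

    spark.stop()
  }
}
```

Rows returned by `collect()` on a DataFrame carry their schema, so `getAs` by name and `fieldIndex` should work on them; printing `df.schema` before collecting is a quick way to check the actual column order.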

On Wed, Mar 2, 2016 at 6:40 PM, Mohammad Tariq <donta...@gmail.com> wrote:

> Hi Sainath,
>
> Thank you for the prompt response!
>
> Could you please elaborate your answer a bit? I'm sorry I didn't quite get
> this. What kind of operation I can perform using SQLContext? It just helps
> us during things like DF creation, schema application etc, IMHO.
>
> Tariq, Mohammad
> about.me/mti <http://about.me/mti>
>
> On Thu, Mar 3, 2016 at 4:59 AM, Sainath Palla <pallasain...@gmail.com>
> wrote:
>
>> Instead of collecting the data frame, you can try using a sqlContext on
>> the data frame. But it depends on what kind of operations are you trying to
>> perform.
>>
>> On Wed, Mar 2, 2016 at 6:21 PM, Mohammad Tariq <donta...@gmail.com>
>> wrote:
>>
>>> Hi list,
>>>
>>> *Scenario :*
>>> I am creating a DStream by reading an Avro object from a Kafka topic and
>>> then converting it into a DataFrame to perform some operations on the data.
>>> I call DataFrame.collect() and perform the intended operation on each Row
>>> of Array[Row] returned by DataFrame.collect().
>>>
>>> *Problem : *
>>> Calling DataFrame.collect() changes the schema of the underlying record,
>>> thus making it impossible to get the columns by index(as the order gets
>>> changed).
>>>
>>> *Query :*
>>> Is it the way DataFrame.collect() behaves or am I doing something wrong
>>> here? In former case is there any way I can maintain the schema while
>>> getting each Row?
>>>
>>> Any pointers/suggestions would be really helpful. Many thanks!
>>>
>>> Tariq, Mohammad
>>> about.me/mti <http://about.me/mti>