I think this could be the reason:
DataFrame sorts the columns of each record lexicographically if we do a
*select **. So, if we wish to maintain a specific column ordering while
processing, we should use *select col1, col2...* instead of *select **.
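A minimal sketch of the explicit-select idea, assuming the Spark 1.6 Scala API that was current when this thread was written. The `Event` case class, its fields, and the local master setting are hypothetical and only there to make the example self-contained. It does not confirm why the ordering changes; it only shows that naming the columns fixes the order of the result.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object ColumnOrderSketch {
  // Hypothetical record type, for illustration only.
  case class Event(ts: Long, user: String, amount: Double)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("column-order-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val df = sc.parallelize(Seq(Event(1L, "a", 9.5), Event(2L, "b", 3.0))).toDF()

    // Name the columns explicitly so that the output order does not depend
    // on how the schema was derived or on what "select *" expands to.
    val ordered = df.select("user", "ts", "amount")
    ordered.printSchema()

    sc.stop()
  }
}
```

The columns of `ordered` come back in the order passed to `select`, so downstream code that reads fields by position sees a stable layout.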
However, this is just what I feel. Let's wait f
Cool. Here is how it goes...
I am reading Avro objects from a Kafka topic as a DStream, converting it
into a DataFrame so that I can filter out records based on some conditions
and finally do some aggregations on these filtered records. During the
process I also need to tag each record based on
Hi Tariq,
Can you briefly describe what kind of operation you need to perform? I can
try to help you out with that.
In general, if you are trying to use any group operations, you can use
window operations.
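As a rough illustration of what a window operation looks like, here is a sketch that counts records per key over a sliding window. It assumes Spark Streaming 1.3 or later; the socket source, the comma-separated record format, and the 5/30/10 second intervals are placeholders, since the thread actually reads Avro objects from Kafka.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object WindowSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("window-sketch").setMaster("local[2]")
    // 5-second batches; window and slide durations must be multiples of this.
    val ssc = new StreamingContext(conf, Seconds(5))

    // Stand-in source (assumption): the thread uses a Kafka topic instead.
    val lines = ssc.socketTextStream("localhost", 9999)

    // Count records per key over the last 30 seconds, recomputed every 10 seconds.
    // The key is assumed to be the first comma-separated field of each line.
    val counts = lines
      .map(line => (line.split(",")(0), 1L))
      .reduceByKeyAndWindow((a: Long, b: Long) => a + b, Seconds(30), Seconds(10))

    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```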
On Wed, Mar 2, 2016 at 6:40 PM, Mohammad Tariq wrote:
> Hi Sainath,
>
> Thank you for the prompt r
Hi Sainath,
Thank you for the prompt response!
Could you please elaborate on your answer a bit? I'm sorry, I didn't quite
get this. What kind of operation can I perform using SQLContext? It just
helps us with things like DF creation, schema application, etc., IMHO.
Tariq, Mohamma
Instead of collecting the data frame, you can try using a sqlContext on the
data frame. But it depends on what kind of operations you are trying to
perform.
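One possible reading of this suggestion, sketched under the assumption of the Spark 1.6 Scala API: register the DataFrame as a temporary table and express the filter and aggregation in SQL, so the work stays distributed instead of being pulled to the driver with collect(). The `Record` case class, the table name `records`, and the query are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object SqlInsteadOfCollect {
  // Hypothetical record type, for illustration only.
  case class Record(category: String, value: Double)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("sql-instead-of-collect").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val df = sc.parallelize(
      Seq(Record("a", 1.0), Record("a", 4.0), Record("b", 2.5))).toDF()

    // Expose the DataFrame to SQL under a temporary name (Spark 1.x API).
    df.registerTempTable("records")

    // Filter and aggregate on the executors; only the small result is shown.
    val totals = sqlContext.sql(
      "SELECT category, SUM(value) AS total FROM records " +
      "WHERE value > 1.5 GROUP BY category")
    totals.show()

    sc.stop()
  }
}
```

Whether this fits depends on the per-row operation in question; logic that cannot be written as SQL or DataFrame expressions would need a different approach.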
On Wed, Mar 2, 2016 at 6:21 PM, Mohammad Tariq wrote:
> Hi list,
>
> *Scenario :*
> I am creating a DStream by reading an Avro object from
Hi list,
*Scenario :*
I am creating a DStream by reading an Avro object from a Kafka topic and
then converting it into a DataFrame to perform some operations on the data.
I call DataFrame.collect() and perform the intended operation on each Row
of Array[Row] returned by DataFrame.collect().
*Prob