Re: Finding unique across all columns in dataset

2016-09-19 Thread Mich Talebzadeh
Something like this:

// Outside spark-shell: import org.apache.spark.sql.functions.not and spark.implicits._
df.filter('transactiontype > " ")
  .filter(not('transactiontype === "DEB") && not('transactiontype === "BGC"))
  .select('transactiontype)
  .distinct
  .collect.foreach(println)

HTH

Dr Mich Talebzadeh

On 19 September 2016 at 14:12, ayan guha  wrote:

> Hi
>
> If you want column-wise distinct, you may need to define it. Would it be
> possible to demonstrate your problem with an example, i.e. what the input
> and output look like? Maybe with a few columns.
> On 19 Sep 2016 20:36, "Abhishek Anand"  wrote:
>
>> Hi Ayan,
>>
>> How will I get column-wise distinct items using this approach?
>>
>> On Mon, Sep 19, 2016 at 3:31 PM, ayan guha  wrote:
>>
>>> Create an array out of the columns, convert to a DataFrame,
>>> explode, distinct, write.
>>> On 19 Sep 2016 19:11, "Saurav Sinha"  wrote:
>>>
>>>> You can use distinct over your DataFrame or RDD:
>>>>
>>>> rdd.distinct
>>>>
>>>> It will give you the distinct rows, with each row compared as a whole.
>>>>
>>>> On Mon, Sep 19, 2016 at 2:35 PM, Abhishek Anand <
>>>> abhis.anan...@gmail.com> wrote:
>>>>
>>>>> I have an RDD which contains 14 different columns. I need to find the
>>>>> distinct values across all the columns of the RDD and write them to HDFS.
>>>>>
>>>>> How can I achieve this?
>>>>>
>>>>> Is there any distributed data structure that I can use and keep
>>>>> updating as I traverse the new rows?
>>>>>
>>>>> Regards,
>>>>> Abhi
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Thanks and Regards,
>>>>
>>>> Saurav Sinha
>>>>
>>>> Contact: 9742879062
>>>>
>>>
>>


Re: Finding unique across all columns in dataset

2016-09-19 Thread ayan guha
Hi

If you want column-wise distinct, you may need to define it. Would it be
possible to demonstrate your problem with an example, i.e. what the input
and output look like? Maybe with a few columns.
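
One possible definition of column-wise distinct is the set of distinct values within each column taken separately. A rough sketch under that assumption, using Spark 2.x with a local SparkSession; the two-column sample data and the column names c1 and c2 are placeholders, not taken from the original question:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, collect_set}

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("column-wise-distinct")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample standing in for the real 14-column dataset.
val df = Seq(
  ("a", "x"),
  ("b", "x"),
  ("a", "y")
).toDF("c1", "c2")

// One aggregation over the whole DataFrame: collect_set gathers the
// distinct non-null values of each column into an array.
val perColumn = df.select(df.columns.map(c => collect_set(col(c)).as(c)): _*)

// The result is a single row holding one array per input column.
perColumn.show(false)
```

Because every set ends up in a single row, this only suits columns whose number of distinct values is small enough to fit in memory.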
On 19 Sep 2016 20:36, "Abhishek Anand"  wrote:

> Hi Ayan,
>
> How will I get column-wise distinct items using this approach?
>
> On Mon, Sep 19, 2016 at 3:31 PM, ayan guha  wrote:
>
>> Create an array out of the columns, convert to a DataFrame,
>> explode, distinct, write.
>> On 19 Sep 2016 19:11, "Saurav Sinha"  wrote:
>>
>>> You can use distinct over your DataFrame or RDD:
>>>
>>> rdd.distinct
>>>
>>> It will give you the distinct rows, with each row compared as a whole.
>>>
>>> On Mon, Sep 19, 2016 at 2:35 PM, Abhishek Anand wrote:
>>>
>>>> I have an RDD which contains 14 different columns. I need to find the
>>>> distinct values across all the columns of the RDD and write them to HDFS.
>>>>
>>>> How can I achieve this?
>>>>
>>>> Is there any distributed data structure that I can use and keep
>>>> updating as I traverse the new rows?
>>>>
>>>> Regards,
>>>> Abhi
>>>>
>>>
>>>
>>>
>>> --
>>> Thanks and Regards,
>>>
>>> Saurav Sinha
>>>
>>> Contact: 9742879062
>>>
>>
>


Re: Finding unique across all columns in dataset

2016-09-19 Thread Abhishek Anand
Hi Ayan,

How will I get column-wise distinct items using this approach?

On Mon, Sep 19, 2016 at 3:31 PM, ayan guha  wrote:

> Create an array out of the columns, convert to a DataFrame,
> explode, distinct, write.
> On 19 Sep 2016 19:11, "Saurav Sinha"  wrote:
>
>> You can use distinct over your DataFrame or RDD:
>>
>> rdd.distinct
>>
>> It will give you the distinct rows, with each row compared as a whole.
>>
>> On Mon, Sep 19, 2016 at 2:35 PM, Abhishek Anand 
>> wrote:
>>
>>> I have an RDD which contains 14 different columns. I need to find the
>>> distinct values across all the columns of the RDD and write them to HDFS.
>>>
>>> How can I achieve this?
>>>
>>> Is there any distributed data structure that I can use and keep
>>> updating as I traverse the new rows?
>>>
>>> Regards,
>>> Abhi
>>>
>>
>>
>>
>> --
>> Thanks and Regards,
>>
>> Saurav Sinha
>>
>> Contact: 9742879062
>>
>


Re: Finding unique across all columns in dataset

2016-09-19 Thread ayan guha
Create an array out of the columns, convert to a DataFrame,
explode, distinct, write.
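
A minimal sketch of the steps above, assuming Spark 2.x with a local SparkSession. The three-column sample data, the column names c1 to c3, and the output path are placeholders rather than anything from the original question. Columns are cast to string so that they can share a single array type.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, col, explode}

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("distinct-across-columns")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample standing in for the real 14-column dataset.
val df = Seq(
  ("a", "b", "c"),
  ("b", "c", "d"),
  ("a", "a", "e")
).toDF("c1", "c2", "c3")

// 1. Pack every column into one array column (cast so all elements share a type).
// 2. Explode the array so that each value becomes its own row.
// 3. Drop duplicates.
val distinctValues = df
  .select(explode(array(df.columns.map(c => col(c).cast("string")): _*)).as("value"))
  .distinct()

// Placeholder path; on a cluster this would be an hdfs:// URI.
val outputPath = "/tmp/distinct_values"
distinctValues.write.mode("overwrite").text(outputPath)
```

This treats the values of all columns as one pool, which is only one reading of "distinct across all the columns".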
On 19 Sep 2016 19:11, "Saurav Sinha"  wrote:

> You can use distinct over your DataFrame or RDD:
>
> rdd.distinct
>
> It will give you the distinct rows, with each row compared as a whole.
>
> On Mon, Sep 19, 2016 at 2:35 PM, Abhishek Anand 
> wrote:
>
>> I have an RDD which contains 14 different columns. I need to find the
>> distinct values across all the columns of the RDD and write them to HDFS.
>>
>> How can I achieve this?
>>
>> Is there any distributed data structure that I can use and keep
>> updating as I traverse the new rows?
>>
>> Regards,
>> Abhi
>>
>
>
>
> --
> Thanks and Regards,
>
> Saurav Sinha
>
> Contact: 9742879062
>


Re: Finding unique across all columns in dataset

2016-09-19 Thread Saurav Sinha
You can use distinct over your DataFrame or RDD:

rdd.distinct

It will give you the distinct rows, with each row compared as a whole.
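
A small sketch of what row-level distinct does, assuming Spark 2.x with a local SparkSession. The three-field tuples are made-up sample rows, not data from the original question:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("distinct-rows")
  .getOrCreate()

// Hypothetical sample: each tuple is one row of the RDD.
val rdd = spark.sparkContext.parallelize(Seq(
  ("a", "b", "c"),
  ("a", "b", "c"), // exact duplicate of the first row
  ("a", "b", "d")  // differs only in the last field, so it is kept
))

// distinct() compares whole rows; values repeated across
// different rows or columns are not collapsed.
val distinctRows = rdd.distinct()
distinctRows.collect().foreach(println)
```

Only the exact duplicate row is dropped here, so this does not by itself produce distinct values per column.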

On Mon, Sep 19, 2016 at 2:35 PM, Abhishek Anand 
wrote:

> I have an RDD which contains 14 different columns. I need to find the
> distinct values across all the columns of the RDD and write them to HDFS.
>
> How can I achieve this?
>
> Is there any distributed data structure that I can use and keep
> updating as I traverse the new rows?
>
> Regards,
> Abhi
>



-- 
Thanks and Regards,

Saurav Sinha

Contact: 9742879062


Finding unique across all columns in dataset

2016-09-19 Thread Abhishek Anand
I have an RDD which contains 14 different columns. I need to find the
distinct values across all the columns of the RDD and write them to HDFS.

How can I achieve this?

Is there any distributed data structure that I can use and keep
updating as I traverse the new rows?

Regards,
Abhi