Ayan, Thanks for the help. In my scenario, I currently have the business
rule, i.e. the animal types, in a file (later in a Hive table). I want to go
after only those elements from the list. Once I identify the distinct
counts, I have to write two different functionalities: if
count(distinct(element)) <= 10
and
Hi
You can "generate" the SQL through a program. Python example:
>>> schema
['id', 'Mammals', 'Birds', 'Fish', 'Reptiles', 'Amphibians']
>>>
>>> count_stmt = ["count(distinct $F) as $F".replace("$F", x) for x
in schema]
>>> count_stmt
['count(distinct id) as id', 'count(distinct Mammals) as Mammals',
'count(distinct Birds) as Birds', 'count(distinct Fish) as Fish',
'count(distinct Reptiles) as Reptiles', 'count(distinct Amphibians) as
Amphibians']
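Building on that idea, the generated fragments can be joined into one full SELECT, so Spark runs a single job instead of one per column. A minimal sketch, assuming a hypothetical table name df2_table:

```python
# Sketch: assemble one SQL statement from per-column count(distinct) fragments.
# The table name "df2_table" is an assumption for illustration.
schema = ['id', 'Mammals', 'Birds', 'Fish', 'Reptiles', 'Amphibians']

# One "count(distinct <col>) as <col>" fragment per column.
count_stmt = ["count(distinct $F) as $F".replace("$F", x) for x in schema]

# Join the fragments into a single SELECT.
sql = "select " + ", ".join(count_stmt) + " from df2_table"
print(sql)
```

Running this once against the registered table returns all the distinct counts in one row, one Spark action.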
+ user@spark.apache.org
Hi Daniel, I will try this one out and let you know. Thank you.
On Wed, Oct 5, 2016 at 9:50 AM, Daniel Siegmann <
dsiegm...@securityscorecard.io> wrote:
> I think it's fine to read animal types locally because there are only 70
> of them. It's just that you want to execute the Spark actions in parallel.
Hi Ayan,
My Schema for DF2 is fixed but it has around 420 columns (70 Animal type
columns and 350 other columns).
Thanks,
Ajay
On Wed, Oct 5, 2016 at 10:37 AM, ayan guha wrote:
> Is your schema for df2 fixed? i.e. do you have 70 category columns?
>
> On Thu, Oct 6, 2016 at 12:50 AM, Daniel Si
Is your schema for df2 fixed? i.e. do you have 70 category columns?
On Thu, Oct 6, 2016 at 12:50 AM, Daniel Siegmann <
dsiegm...@securityscorecard.io> wrote:
> I think it's fine to read animal types locally because there are only 70
> of them. It's just that you want to execute the Spark actions
I think it's fine to read animal types locally because there are only 70 of
them. It's just that you want to execute the Spark actions in parallel. The
easiest way to do that is to have only a single action.
Instead of grabbing the result right away, I would just add a column for
the animal type a
First of all, if you want to read a text file in Spark, you should use
sc.textFile. Since you are using "Source.fromFile", you are reading it with
the standard Scala API, so it will be read sequentially on the driver.
Furthermore, you are going to need to create a schema if you want to use
DataFrames.
On 5/10/2016
Right now, I am doing it like below,
import scala.io.Source
val animalsFile = "/home/ajay/dataset/animal_types.txt"
val animalTypes = Source.fromFile(animalsFile).getLines.toArray
for ( anmtyp <- animalTypes ) {
val distinctAnmTypCount = sqlContext.sql("select
count(distinct("+anmtyp+")) f
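Once the per-type distinct counts are collected, the two-way rule mentioned at the top of the thread (count(distinct) <= 10 versus larger) could be dispatched roughly as below. This is only a sketch: the handler names and the result format are hypothetical placeholders, not anything from the thread.

```python
# Hypothetical handlers for the two functionalities mentioned in the thread.
def handle_low_cardinality(col, n):
    # Placeholder for the "count(distinct) <= 10" functionality.
    return "low:{0}:{1}".format(col, n)

def handle_high_cardinality(col, n):
    # Placeholder for the other functionality.
    return "high:{0}:{1}".format(col, n)

def dispatch(distinct_counts, threshold=10):
    """distinct_counts: {column_name: distinct_count}, e.g. collected
    from the count(distinct ...) queries above."""
    results = {}
    for col, n in distinct_counts.items():
        if n <= threshold:
            results[col] = handle_low_cardinality(col, n)
        else:
            results[col] = handle_high_cardinality(col, n)
    return results

print(dispatch({"Mammals": 7, "Birds": 120}))
```

The point is just to keep the threshold logic in one place, driven by the counts Spark returns, rather than duplicating it per animal type.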
Hi Everyone,
I have a use-case where I have two Dataframes like below,
1) First Dataframe(DF1) contains,
*ANIMALS*
Mammals
Birds
Fish
Reptiles
Amphibians
2) Second Dataframe(DF2) contains,
*ID, Mammals, Birds, Fish, Reptiles, Amphibians*
1, Dogs, Eagle, Goldfish,