Re: Filter one dataset based on values from another

2018-05-01 Thread lsn24
I don't think an inner join will solve my problem.

*For each row in* paramsDataset, I need to filter myDataset, and then I need
to run a set of calculations on the filtered myDataset.

Say, for example, paramsDataset has three employee age ranges, e.g.
20-30, 30-50, 50-60, and two regions, USA and Canada.

myDataset has all employees' information for three years, such as the days a
person came to work, took a day off, etc.

I need to calculate the average number of days employees worked per age range
for different regions, the average days off per age range, and so on.
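
A minimal sketch of that per-row plan, assuming paramsDataset is small enough
to collect to the driver; the column names (minAge, maxAge, region in
paramsDataset; age, region, daysWorked, daysOff in myDataset) are hypothetical:

import static org.apache.spark.sql.functions.avg;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Pull the (small) parameter rows to the driver, then filter once per row.
for (Row p : paramsDataset.collectAsList()) {
    Dataset<Row> subset = myDataset.filter(
        myDataset.col("age").between(p.getAs("minAge"), p.getAs("maxAge"))
            .and(myDataset.col("region").equalTo(p.getAs("region"))));
    // Whole-dataset aggregation: one row of averages per parameter row.
    subset.agg(avg("daysWorked").alias("avgDaysWorked"),
               avg("daysOff").alias("avgDaysOff")).show();
}

This launches one Spark job per parameter row, which is fine for a handful of
age ranges and regions but does not scale to a large parameter dataset.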






Re: Filter one dataset based on values from another

2018-05-01 Thread Lalwani, Jayesh
What columns do you want to filter myDataset on? What are the corresponding
columns in paramsDataset?

You can easily do what you want using an inner join. For example, if tempview
and paramsview both have a column, say employeeId, you can do this with the SQL:

sparkSession.sql("SELECT * FROM tempview INNER JOIN paramsview ON
    tempview.employeeId = paramsview.employeeId");
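
For context, here is a runnable sketch around that query. The two source
Datasets and their registration as temp views are assumptions; also note
that the join condition does not have to be an equality, so a range filter
(e.g. an age range) can be expressed with BETWEEN:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Register the two (assumed) Datasets as the views used above.
myDataset.createOrReplaceTempView("tempview");
paramsDataset.createOrReplaceTempView("paramsview");

// Equi-join on a shared key, exactly as in the query above.
Dataset<Row> joined = sparkSession.sql(
    "SELECT * FROM tempview t INNER JOIN paramsview p " +
    "ON t.employeeId = p.employeeId");

// The same idea with a range condition; minAge/maxAge are hypothetical columns.
Dataset<Row> rangeJoined = sparkSession.sql(
    "SELECT * FROM tempview t INNER JOIN paramsview p " +
    "ON t.age BETWEEN p.minAge AND p.maxAge");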




Filter one dataset based on values from another

2018-04-30 Thread lsn24
Hi,
  I have one dataset with parameters and another with data that needs to be
filtered based on the first (parameter) dataset.

*The scenario is as follows:*

For each row in the parameter dataset, I need to apply that row as a filter
to the second dataset. I will end up with multiple datasets, and for each of
those datasets I need to run a set of calculations.

How can I achieve this in Spark?

*Pseudocode for better understanding:*

Dataset<Row> paramsDataset = sparkSession.sql("select * from paramsview");

Dataset<Row> myDataset = sparkSession.sql("select * from tempview");
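
To make the intended loop concrete, a sketch of the logic (the filter column
"region" is a hypothetical stand-in for whatever a parameter row carries):

import java.util.ArrayList;
import java.util.List;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Desired logic: one filtered view of myDataset per parameter row.
List<Dataset<Row>> filteredSets = new ArrayList<>();
for (Row p : paramsDataset.collectAsList()) {
    filteredSets.add(
        myDataset.filter(myDataset.col("region").equalTo(p.getAs("region"))));
}
// Each element of filteredSets would then feed its own calculations.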


Question: For each row in paramsDataset, I need to filter myDataset and run
some calculations on the result. Is it possible to do that? If not, what's
the best way to solve it?

Thanks



