Andrew,
you have pretty much consolidated my entire experience. Please give a
presentation at a meetup on this, and send across the links :)
Regards,
Gourav
On Wed, Jul 20, 2016 at 4:35 AM, Andrew Ehrlich wrote:
Thanks a lot for your kind help.
On Wednesday, July 20, 2016 11:35 AM, Andrew Ehrlich wrote:
Try:
- filtering down the data as soon as possible in the job, dropping columns you
don’t need (a sketch of the first three items follows this list)
- processing fewer partitions of the Hive tables at a time
- caching frequently accessed data, for example dimension tables, lookup
tables, or other datasets that are repeatedly accessed
- using the S
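
A minimal sketch of the first three suggestions, assuming Spark 1.x with a
HiveContext (as used elsewhere in this thread); all table names, column names,
and the partition value are hypothetical:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("tuning-sketch"))
val hiveContext = new HiveContext(sc)

// Filter early and keep only the needed columns; the filter on the
// (hypothetical) partition column also limits how many Hive partitions
// are processed at a time.
val facts = hiveContext.table("fact_table")
  .filter("event_date = '2016-07-20'")
  .select("user_id", "amount")

// Cache a small, frequently accessed dimension/lookup table before it is
// reused across joins.
val dims = hiveContext.table("dim_users")
  .select("user_id", "region")
  .cache()

val joined = facts.join(dims, "user_id")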
Thanks a lot for your reply.
In effect, here we tried to run the SQL on Kettle, Hive, and Spark Hive (via
HiveContext) respectively, and the job seems frozen and never finishes.
From the 6 tables, we need to read the different columns in the different
tables for specific information, then do so
Hi,
What about the network (bandwidth) between Hive and Spark?
Did it run in Hive before you moved it to Spark?
Because it's complex, you can use something like the EXPLAIN command to show
what is going on.
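
For example, from the Spark side (a sketch assuming Spark 1.x; the two-table
query is a hypothetical placeholder):

val df = hiveContext.sql(
  "SELECT a.id, b.val FROM t1 a JOIN t2 b ON a.id = b.id")
df.explain(true)  // prints the parsed, analyzed, optimized, and physical plans

The same idea works directly in Hive with EXPLAIN EXTENDED on the query.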
> On Jul 18, 2016, at 5:20 PM, Zhiliang Zhu wrote:
The SQL logic in the program is very complex, so I do not describe the
detailed code here.
On Monday, July 18, 2016 6:04 PM, Zhiliang Zhu wrote:
Hi All,
Here we have one application: it needs to extract different columns from 6
Hive tables and then do some easy calculation. There are around 100,000 rows
in each table, and finally it needs to output another table or file (with a
consistent column format).
However, after lots of d
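
For readers following the thread, a minimal sketch of the kind of job
described above, assuming Spark 1.x with a HiveContext; the table names,
columns, and the calculation are hypothetical placeholders (only 2 of the 6
tables are shown):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object ExtractAndCombine {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ExtractAndCombine"))
    val hiveContext = new HiveContext(sc)

    // Pull only the needed columns from each source table.
    val t1 = hiveContext.sql("SELECT id, col_a FROM table1")
    val t2 = hiveContext.sql("SELECT id, col_b FROM table2")

    // An easy calculation: join on a shared key and derive a new column.
    val result = t1.join(t2, "id").selectExpr("id", "col_a + col_b AS total")

    // Output another table with a consistent column layout.
    result.write.saveAsTable("result_table")
  }
}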