How can i merge multiple rows to one row in sparksql or hivesql?

2017-05-15 Thread Jone Zhang
For example:
Data1 (has 1 billion records)
  user_id1 feature1
  user_id1 feature2
Data2 (has 1 billion records)
  user_id1 feature3
Data3 (has 1 billion records)
  user_id1 feature4
  user_id1 feature5
  ...
  user_id1 feature100
I want to get the result as follows:
  user_id1 feature1 feature2 feature3

Re: operation log is missing when using hive.execution.engine=mr

2017-05-15 Thread Jie Zhang
Hi Peter, Exactly! After setting hive.async.log.enabled=false and restarting HiveServer2, the MR job progress is printed in the operation log. Thanks very much for your help! Jessica On Mon, May 15, 2017 at 10:56 AM, Peter Vary wrote: > Hi Jessica, > > Is it possible that you

operation log is missing when using hive.execution.engine=mr

2017-05-15 Thread Jie Zhang
Hi, My team just upgraded Hive from 0.14.0 to 2.1.1. The operation log is missing when running a query; no query progress is printed. The only message printed in the operation log is "WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different

Re: operation log is missing when using hive.execution.engine=mr

2017-05-15 Thread Peter Vary
Hi Jessica, Is it possible that you are affected by this? https://issues.apache.org/jira/browse/HIVE-16061 Thanks, Peter On May 15, 2017, at 19:44, "Jie Zhang" wrote: Hi, My team just upgraded Hive from 0.14.0 to 2.1.1. The operation log is missing when running the

Re: How can i merge multiple rows to one row in sparksql or hivesql?

2017-05-15 Thread Edward Capriolo
Here is a similar, though not identical, approach I used for a comparable problem. I had two data files in different formats, and their different columns needed to become different features. I wanted to feed them into Spark's:

Hive handling of ingested data when source column changes size or new column added

2017-05-15 Thread Mich Talebzadeh
Assume we are ingesting into a Hive table from an Oracle RDBMS table through a daily mechanism. My conclusion is this: 1. The source column has moved from VARCHAR2(50) to VARCHAR2(100). As far as I know this should not matter in Hive, as every VARCHAR is stored as String in Hive.

Re: How can i merge multiple rows to one row in sparksql or hivesql?

2017-05-15 Thread goun na
Hi, Jone Zhang 1. Hive UDF You might need collect_set or collect_list (to eliminate duplication), but make sure to reduce the cardinality before applying UDFs, as it can cause problems while handling 1 billion records. Union dataset 1,2,3 -> group by user_id1 -> collect_set (feature column) would

Re: How can i merge multiple rows to one row in sparksql or hivesql?

2017-05-15 Thread goun na
I had it the other way around: collect_list keeps duplicates in its results. 2017-05-16 0:50 GMT+09:00 goun na : > Hi, Jone Zhang > > 1. Hive UDF > You might need collect_set or collect_list (to eliminate duplication), but > make sure reduce its cardinality before applying UDFs as it
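
The "union -> group by user_id -> collect" idea from this thread can be sketched in plain Python to show how collect_set and collect_list differ. This is only an illustration of the semantics on tiny in-memory lists, not Spark or Hive code; the sample rows, the `merge_rows` helper, and the table and column names in the comment are hypothetical.

```python
from collections import defaultdict
from itertools import chain

# Roughly the query shape being emulated (table and column names are assumed):
#   SELECT user_id, collect_set(feature)
#   FROM (SELECT user_id, feature FROM data1
#         UNION ALL SELECT user_id, feature FROM data2
#         UNION ALL SELECT user_id, feature FROM data3) t
#   GROUP BY user_id;

# Hypothetical miniature stand-ins for Data1, Data2 and Data3.
data1 = [("user_id1", "feature1"), ("user_id1", "feature2")]
data2 = [("user_id1", "feature3"), ("user_id1", "feature1")]  # feature1 repeated on purpose
data3 = [("user_id1", "feature4"), ("user_id2", "feature5")]


def merge_rows(*datasets, distinct):
    """Union the datasets, group by user_id, and collect the features per user.

    distinct=True mimics collect_set (duplicates dropped);
    distinct=False mimics collect_list (duplicates kept).
    First-seen order is kept here; the SQL functions make no ordering promise.
    """
    grouped = defaultdict(list)
    for user_id, feature in chain(*datasets):  # plays the role of UNION ALL
        if distinct and feature in grouped[user_id]:
            continue  # skip a feature already collected for this user
        grouped[user_id].append(feature)
    return dict(grouped)


as_set = merge_rows(data1, data2, data3, distinct=True)
as_list = merge_rows(data1, data2, data3, distinct=False)

# One output row per user, features joined with spaces.
for user_id, features in as_set.items():
    print(user_id, " ".join(features))
```

With these sample rows, `as_list` keeps the repeated feature1 for user_id1 while `as_set` holds it once, which is the distinction corrected in the reply above.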