Assume we are ingesting into a Hive table from an Oracle (RDBMS) table.
The ingest runs as a daily job.
My conclusion is this:
1. The source column has changed from VARCHAR2(50) to VARCHAR2(100). As far as I
know, this should not matter in Hive, as every VARCHAR is stored as a STRING in
Hive.
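To illustrate the point, here is a minimal Python sketch of the type-mapping step an ingest job might perform. The function `oracle_to_hive_type` and its mapping table are hypothetical, assuming the pipeline lands all Oracle character types as Hive `STRING`; under that assumption the declared length is dropped, so widening the column produces the same Hive type.

```python
import re

# Hypothetical mapping for an Oracle -> Hive ingest job. It assumes that all
# Oracle character types are landed as Hive STRING, so the declared length
# in parentheses is simply dropped.
CHARACTER_TYPES = {"CHAR", "NCHAR", "VARCHAR2", "NVARCHAR2", "CLOB"}


def oracle_to_hive_type(oracle_type: str) -> str:
    """Return the Hive column type for an Oracle type such as 'VARCHAR2(50)'."""
    match = re.fullmatch(r"\s*(\w+)\s*(\(.*\))?\s*", oracle_type)
    if match is None:
        raise ValueError(f"unrecognised Oracle type: {oracle_type!r}")
    base_type = match.group(1).upper()
    if base_type in CHARACTER_TYPES:
        return "STRING"  # the length (50, 100, ...) plays no part
    # Other types are out of scope for this sketch.
    raise NotImplementedError(f"no mapping defined for {base_type}")


old = oracle_to_hive_type("VARCHAR2(50)")
new = oracle_to_hive_type("VARCHAR2(100)")
print(old, new, old == new)  # STRING STRING True
```

This only holds if the target Hive column really is `STRING`; a Hive column declared as `VARCHAR(50)` would silently truncate the longer values.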
Hi, Peter,
Exactly! After setting hive.async.log.enabled=false and restarting HiveServer2,
the MR job progress is printed in the operation log. Thanks very much for
your help!
Jessica
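One common place to apply a setting like this is hive-site.xml, which HiveServer2 reads at startup. The following minimal Python sketch renders the corresponding `<property>` element; the helper name `hive_site_property` is hypothetical, while the property name and value come from the message above.

```python
import xml.etree.ElementTree as ET


def hive_site_property(name: str, value: str) -> str:
    """Build one <property> element in hive-site.xml format (hypothetical helper)."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    # encoding="unicode" returns a str without an XML declaration
    return ET.tostring(prop, encoding="unicode")


print(hive_site_property("hive.async.log.enabled", "false"))
# <property><name>hive.async.log.enabled</name><value>false</value></property>
```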
On Mon, May 15, 2017 at 10:56 AM, Peter Vary wrote:
Hi Jessica,
Is it possible that you are affected by this?
https://issues.apache.org/jira/browse/HIVE-16061
Thanks,
Peter
On 15 May 2017 at 19:44, "Jie Zhang" wrote:
Hi,
My team just upgraded Hive from 0.14.0 to 2.1.1. The operation log is
missing when we run a query; no query progress is printed. The only log
printed in the operation log is
"WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in
the future versions. Consider using a different e
Here is an approach that is similar, though not identical, to what you
did. I had two data files in different formats, and the different columns
needed to become different features. I wanted to feed them into Spark's:
https://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Frequent_Pattern_Mining/The_FP-
I had them the wrong way around: collect_list keeps duplicates in its results.
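The difference between the two Hive aggregates can be mimicked in a few lines of Python. This is a minimal in-memory sketch, not Hive's implementation: the function names mirror the Hive functions for readability, and the first-seen ordering in `collect_set` is a choice of this sketch (Hive does not guarantee element order).

```python
from typing import Hashable, Iterable, List


def collect_list(values: Iterable[Hashable]) -> List[Hashable]:
    """Mimic Hive's collect_list: keep every value, duplicates included."""
    return list(values)


def collect_set(values: Iterable[Hashable]) -> List[Hashable]:
    """Mimic Hive's collect_set: keep each distinct value once.

    This sketch keeps first-seen order; Hive makes no ordering guarantee.
    """
    return list(dict.fromkeys(values))


features = ["feature1", "feature2", "feature1"]
print(collect_list(features))  # ['feature1', 'feature2', 'feature1']
print(collect_set(features))   # ['feature1', 'feature2']
```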
2017-05-16 0:50 GMT+09:00 goun na :
Hi, Jone Zhang
1. Hive UDF
You might need collect_set (to eliminate duplicates) or collect_list, but
make sure to reduce cardinality before applying UDFs, as high cardinality can
cause problems when handling 1 billion records. Union datasets 1, 2, and 3,
group by user ID, then apply collect_set to the feature column; that should work.
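The union, group-by, collect_set pipeline described above can be sketched in memory with Python to show what each step produces. This is only an illustration on toy data taken from the example below; the function `features_per_user` and the `(user_id, feature)` row shape are assumptions, and at 1 billion records the same steps would run in Hive or Spark, not in a single Python process.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, str]  # (user_id, feature); assumed row shape


def features_per_user(*datasets: Iterable[Row]) -> Dict[str, List[str]]:
    """Union the datasets, group by user_id, and collect distinct features.

    In-memory analogue of UNION ALL -> GROUP BY user_id -> collect_set(feature).
    """
    grouped: Dict[str, Dict[str, None]] = defaultdict(dict)
    for user_id, feature in chain(*datasets):  # chain() plays the role of the union
        grouped[user_id][feature] = None  # dict keys act as an ordered set
    return {user: list(feats) for user, feats in grouped.items()}


data1 = [("user_id1", "feature1"), ("user_id1", "feature2")]
data2 = [("user_id1", "feature3")]
data3 = [("user_id1", "feature4"), ("user_id1", "feature5")]

for user, feats in features_per_user(data1, data2, data3).items():
    print(user, " ".join(feats))
# user_id1 feature1 feature2 feature3 feature4 feature5
```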
For example
Data1 (1 billion records):
user_id1 feature1
user_id1 feature2
Data2 (1 billion records):
user_id1 feature3
Data3 (1 billion records):
user_id1 feature4
user_id1 feature5
...
user_id1 feature100
I want to get the result as follows:
user_id1 feature1 feature2 feature3 featu