Re: Adding a new column to a table and updating it

2015-12-11 Thread Eugene Koifman
Schema evolution is not supported for Acid tables. https://issues.apache.org/jira/browse/HIVE-11981 fixes it but hasn't been released yet. There is no quick way to recover the data. You could write a script that uses the ORC FileDump utility to look at the actual files in the table to group them into sets

Re: trying to figure out number of MR jobs from explain output

2015-12-11 Thread Nicholas Hakobian
You can't find out definitively because it depends on the nature of the data being processed, especially when it comes to mapjoins. If the output of one stage is small enough to be mapjoined, parts of a stage can be skipped, as the whole dataset is on every node. I'm sure there are

RE: Adding a new column to a table and updating it

2015-12-11 Thread Mich Talebzadeh
Thanks Eugene. So basically, as I understand it, when a column is added to an existing table: 1. The metadata for the underlying table will be updated. 2. The new column will have a null value by default. 3. The existing rows cannot have the new column updated to a non-null value. 4.

Connection between TempletonJob and Worker Nodes remains in FIN_WAIT_2 state for long time

2015-12-11 Thread mahender bigdata
Hi, we have submitted too many jobs to WebHCat (Templeton). The reason is that our HQL has multiple Hive statements and each Hive statement is submitted as a job, causing too many jobs. After some time, all the submitted jobs are in the pending state. Later, after waiting for 2 hrs, all the pending jobs got

trying to figure out number of MR jobs from explain output

2015-12-11 Thread Ophir Etzion
Hi, I've been trying to figure out how to determine the number of MR jobs that will be run for a Hive query using the EXPLAIN output. I haven't found a consistent method for determining that. For example (in one of my queries, a CTAS query): STAGE DEPENDENCIES: Stage-1 is a root stage Stage-7 depends
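As a rough starting point, the STAGE DEPENDENCIES section of the EXPLAIN output can be parsed into a dependency map. The sketch below is only an illustration: the sample text is a made-up excerpt (not the query from this thread), and `parse_stage_dependencies` is a hypothetical helper name. The number of stages it finds is at best an upper bound, since not every stage is an MR job and conditional stages may be skipped at run time.

```python
import re

# Hypothetical EXPLAIN excerpt, made up for illustration only.
SAMPLE = """\
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-0 depends on stages: Stage-2
"""


def parse_stage_dependencies(explain_text):
    """Return {stage: [parent stages]} from the STAGE DEPENDENCIES section."""
    deps = {}
    in_section = False
    for line in explain_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("STAGE DEPENDENCIES"):
            in_section = True
            continue
        if not in_section:
            continue
        if stripped.startswith("STAGE PLANS"):
            break  # the dependency section has ended
        root = re.match(r"(Stage-\d+) is a root stage", stripped)
        if root:
            deps[root.group(1)] = []
            continue
        child = re.match(r"(Stage-\d+) depends on stages: (.*)", stripped)
        if child:
            # Drop any trailing "consists of ..." list of conditional branches.
            parents_part = child.group(2).split("consists of")[0]
            deps[child.group(1)] = re.findall(r"Stage-\d+", parents_part)
    return deps


if __name__ == "__main__":
    for stage, parents in parse_stage_dependencies(SAMPLE).items():
        print(stage, "<-", ", ".join(parents) or "(root)")
```

To tell which of those stages are actually MR jobs, you would still have to look at the STAGE PLANS section for each stage, and even then the count can change at run time.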

Re: Adding a new column to a table and updating it

2015-12-11 Thread Eugene Koifman
Yes, #6 is the way to handle it for now. As for the specific conditions, let me explain it differently. Let's limit this to a non-partitioned table. When you read an Acid table, the code needs to merge matching bucket files from different delta directories to materialize the snapshot. When