Schema evolution is not supported for Acid tables.
https://issues.apache.org/jira/browse/HIVE-11981 fixes it but hasn't been
released yet.
There is no quick way to recover the data.
You could write a script that uses the ORC FileDump utility to look at the actual files in
the table
to group them into sets
You can't find out definitively because it is going to depend on the
nature of the data being processed, especially when it comes to
mapjoins. If the output of one stage is small enough to be
mapjoined, parts of a stage can be skipped because the whole dataset is on
every node.
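
What EXPLAIN does give you reliably is the static stage graph. Below is a rough Python sketch that parses the STAGE DEPENDENCIES section into a map of each stage to its parents. The function name and the sample text are made up for illustration, and the parsing assumes lines of the form "Stage-N is a root stage" or "Stage-N depends on stages: ...". Treat the stage count only as an upper bound: not every stage is an MR job, and conditional stages may be skipped at runtime.

```python
import re

def parse_stage_dependencies(explain_text):
    """Map each stage to its parent stages, read from the
    STAGE DEPENDENCIES section of EXPLAIN output (hypothetical helper)."""
    deps = {}
    in_section = False
    for raw in explain_text.splitlines():
        line = raw.strip()
        if line.startswith("STAGE DEPENDENCIES"):
            in_section = True
            continue
        if line.startswith("STAGE PLANS"):
            break  # the dependency section is over
        if not in_section or not line:
            continue
        match = re.match(r"(Stage-\d+)\s+(.*)", line)
        if not match:
            continue
        stage, rest = match.groups()
        parents = []
        if "depends on stages:" in rest:
            after = rest.split("depends on stages:", 1)[1]
            # Drop a trailing "consists of ..." clause, if present.
            after = after.split("consists of", 1)[0]
            parents = re.findall(r"Stage-\d+", after)
        deps[stage] = parents
    return deps

# Made-up sample, not taken from a real query.
sample = """STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-0 depends on stages: Stage-2

STAGE PLANS:
  Stage: Stage-1
"""
stage_graph = parse_stage_dependencies(sample)
```
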
I'm sure there are
Thanks, Eugene
So basically, as I understand it, when a column is added to an already
existing table:
1. The metadata for the underlying table will be updated
2. The new column will have a null value by default
3. The existing rows cannot have the new column updated to a non-null value
4.
Hi,
We have submitted too many jobs to WebHCat (Templeton). The reason is that our HQL
has multiple Hive statements, and each Hive statement is submitted as a job,
causing too many jobs. After some time, all the submitted jobs are in the
pending state. Later, after waiting for 2 hours, all the pending jobs got
Hi,
I've been trying to figure out how to know the number of MR jobs that will
be run for a Hive query using the EXPLAIN output.
I haven't found a consistent method for determining that.
For example (in one of my queries, a CTAS query):
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-7 depends
Yes, #6 is the way to handle it for now.
As far as specific conditions, let me explain it differently.
Let's limit this to a non-partitioned table.
When you read an Acid table, the code needs to merge matching bucket files
from different delta directories to materialize the snapshot.
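
To make "matching bucket files" concrete, here is a small Python sketch that only does the grouping step: it collects, for each bucket file name, the delta directories that contain it. It does not merge any rows. The function name and the paths are made up, and the layout (directories like delta_0000001_0000001 holding files like bucket_00000) is an assumption about how the table is laid out on disk.

```python
from collections import defaultdict
import posixpath

def group_bucket_files(paths):
    """Group file paths by bucket file name (hypothetical helper).

    Returns {bucket file name: sorted list of directories holding it}.
    """
    groups = defaultdict(list)
    for path in paths:
        directory, name = posixpath.split(path)
        if name.startswith("bucket_"):
            groups[name].append(posixpath.basename(directory))
    # Zero-padded directory names sort correctly as plain strings.
    return {bucket: sorted(dirs) for bucket, dirs in groups.items()}

# Made-up paths for a non-partitioned table.
example_paths = [
    "warehouse/t/delta_0000002_0000002/bucket_00000",
    "warehouse/t/delta_0000001_0000001/bucket_00000",
    "warehouse/t/delta_0000002_0000002/bucket_00001",
]
bucket_groups = group_bucket_files(example_paths)
```

Each entry in the result is one set of files that a reader would have to merge together for that bucket.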
When