I second that. Many thanks Mich for your reply.
Regards 

    On Monday, 4 January 2016, 10:58, "Singh, Abhijeet" 
<absi...@informatica.com> wrote:
 

Very well answered by Mich.

Thanks Mich !!

From: Mich Talebzadeh [mailto:m...@peridale.co.uk]
Sent: Sunday, January 03, 2016 8:35 PM
To: user@hive.apache.org; 'Ashok Kumar'
Subject: RE: Immutable data in Hive

Hi Ashok,

I will have a go at this on top of Alan's very valuable clarification.

Extraction, Transformation and Load (ETL) is a very common method in Data Warehousing (DW) and Business Analytics projects. It can be performed with custom programming such as shell scripts, Java or .NET tools, or a combination of these, to get the data from internal or external sources and put it into the DW.

In general only data of value ends up in the DW. What this means is that, say in a banking environment, you collect and feed (Extract) data into a staging area (in relational terms often staging tables, or the so-called global temporary tables that are cleared daily for the next cycle, in a staging database), prune it of unwanted data, do some manipulation (Transformation, which often happens in another set of staging tables) and finally Load it into the target tables in the Data Warehouse. The analysts then use appropriate tools like Tableau to look at macroscopic trends in the data. Remember a Data Warehouse is still a relational database, most probably a columnar implementation of the relational model such as SAP Sybase IQ.
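To make the staging pattern concrete, here is a minimal sketch in plain SQL; the schema, table and column names (staging.stg_transactions, dw.fact_transactions and so on) are hypothetical, not from any particular system:

  -- Extract: the raw feed lands in a staging table that is cleared each cycle.
  TRUNCATE TABLE staging.stg_transactions;
  -- (the extraction tool or bulk loader populates staging.stg_transactions here)

  -- Transform: prune unwanted rows and reshape into a second staging table.
  INSERT INTO staging.stg_transactions_clean (account_id, txn_date, amount)
  SELECT account_id, txn_date, amount
  FROM   staging.stg_transactions
  WHERE  amount IS NOT NULL            -- prune records of no value
    AND  txn_date >= DATE '2016-01-01';

  -- Load: move the transformed rows into the target Data Warehouse table.
  INSERT INTO dw.fact_transactions (account_id, txn_date, amount)
  SELECT account_id, txn_date, amount
  FROM   staging.stg_transactions_clean;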
There are many examples of DW repositories used for Business Intelligence (BI, another fancy term for Analytics), such as working out global trading positions (I did one of these by bolting Oracle TimesTen IMDB onto an Oracle DW for fast extraction) or analysing data gathered from algorithmic trading using Complex Event Processing. Obviously, although DWs store larger amounts of data (large being a relative term) and have impressive compression, as in Sybase IQ (every column is stored as an index, so columnar compression is far more effective, all data in a column being of the same type, as opposed to row compression in OLTP databases), they still require additional space, SAN storage and expensive horizontal scaling (adding another multiplex requires an additional license).
ELT (Extraction, Load and Transform) is a similar concept used in the Big Data world. The fundamental difference is that it is not confined to data deemed to be of specific value, which would mean knowing what you are looking for in advance. In Hadoop one can store everything, from structured data (transactional databases) to unstructured data (data coming from the internet, Excel sheets, email, logs and others). This means you can store potentially all data to be exploited later. The Hadoop ecosystem provides that flexibility by means of horizontal scaling on cheap commodity disks (AKA JBOD), and the lack of licensing restrictions reduces Total Cost of Ownership (TCO) considerably.

In summary, you (E)xtract and (L)oad all data as is (not caring whether that data is exactly what you want) into HDFS, and then you do the (T)ransformation later through Schema on Read (you decide at exploration time what data you need).
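As a minimal sketch of schema on read in Hive (the HDFS path /data/raw/logs and the column names are made up for illustration):

  -- Load: the raw files are landed in HDFS as-is, with no upfront transformation.
  -- Transform later: project a schema onto the raw data at read time
  -- with an external table.
  CREATE EXTERNAL TABLE raw_logs (
    event_time STRING,
    user_id    STRING,
    payload    STRING
  )
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  LOCATION '/data/raw/logs';

  -- Only now, at exploration time, decide what you need and transform it.
  CREATE TABLE curated_logs STORED AS ORC AS
  SELECT user_id, CAST(event_time AS TIMESTAMP) AS event_ts
  FROM   raw_logs
  WHERE  payload IS NOT NULL;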
HDFS is great for storing large amounts of data, but on top of that you will need tools like Hive, Spark, Cassandra and others to explore your data lake.

HTH

Mich Talebzadeh
Sybase ASE 15 Gold Medal Award 2008
A Winning Strategy: Running the most Critical Financial Data on ASE 15
http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf
Author of the books "A Practitioner's Guide to Upgrading to Sybase ASE 15", ISBN 978-0-9563693-0-7, and co-author of "Sybase Transact SQL Guidelines Best Practices", ISBN 978-0-9759693-0-4.
Publications due shortly: Complex Event Processing in Heterogeneous Environments, ISBN: 978-0-9563693-3-8, and Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume one out shortly.

http://talebzadehmich.wordpress.com/

NOTE: The information in this email is proprietary and confidential. This message is for the designated recipient only; if you are not the intended recipient, you should destroy it immediately. Any information in this message shall not be understood as given or endorsed by Peridale Technology Ltd, its subsidiaries or their employees, unless expressly so stated. It is the responsibility of the recipient to ensure that this email is virus free; therefore neither Peridale Ltd, its subsidiaries nor their employees accept any responsibility.

From: Ashok Kumar [mailto:ashok34...@yahoo.com]
Sent: 03 January 2016 11:03
To: user@hive.apache.org; Ashok Kumar <ashok34...@yahoo.com>
Subject: Re: Immutable data in Hive

Any comments on ELT will be greatly appreciated, gurus.

With warmest greetings

On Wednesday, 30 December 2015, 18:20, Ashok Kumar <ashok34...@yahoo.com> wrote:

Thank you sir, very helpful.

Could you also briefly describe, from your experience, the major differences between traditional ETL in a DW and ELT in Hive? Why is there an emphasis on taking data from traditional transactional databases into Hive tables in the same format and doing the transformation in Hive afterwards? Is it because Hive is meant to be efficient at data transformation?

Regards

On Wednesday, 30 December 2015, 18:00, Alan Gates <alanfga...@gmail.com> wrote:
Traditionally data in Hive was write once (insert) read many.  You could append 
to tables and partitions, add new partitions, etc.  You could remove data by 
dropping tables or partitions.  But there were no updates of data or deletes of 
particular rows.  This was what was meant by immutable.  Hive was originally 
done this way because it was based on MapReduce and HDFS and these were the 
natural semantics given those underlying systems.
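A minimal HiveQL sketch of that write-once pattern (the table, column and partition names here are hypothetical):

  CREATE TABLE web_logs (ip STRING, url STRING)
  PARTITIONED BY (dt STRING)
  STORED AS ORC;

  -- Append new data by adding new partitions:
  INSERT INTO TABLE web_logs PARTITION (dt='2015-12-30')
  SELECT ip, url FROM staging_logs;

  -- The only way to remove data: drop a whole partition or table.
  ALTER TABLE web_logs DROP PARTITION (dt='2015-01-01');
  -- No UPDATE or DELETE of individual rows.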

For many use cases (e.g. ETL) this is sufficient, and the vast majority of 
people still run Hive this way.

We added transactions and updates and deletes to Hive because some use cases 
require these features.  Hive is being used more and more as a data warehouse, 
and while updates and deletes are less common there, they are still required (slowly changing dimensions, fixing wrong data, deleting records for compliance, etc.).  Also, streaming data into warehouses from transactional systems is a 
common use case.
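For reference, a minimal sketch of such a transactional table in HiveQL, with hypothetical table and column names; ACID tables typically require ORC storage, bucketing in current releases, and the transaction manager to be enabled (hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager, hive.support.concurrency=true):

  CREATE TABLE dim_customer (
    customer_id INT,
    name        STRING,
    country     STRING
  )
  CLUSTERED BY (customer_id) INTO 4 BUCKETS
  STORED AS ORC
  TBLPROPERTIES ('transactional'='true');

  -- Slowly-changing-dimension / data-fix style operations:
  UPDATE dim_customer SET country = 'UK' WHERE customer_id = 42;
  DELETE FROM dim_customer WHERE customer_id = 99;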

Alan.

 
Ashok Kumar wrote on December 29, 2015 at 14:59:

Hi,

Can someone please clarify what "immutable data" in Hive means?

I have been told that data in Hive is/should be immutable, but in that case why do we need transactional tables in Hive that allow updates to data?

Thanks and greetings
      

  
