large data ingestion with jdbc versus putting data into hdfs with hive to ingest

2016-11-07 Thread Ashok Kumar
Hi, what are the drawbacks of ingesting large quantities of data through a JDBC connection in Hive versus placing data directly into HDFS for Hive to incorporate? Thanks

Re: Happy Diwali to those forum members who celebrate this great festival

2016-10-30 Thread Ashok Kumar
You are very kind Sir On Sunday, 30 October 2016, 16:42, Devopam Mittra wrote: +1 Thanks and regards Devopam On 30 Oct 2016 9:37 pm, "Mich Talebzadeh" wrote: Enjoy the festive season. Regards, Dr Mich Talebzadeh LinkedIn  

Re: Presentation in London: Running Spark on Hive or Hive on Spark

2016-07-19 Thread Ashok Kumar
Thanks Mich looking forward to it :) On Tuesday, 19 July 2016, 19:13, Mich Talebzadeh wrote: Hi all, This will be in London tomorrow Wednesday 20th July starting at 18:00 hour for refreshments and kick off at 18:30, 5 minutes walk from Canary Wharf Station,

Re: Hive on TEZ + LLAP

2016-07-15 Thread Ashok Kumar
ay, July 15, 2016 at 8:36 AM To: "user@hive.apache.org" <user@hive.apache.org> Subject: Re: Hive on TEZ + LLAP I would recommend a distribution such as Hortonworks where everything is already configured. As far as I know LLAP is currently not part of any distribution. On 15 Jul 20

Hive on TEZ + LLAP

2016-07-15 Thread Ashok Kumar
Hi, Has anyone managed to make Hive work with Tez + LLAP as the query engine in place of MapReduce, please? If you configured it yourself, which versions of Tez and LLAP work with Hive 2? Do I need to build Tez from source, for example? Thanks

Fast database with writes per second and horizontal scaling

2016-07-11 Thread Ashok Kumar
Hi Gurus, Advice appreciated from Hive gurus. My colleague has been using Cassandra. However, he says it is too slow and not user friendly. MongoDB as a document database is pretty neat but not fast enough. My main concern is fast writes per second and good scaling. Hive on Spark or Tez? How about

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-11 Thread Ashok Kumar
Hi Mich, Regarding your recent presentation in London on the topic "Running Spark on Hive or Hive on Spark": have you made any more interesting findings that you would like to bring up? If Hive is offering both Spark and Tez in addition to MR, what is stopping one from using Spark? I still don't get why TEZ + LLAP

Re: Presentation in London: Running Spark on Hive or Hive on Spark

2016-07-07 Thread Ashok Kumar
Thanks. Will this presentation be recorded as well? Regards On Wednesday, 6 July 2016, 22:38, Mich Talebzadeh wrote: Dear forum members I will be presenting on the topic of "Running Spark on Hive or Hive on Spark, your mileage varies" in Future of Data: London

latest version of Spark to work OK as Hive engine

2016-07-02 Thread Ashok Kumar
Hi, Looking at this presentation, "Hive on Spark is Blazing Fast": what is the latest version of Spark that can run as an engine for Hive, please? Thanks P.S. I am aware of Hive on TEZ but that is not what I am interested in here, please. Warmest regards

last stats time on table columns

2016-06-16 Thread Ashok Kumar
Greetings gurus, When I use ANALYZE TABLE COMPUTE STATISTICS FOR COLUMNS, where can I get the last stats time? DESC FORMATTED does not show it. Thanking you

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-05-23 Thread Ashok Kumar
Hi Dr Mich, This is very good news. I will be interested to know how Hive engages with Spark as an engine. What Spark processes are used to make this work?  Thanking you On Monday, 23 May 2016, 19:01, Mich Talebzadeh wrote: Have a look at this thread Dr Mich

Copying all Hive tables from Prod to UAT

2016-04-08 Thread Ashok Kumar
Hi, Does anyone have suggestions on how to create and copy Hive and Spark tables from Production to UAT? One way would be to copy table data to external files and then move the external files to a local target directory and populate the tables in target Hive with data. Is there an easier way of doing

Updating column in table throws error

2016-03-06 Thread Ashok Kumar
Hi gurus, I have an ORC table bucketed on invoicenumber with "transactional"="true" I am trying to update invoicenumber column used for bucketing this table but it comes back with Error: Error while compiling statement: FAILED: SemanticException [Error 10302]: Updating values of bucketing

Re: Hive and Impala

2016-03-01 Thread Ashok Kumar
Dr Mich, My two cents here. I don't have direct experience of Impala but in my humble opinion I share your views that Hive provides the best metastore of all Big Data systems. Looking around, almost every product in one form or another uses Hive code somewhere. My colleagues inform me that Hive

Re: Hive optimizer

2016-02-04 Thread Ashok Kumar
wrote: It's both. Some of the optimizations are rule based and some are cost based. John From: Ashok Kumar <ashok34...@yahoo.com> Reply-To: "user@hive.apache.org" <user@hive.apache.org>, Ashok Kumar <ashok34...@yahoo.com> Date: Wednesday, February 3, 2016 at 11:45 AM To: U

Hive optimizer

2016-02-03 Thread Ashok Kumar
Hi, Is the Hive optimizer a cost based optimizer (CBO) or a rule based optimizer (RBO) or neither of them? Thanks

Re: Importing Oracle data into Hive

2016-01-31 Thread Ashok Kumar
From: Ashok Kumar [mailto:ashok

Importing Oracle data into Hive

2016-01-31 Thread Ashok Kumar
  Hi, What is the easiest method of importing data from an Oracle 11g table to Hive please? This will be a weekly periodic job. The source table has 20 million rows. I am running Hive 1.2.1 regards

Re: Importing Oracle data into Hive

2016-01-31 Thread Ashok Kumar
Thanks, Can sqoop create this table as ORC in Hive? On Sunday, 31 January 2016, 13:13, Ashok Kumar <ashok34...@yahoo.com> wrote: Thanks. Can sqoop create this table as ORC in Hive? On Sunday, 31 January 2016, 13:11, Nitin Pawar <nitinpawar...@gmail.com> wrote:

Re: ORC files and statistics

2016-01-19 Thread Ashok Kumar
Thanks Owen, I got a bit confused comparing ORC with what I know about indexes in relational databases. Still need to understand it a bit better. Regards From: Owen O'Malley [mailto:omal...@apache.org] Sent: 19 January 2016 17:57 To: user@hive.apache.org; Ashok Kumar <ashok34...@yahoo.com>

Re: ORC files and statistics

2016-01-19 Thread Ashok Kumar
der can jump straight to the beginning of the row group. The reader takes a SearchArgument (eg. age > 100)  that limits which rows are required for the query and can avoid reading an entire file, or at least sections of the file. .. Owen On Tue, Jan 19, 2016 at 7:50 AM, Ashok Kumar <ashok34..
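
The row-group skipping described in this reply can be illustrated with a small model. This is only a sketch under stated assumptions, not the ORC reader's actual API: `RowGroup`, `build_row_groups` and `scan_greater_than` are hypothetical names, the data is made up, and only a single integer column with min/max statistics is modelled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowGroup:
    """Hypothetical stand-in for a block of rows plus its min/max statistics."""
    rows: List[int]
    min_value: int
    max_value: int


def build_row_groups(values: List[int], group_size: int) -> List[RowGroup]:
    """Split the column into fixed-size groups and record min/max per group."""
    groups = []
    for start in range(0, len(values), group_size):
        chunk = values[start:start + group_size]
        groups.append(RowGroup(chunk, min(chunk), max(chunk)))
    return groups


def scan_greater_than(groups: List[RowGroup], threshold: int) -> Tuple[List[int], int]:
    """Evaluate `value > threshold`, reading only groups that may contain a match."""
    matches: List[int] = []
    groups_read = 0
    for group in groups:
        if group.max_value <= threshold:
            # The statistics prove no row in this group can satisfy the predicate.
            continue
        groups_read += 1
        matches.extend(v for v in group.rows if v > threshold)
    return matches, groups_read


# Made-up "age" column, echoing the `age > 100` predicate in the reply.
ages = [23, 35, 41, 58, 62, 77, 81, 99, 101, 104, 110, 120]
groups = build_row_groups(ages, group_size=4)
matches, groups_read = scan_greater_than(groups, threshold=100)
print(matches, groups_read)
```

In this model the statistics play the role that an index plays in a relational database: they do not locate individual rows, they only let the reader rule out whole sections that cannot match.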

ORC files and statistics

2016-01-19 Thread Ashok Kumar
Hi, I have read some notes on ORC files in Hive and indexes. The document describes the indexes but makes reference to statistics. Indexes: ORC provides three levels of indexes within each file: file level - statistics about the values in each

equivalent to identity column in Hive

2016-01-16 Thread Ashok Kumar
Hi, Is there an equivalent to the Microsoft IDENTITY column in Hive, please? Thanks and regards

foreign keys in Hive

2016-01-10 Thread Ashok Kumar
hi, what is the equivalent to foreign keys in Hive? Thanks

Re: Immutable data in Hive

2016-01-04 Thread Ashok Kumar
Very well answered by Mich. Thanks Mich !! From: Mich Talebzadeh [mailto:m...@peridale.co.uk] Sent: Sunday, January 03, 2016 8:35 PM To: user@hive.apache.org; 'Ashok Kumar' Subject: RE: Immutable data in Hive Hi Ashok

Re: Immutable data in Hive

2015-12-30 Thread Ashok Kumar
while updates and deletes are less common there they are still required (slow changing dimensions, fixing wrong data, deleting records for compliance, etc.)  Also streaming data into warehouses from transactional systems is a common use case. Alan. Ashok Kumar December 29, 2015 at 14:

Immutable data in Hive

2015-12-29 Thread Ashok Kumar
Hi, Can someone please clarify what "immutable data" in Hive means? I have been told that data in Hive is/should be immutable, but in that case why do we need transactional tables in Hive that allow updates to data? Thanks and greetings

Difference between ORC and RC files

2015-12-21 Thread Ashok Kumar
Hi Gurus, I am trying to understand the advantages that the ORC file format offers over RC. I have read the existing documents but I still don't seem to grasp the main differences. Can someone explain to me as a user where ORC scores when compared to RC? What I would like to know is mainly the

Re: The advantages of Hive/Hadoop compared to Data Warehouse

2015-12-18 Thread Ashok Kumar
18 Dec 2015, at 22:01, Ashok Kumar <ashok34...@yahoo.com> wrote: Gurus, Some analysts keep asking me about the advantages of having Hive tables when the star schema in a Data Warehouse (DW) does the same. For example, if you have fact and dimension tables in the DW and just import them into Hive via

The advantages of Hive/Hadoop compared to Data Warehouse

2015-12-18 Thread Ashok Kumar
Gurus, Some analysts keep asking me about the advantages of having Hive tables when the star schema in a Data Warehouse (DW) does the same. For example, if you have fact and dimension tables in the DW and just import them into Hive via, say, Sqoop, what are we going to gain? I keep telling them storage

Re: The advantages of Hive/Hadoop compared to Data Warehouse

2015-12-18 Thread Ashok Kumar
From: Ashok Kumar <ashok34...@yahoo.com> Reply-To: User <user@hive.apache.org>, Ashok Kumar <ashok34...@yahoo

Fw: Managed to make Hive run on Spark engine

2015-12-07 Thread Ashok Kumar
This is great news sir. It shows perseverance pays at last. Can you inform us when the write-up is ready so I can set it up as well please. I know a bit about the advantages of having Hive using Spark engine. However, the general question I have is when one should use Hive on spark as opposed

Best practices for monitoring hive

2015-11-13 Thread Ashok Kumar
Hi, I would like to know best practices for monitoring the health and performance of Hive and HiveServer, troubleshooting, catching errors, etc. To be clear, we do not use any bespoke monitoring tool and are keen on developing our own in-house tools to be integrated into general monitoring tools to

exporting and importing a schema/database from Hive Test to Hive DEV

2015-11-11 Thread Ashok Kumar
Hi gurus, What is the easiest way of exporting one database in Hive in Test and importing it to, say, a DEV database in another instance? How will the metastore at the target handle this, please? Thanks

Re: Hive and HBase

2015-11-10 Thread Ashok Kumar
mber 2015, 13:03, Binglin Chang <decst...@gmail.com> wrote: "Hive transparently translates queries into MapReduce jobs that are executed in HBase" I think this is not correct, are you sure it is from some book? On Tue, Nov 10, 2015 at 6:56 PM, Ashok Kumar <ashok34...@yahoo.com>

DML operations and transactions in Hive

2015-11-06 Thread Ashok Kumar
Hi, I would like to understand a bit more about insert/update/select operations in Hive. I believe the ORC table format offers the best performance and concurrency. Is ORC the best format for DML operations? What is the granularity of locks? Is it partition or row? Also how ACID properties

Re: clarification please

2015-10-29 Thread Ashok Kumar
Thank you sir. Very helpful On Thursday, 29 October 2015, 15:22, Alan Gates <alanfga...@gmail.com> wrote: Ashok Kumar October 28, 2015 at 22:43 hi gurus, kindly clarify the following please - Hive currently does not support indexes or indexes are no

clarification please

2015-10-28 Thread Ashok Kumar
hi gurus, kindly clarify the following please - Hive currently does not support indexes or indexes are not used in the query - The lowest granularity for concurrency is partition. If the table is partitioned, then the partition will be locked in a DML operation - What is the best file format

downloading RDBMS table data to Hive with Sqoop import

2015-05-05 Thread Ashok Kumar
Hi gurus, I can use Sqoop import to get RDBMS data say Oracle to Hive first and then use incremental append for new rows with PK and last value. However, how do you account for updates and deletes with Sqoop without full load of table from RDBMS to Hive? Thanks
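
The gap this question points at can be shown with a toy model of append-style incremental loading. It is a sketch only, assuming the behaviour the question describes (only rows whose check column exceeds the last recorded value are fetched); `incremental_append` and the sample rows are hypothetical and this is not Sqoop's implementation.

```python
from typing import Dict, List


def incremental_append(source_rows: List[Dict[str, int]],
                       target_rows: List[Dict[str, int]],
                       check_column: str,
                       last_value: int) -> int:
    """Append rows whose check column exceeds last_value; return the new high-water mark."""
    new_rows = [dict(row) for row in source_rows if row[check_column] > last_value]
    target_rows.extend(new_rows)
    return max((row[check_column] for row in new_rows), default=last_value)


# Initial load: everything in the source is newer than last_value = 0.
source = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
target: List[Dict[str, int]] = []
last_value = incremental_append(source, target, "id", 0)

# The source then changes: id 1 is deleted, id 2 is updated, id 3 is inserted.
source = [{"id": 2, "amount": 25}, {"id": 3, "amount": 30}]
last_value = incremental_append(source, target, "id", last_value)

target_ids = [row["id"] for row in target]
stale_amount = next(row["amount"] for row in target if row["id"] == 2)
print(target_ids, stale_amount, last_value)
```

In this model the new row arrives, but the deleted row is still present in the target and the updated row keeps its old value, which is exactly the case the question asks how to handle without a full reload.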

Hive and Impala

2015-04-27 Thread Ashok Kumar
Hi gurus, Kindly help me understand the advantage that Impala has over Hive. I read a note that Impala does not use MapReduce engine and is therefore very fast for queries compared to Hive. However, Hive as I understand is widely used everywhere! Thank you

Re: partition and bucket

2015-04-14 Thread Ashok Kumar
From: Ashok Kumar [mailto:ashok34...@yahoo.com] Sent: 10 April 2015 17:46 To: user@hive.apache.org Subject: partition and bucket Greetings all, Glad to join the user group. I am from a DBA background, Oracle/Sybase/MSSQL. I would like to understand partition