Re: compaction issue: Compaction cannot compact above this txnid

2020-06-02 Thread David Morin
cess/content/understanding-administering-compactions.html
> Also if everything else fails, you can still issue the ALTER TABLE command periodically using crontab. Running an extra compaction will not hurt that much.
>
> Thanks,
> Peter
>
> On Jun 2, 2020, at 14:25, Da
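Peter's fallback suggestion could be sketched as a crontab entry that issues the compaction on a schedule. The schedule, database, and table names below are placeholders, not values from the thread:

```shell
# Hypothetical crontab entry: request a major compaction once an hour.
# "mydb.mytable" and the schedule are placeholders; adjust as needed.
0 * * * * hive -e "ALTER TABLE mydb.mytable COMPACT 'major';"
```

Note that `ALTER TABLE ... COMPACT` only enqueues a compaction request; the Metastore's compactor threads pick it up later, so an extra request on an already-compacted table is cheap.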

Re: compaction issue: Compaction cannot compact above this txnid

2020-06-02 Thread David Morin
Jun 2, 2020 at 12:57, Peter Vary wrote:
> Hi David,
>
> You do not really need to run compaction every time.
> Is it possible to wait for the compaction to start automatically next time?
>
> Thanks,
> Peter
>
> On Jun 2, 2020, at 12:51, David Morin wrote:

Re: compaction issue: Compaction cannot compact above this txnid

2020-06-02 Thread David Morin
This looks very confusing when looking at the logs."
> Thanks,
> Peter
>
> On Jun 2, 2020, at 11:44, David Morin wrote:
>
> I don't get it.
> The transaction id in the error message "No delta files or original files found to compact in hdfs://... with mi

Re: compaction issue: Compaction cannot compact above this txnid

2020-06-02 Thread David Morin
compaction for the current database/table

On 2020/06/01 20:13:08, David Morin wrote:
> Hi,
>
> I have a compaction issue on my cluster. When I force a major compaction on one table I get this error in the Metastore logs:
>
> 2020-06-01 19:49:35,512 ERROR [-78]: compactor.Compacto

compaction issue: Compaction cannot compact above this txnid

2020-06-01 Thread David Morin
Hi,

I have a compaction issue on my cluster. When I force a major compaction on one table I get this error in the Metastore logs:

2020-06-01 19:49:35,512 ERROR [-78]: compactor.CompactorMR (CompactorMR.java:run(264)) - No delta files or original files found to compact in

Re: ORC: duplicate record - rowid meaning ?

2020-02-25 Thread David Morin
On Thu, Feb 6, 2020 at 12:12, David Morin wrote:
> OK, Peter.
> No problem. Thanks.
> I'll keep you in touch.
>
> On 2020/02/06 09:42:39, Peter Vary wrote:
>> Hi David,
>>
>> I'm more familiar with ACID v2 :(
>> What I would do is to run an update operat

Re: ORC: duplicate record - rowid meaning ?

2020-02-06 Thread David Morin
ice to hear back from you if you found something.
>
> Thanks,
> Peter
>
> On Feb 5, 2020, at 16:55, David Morin wrote:
>
>> Hello,
>>
>> Thanks.
>> In fact I use HDP 2.6.5 and a previous ORC version with transactionid, for exampl

Re: ORC: duplicate record - rowid meaning ?

2020-02-05 Thread David Morin
ws. Only insert and delete. So an update is handled as a delete of the (old) row plus an insert of the (new/independent) row.
> The delete is stored in the delete delta directories, and the files do not have to contain the {row} struct at the end.
>
> Hope this helps,
> Peter
>
> On Feb 5

Re: ORC: duplicate record - rowid meaning ?

2020-02-05 Thread David Morin
73_0199073_
hdfs:///delta_0199073_0199073_0002

And the first one contains updates (operation:1) and the second one inserts (operation:0).

Thanks for your help,
David

On 2019/12/01 16:57:08, David Morin wrote:
> Hi Peter,
>
> At the moment I have a pipeline based on Flink to wri

compaction and stripes size

2019-12-08 Thread David Morin
Hi,

When major compactions have been performed on Hive tables stored in the ORC format, are the ORC stripes rewritten? I know the records themselves are not updated (some of them are dropped, but not updated), but regarding stripe size, do major compactions have an impact on it? For

Re: ORC: duplicate record - rowid meaning ?

2019-12-01 Thread David Morin
your question below: Yes, the files should be ordered by the (originalTransaction, bucket, rowId) triple, otherwise you will get wrong results.
>
> Thanks,
> Peter
>
> On Nov 19, 2019, at 13:30, David Morin wrote:
>
>> Hereafter, more detail
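Peter's ordering rule can be illustrated outside Hive: a reader merging ACID files must process records in (originalTransaction, bucket, rowId) order. A minimal sketch with hypothetical key values, using plain `sort`:

```shell
# Three hypothetical ACID record keys, "originalTransaction,bucket,rowId",
# deliberately listed out of order.
printf '11369,3,1\n11365,3,1\n11365,3,0\n' |
  # Order by the (originalTransaction, bucket, rowId) triple, numerically.
  sort -t, -k1,1n -k2,2n -k3,3n
# Prints:
# 11365,3,0
# 11365,3,1
# 11369,3,1
```

The same lexicographic-on-three-keys comparison is what a merging reader applies when interleaving base and delta files; comparing on only one or two of the keys is what produces the wrong results Peter warns about.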

Re: ORC: duplicate record - rowid meaning ?

2019-11-19 Thread David Morin
tid":3,"rowid":0} | *5218* |
| {"transactionid":11365,"bucketid":3,"rowid":1} | *5216* |
| {"transactionid":11369,"bucketid":3,"rowid":1} | *5216* |
| {"transactionid":11369,"bucketid":

ORC: duplicate record - rowid meaning ?

2019-11-18 Thread David Morin
Hello,

I'm trying to understand the purpose of the rowid column inside an ORC delta file:
{"transactionid":11359,"bucketid":5,"*rowid*":0}

ORC view:
{"operation":0,"originalTransaction":11359,"bucket":5,"*rowId*":0,"currentTransaction":11359,"row":...}

I use HDP 2.6 => Hive 2.
If I want to be
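The duplicate rows discussed later in this thread share a (bucketid, rowid) pair but carry different transaction ids; one way to deduplicate outside Hive is to keep, per (bucketid, rowid), only the version with the highest transaction id. A sketch with hypothetical values, not the real files:

```shell
# Hypothetical rows: "transactionid,bucketid,rowid,key".
# The two rows with bucketid=3, rowid=1 are versions of the same logical row.
printf '11365,3,1,5216\n11369,3,1,5216\n11365,3,0,5218\n' |
  # Sort by (bucketid, rowid), then by transactionid descending...
  sort -t, -k2,2n -k3,3n -k1,1rn |
  # ...and keep only the first (i.e. latest) version of each (bucketid, rowid).
  awk -F, '!seen[$2","$3]++'
# Prints:
# 11365,3,0,5218
# 11369,3,1,5216
```

This mirrors what a merge-on-read does conceptually: the newest transaction visible to the reader wins for each row id.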

Re: Locks with ACID: need some clarifications

2019-09-09 Thread David Morin
> Alan.
>
> On Mon, Sep 9, 2019 at 10:55 AM David Morin wrote:
>
>> Thanks Alan,
>>
>> When you say "you just can't have two simultaneous deletes in the same partition", does simultaneous mean within the same transaction?
>> If I create 2 "t

Re: Locks with ACID: need some clarifications

2019-09-09 Thread David Morin
ive 3, where update and delete also take shared locks and a first-committer-wins strategy is employed instead.
>
> Alan.
>
> On Mon, Sep 9, 2019 at 8:29 AM David Morin wrote:
>
>> Hello,
>>
>> I use HDP 2.6.5 with Hive 2.1.0 in production.
>> We use t

Locks with ACID: need some clarifications

2019-09-09 Thread David Morin
Hello,

I use HDP 2.6.5 with Hive 2.1.0 in production. We use transactional tables and try to ingest data in a streaming way (despite the fact that we still use Hive 2). I've read some docs, but I would like some clarification concerning the use of locks with transactional tables. Do we have to use

Re: Hive Major Compaction fails (cleaning step)

2019-08-26 Thread David Morin
, isn't it? So this is a workaround, but a slightly crappy one. Still, I'm open to any more suitable solution.

On Mon, Aug 26, 2019 at 08:51, David Morin wrote:
> Sorry, the same link in English:
> http://www.adaltas.com/en/2019/07/25/hive-3-features-tips-tricks/
>
> On Mon, Aug 26, 2019 at

Re: Hive Major Compaction fails (cleaning step)

2019-08-26 Thread David Morin
Sorry, the same link in English:
http://www.adaltas.com/en/2019/07/25/hive-3-features-tips-tricks/

On Mon, Aug 26, 2019 at 08:35, David Morin wrote:
> Here is a link related to Hive 3:
> http://www.adaltas.com/fr/2019/07/25/hive-3-fonctionnalites-conseils-astuces/
> The author

Re: Hive Major Compaction fails (cleaning step)

2019-08-26 Thread David Morin
Aug 2019 at 07:51, David Morin wrote:
> Hello,
> I've been trying "ALTER TABLE (table_name) COMPACT 'MAJOR'" on my Hive 2 environment, but it always fails (HDP 2.6.5, precisely). It seems that the merged base file is created but the delta is not deleted.
> I

Hive Major Compaction fails (cleaning step)

2019-08-25 Thread David Morin
Hello,

I've been trying "ALTER TABLE (table_name) COMPACT 'MAJOR'" on my Hive 2 environment, but it always fails (HDP 2.6.5, precisely). It seems that the merged base file is created but the delta is not deleted. I found that it was because the HiveMetastore client can't connect to the metastore

Re: How to update Hive ACID tables in Flink

2019-03-12 Thread David Morin
Tue, Mar 12, 2019 at 12:24 PM David Morin wrote:
>
>> Thanks Alan.
>> Yes, the problem in fact was that this streaming API does not handle update and delete.
>> I've used native ORC files, and the next step I've planned is the use of ACID support

Re: How to update Hive ACID tables in Flink

2019-03-12 Thread David Morin
this case, though it only handles insert (not update), so if you need updates you'd have to do the merge as you are currently doing.
>
> Alan.
>
> On Mon, Mar 11, 2019 at 2:09 PM David Morin wrote:
>
>> Hello,
>>
>> I've just implemented a pipeline ba

How to update Hive ACID tables in Flink

2019-03-11 Thread David Morin
Hello,

I've just implemented a pipeline based on Apache Flink to synchronize data between MySQL and Hive (transactional + bucketed) on an HDP cluster. Flink jobs run on YARN. I've used ORC files, but without ACID properties. Then we've created external tables on the HDFS directories that

Re: Read Hive ACID tables in Spark or Pig

2019-03-11 Thread David Morin
Hi,

I've just implemented a pipeline to synchronize data between MySQL and Hive (transactional + bucketed) on an HDP cluster. I've used ORC files, but without ACID properties. Then we've created external tables on the HDFS directories that contain these delta ORC files. Then, MERGE INTO

Orc files in hdf: NullPointerException (RunLengthIntegerReaderV2)

2019-02-11 Thread David Morin
Hello,

I hit an error when I try to read my ORC files from Hive (external table), from Pig, or with hive --orcfiledump. These files are generated with Flink using the ORC Java API with vectorized columns. If I create these files locally (/tmp/...) and push them to HDFS, then I can read the content