As long as there is a spare worker thread this should be picked up within a few 
seconds.  It’s true you can’t force it to happen immediately if other 
compactions are happening, but that’s by design, so that compaction work doesn’t 
take too many resources.
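
If you want to watch where the request sits, SHOW COMPACTIONS lists every queued 
and running compaction, and the size of the worker pool is the metastore's 
hive.compactor.worker.threads.  As a rough, untested sketch (the JDBC URL, 
credentials and table name are just placeholders), something like this queues a 
major compaction from a Java client and then polls the queue:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;

public class CompactionCheck {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    try (Connection conn = DriverManager.getConnection(
             "jdbc:hive2://localhost:10000/default", "hive", "");
         Statement stmt = conn.createStatement()) {
      // Queue a major compaction; a free worker thread picks it up, it does not run inline.
      stmt.execute("ALTER TABLE payees COMPACT 'major'");
      // Poll the compaction queue and print each entry as column=value pairs.
      try (ResultSet rs = stmt.executeQuery("SHOW COMPACTIONS")) {
        ResultSetMetaData md = rs.getMetaData();
        while (rs.next()) {
          StringBuilder row = new StringBuilder();
          for (int i = 1; i <= md.getColumnCount(); i++) {
            row.append(md.getColumnLabel(i)).append('=').append(rs.getString(i)).append("  ");
          }
          System.out.println(row);
        }
      }
    }
  }
}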

Alan.

> On Sep 26, 2016, at 11:07, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> 
> alter table payees compact 'minor';
> Compaction enqueued.
> OK
> 
> It queues the compaction, but is there no way to force it to run immediately?
> 
> 
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> http://talebzadehmich.wordpress.com
> 
> Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
> damage or destruction of data or any other property which may arise from 
> relying on this email's technical content is explicitly disclaimed. The 
> author will in no case be liable for any monetary damages arising from such 
> loss, damage or destruction.
>  
> 
> On 26 September 2016 at 18:54, Alan Gates <alanfga...@gmail.com> wrote:
> alter table compact forces a compaction.  See 
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/PartitionCompact
> 
> Alan.
> 
> > On Sep 26, 2016, at 10:41, Mich Talebzadeh <mich.talebza...@gmail.com> 
> > wrote:
> >
> > Can the temporary table be a solution to the original thread owner issue?
> >
> > Hive streaming, for example from Flume into Hive, is interesting, but the issue 
> > is that one ends up with a fair number of delta files due to the transactional 
> > nature of the ORC table, and I know that Spark will not be able to open the 
> > table until compaction takes place, which cannot be forced. I don't know 
> > whether there is a way to force a quick compaction.
> >
> > Thanks
> >
> > Dr Mich Talebzadeh
> >
> >
> >
> > On 26 September 2016 at 17:41, Alan Gates <alanfga...@gmail.com> wrote:
> > ORC does not store data row by row.  It decomposes the rows into columns, 
> > and then stores pointers to those columns, as well as a number of indices 
> > and statistics, in a footer of the file.  Due to the footer, in the simple 
> > case you cannot read the file before you close it or append to it.  We did 
> > address both of these issues to support Hive streaming, but it’s a low 
> > level interface.  If you want to take a look at how Hive streaming handles 
> > this you could use it as your guide.  The starting point for that is 
> > HiveEndPoint in org.apache.hive.hcatalog.streaming.
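> >
> > A bare-bones sketch of that interface, assuming a metastore at 
> > thrift://localhost:9083 and a transactional, bucketed ORC table db.alerts with 
> > columns (msg, id) (all of those names are placeholders, and this is untested 
> > here), would look roughly like this; rows committed through it become visible to 
> > readers without the underlying ORC writer being closed:
> >
> > import org.apache.hive.hcatalog.streaming.DelimitedInputWriter;
> > import org.apache.hive.hcatalog.streaming.HiveEndPoint;
> > import org.apache.hive.hcatalog.streaming.StreamingConnection;
> > import org.apache.hive.hcatalog.streaming.TransactionBatch;
> >
> > public class StreamRowsExample {
> >   public static void main(String[] args) throws Exception {
> >     // Point at an existing transactional table; null = no partition values.
> >     HiveEndPoint endPoint =
> >         new HiveEndPoint("thrift://localhost:9083", "db", "alerts", null);
> >     StreamingConnection conn = endPoint.newConnection(false);
> >     // Rows are passed as delimited byte arrays matching these column names.
> >     DelimitedInputWriter writer =
> >         new DelimitedInputWriter(new String[]{"msg", "id"}, ",", endPoint);
> >     // A transaction batch groups several transactions against the same delta files.
> >     TransactionBatch batch = conn.fetchTransactionBatch(10, writer);
> >     batch.beginNextTransaction();
> >     batch.write("hello,1".getBytes());
> >     batch.write("world,2".getBytes());
> >     batch.commit();   // committed rows are now visible to readers
> >     batch.close();
> >     conn.close();
> >   }
> > }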
> >
> > Alan.
> >
> > > On Sep 26, 2016, at 01:18, Amey Barve <ameybarv...@gmail.com> wrote:
> > >
> > > Hi All,
> > >
> > > I have a use case where I need to append either one or many rows to an 
> > > ORC file as well as read one or many rows from it.
> > >
> > > I observed that I cannot read rows from an ORC file unless I close the 
> > > ORC file's writer; is this correct?
> > >
> > > Why doesn't a write actually flush the rows to the ORC file? Is there any 
> > > alternative where I can write rows and read them back without closing 
> > > the ORC file's writer?
> > >
> > > Thanks and Regards,
> > > Amey
> >
> >
> 
> 
