Hi All.
We are happy to announce the availability of Apache ORC 1.9.3!
https://orc.apache.org/news/2024/03/20/ORC-1.9.3/
1.9.3 is a maintenance release containing important fixes.
It's available in Apache Downloads and Maven Central.
https://downloads.apache.org/orc/orc-1.9.3/
Hi All.
We are happy to announce the availability of the Apache ORC C++ library
in Conan Center, the package repository of the popular C++ package
manager Conan:
https://conan.io/center/recipes/orc
Currently we have added 2.0.0, 1.9.2, 1.8.6 and 1.7.10. You may find
the recipe in the official conan
Hi All.
We are happy to announce the availability of Apache ORC 1.8.5!
https://orc.apache.org/news/2023/09/05/ORC-1.8.5/
1.8.5 is a maintenance release containing important fixes.
It's available in Apache Downloads and Maven Central.
https://downloads.apache.org/orc/orc-1.8.5/
Hi All.
We are happy to announce the availability of Apache ORC 1.7.9!
https://orc.apache.org/news/2023/05/07/ORC-1.7.9/
1.7.9 is a maintenance release containing important fixes.
It's available in Apache Downloads and Maven Central.
https://downloads.apache.org/orc/orc-1.7.9/
er to the java meta tool.
> Otherwise, it defaults to the end of the file.
>
>
>>
>> I would need to dig deeper in to how the writeMetadata() and
>> writeFileFooter() are different between Java / C++ in order to understand
>> what is going on. This Java WriterImpl.java code cau
y writes the offsets to the
> "_flush_length" file, though. I would also be interested in seeing when and
> how the preliminary footers are written, too.
>
> Thanks,
> Hinko
>
> From: Gang Wu
> Sent: Tuesday, March 28, 2023
Hi Hinko,
Please see my inline answers below:
On Tue, Mar 28, 2023 at 3:16 PM Hinko Kocevar wrote:
> Hi,
>
> I have a couple of questions about the persistence and consistency of the
> data when written to the file. In my use case I generally expect that the
> data rate is high enough such
Welcome Xin!
Best,
Gang
On Sat, Feb 11, 2023 at 1:07 PM William H. wrote:
> The Apache ORC PMC recently added Xin Zhang
> (https://github.com/coderex2522) as a committer.
>
> Please join me in welcoming Xin Zhang to the ORC community!
>
> Bests,
> William, Chair
>
[(byte
> offset of chunk1, decompressed size, # of values), (byte offset of
> chunk2, decompressed size, # of values)]. Is that correct?
>
> On Tue, Feb 7, 2023 at 1:04 PM Gang Wu wrote:
> >
> > Not exactly. It starts with the byte offset of the compression chunk and
> appe
Hi Gang,
>
> Thanks for your reply.
> A follow up question on Row Index, what is the exact meaning of
> 'position' in RowIndexEntry? Is it the byte offset of the starting
> position of the first compression chunk of that row group?
>
> On Thu, Feb 2, 2023 at 4:40 PM Gang Wu
ulting ColumnVectorBatch
> produced by C++ reader with PPD?
>
> On Thu, Jan 19, 2023 at 5:46 PM Xinyu Z wrote:
> >
> > Hi Gang,
> >
> > Thanks for your reply! It helps.
> >
> > Xinyu
> >
> > On Wed, Jan 18, 2023 at 10:42 AM Gang Wu wrote:
> >
Hi Xinyu,
The C++ library does not provide lazy materialization. The Java library
supports row-level filtering; please take a look if you are interested:
https://issues.apache.org/jira/browse/ORC-577
With regards to the IO magnification introduced by PPD, I think we have
discussed this earlier and there is
ession block will
> be read and decompressed only once right?
>
> On Mon, Sep 5, 2022 at 11:31 PM Gang Wu wrote:
> >
> > Hi Xinyu,
> >
> > When the row group stride is set to 100, we end up with many row groups
> and each contributes a protobuf object in the
Hi Xinyu,
When the row group stride is set to 100, we end up with many row groups and
each contributes a protobuf object in the stripe index. That's why you see
the most expensive function is loadStripeIndex().
I need to say that smaller row groups may not help reduce the I/Os since
the
Hi David,
Unfortunately, the C++ writer does not support it yet. It is pretty
straightforward to implement, though. Would you like to contribute?
Best,
Gang
On Wed, Jul 7, 2021 at 4:39 PM David Justen
wrote:
> Hey everyone,
>
> I am currently working on a project using ORC's C++ library. To reduce
>
Hi Lei,
Unfortunately, we don't have a Go binding for the ORC writer. Would it be
possible for you to use cgo to call the C++ API from your Go
application?
Thanks,
Gang
On Tue, Mar 19, 2019 at 10:44 PM yanglei wrote:
> Dear Team
>
>
>
> I am working on a project using golang to
illion rows. But still one row group per stripe.
>
> -- Korry
>
>
> On Mar 11, 2019 6:00 PM, Gang Wu wrote:
>
> The default number of rows in a row group is 10,000, which you can get from
> Reader::getRowIndexStride(). You probably need to create more rows of data to
> verify t
The default number of rows in a row group is 10,000, which you can get from
Reader::getRowIndexStride(). You probably need to create more rows of data to
verify this.
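As a rough illustration of why a small test file shows only one row group per stripe, the sketch below counts the row groups needed to cover a stripe. It assumes a row index stride of 10,000 rows (the documented default of orc.row.index.stride). The helper name rowGroupsInStripe is hypothetical and not part of the ORC API.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the ORC API): number of row groups
// needed to cover `rowsInStripe` rows when each row group holds at most
// `stride` rows. A stride of 0 is treated here as "row index disabled".
uint64_t rowGroupsInStripe(uint64_t rowsInStripe, uint64_t stride) {
  if (stride == 0 || rowsInStripe == 0) {
    return 0;
  }
  // Ceiling division: a partially filled last row group still counts.
  return (rowsInStripe + stride - 1) / stride;
}
```

With a stride of 10,000, rowGroupsInStripe(5000, 10000) is 1 and rowGroupsInStripe(10001, 10000) is 2, so a stripe needs more than 10,000 rows before a second row group appears.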
Thanks
Gang
On Mon, Mar 11, 2019 at 1:19 PM Korry Douglas wrote:
> I’m making progress on predicate pushdown using the C++ ORC api.
>
>
The following function returns the stripe-level & row-group-level
statistics of the stripe specified by input.
*ORC_UNIQUE_PTR<StripeStatistics> Reader::getStripeStatistics(uint64_t
stripeIndex) const;*
You need to call *StripeStatistics::getColumnStatistics *to get
stripe-level stats and
Yes, you are right. This interface returns the column statistics of all
columns, and their types can be found via the type in the file footer.
On Fri, Mar 1, 2019 at 10:04 AM Korry Douglas wrote:
> I think I’ve figured this out - I have to look at the column type and then
> infer which of the
Unfortunately we don't have an API to return a row of data. You have to
extract each column from the batches.
For seekToRow(uint64_t rowNumber), you can jump to the row specified by
rowNumber and then use rowReader->next() to get the batch. It is pretty
straightforward.
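A minimal sketch of what "extract each column from the batches" means in practice: a batch stores one array per column, so a row has to be assembled by reading the same position from every column. The types below (IdColumn, NameColumn, Batch, Row) are stand-ins invented for illustration and are NOT the ORC classes. In the ORC C++ library the corresponding data would live in the column vector batches such as orc::LongVectorBatch and orc::StringVectorBatch.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Stand-in column types for illustration only; these are NOT ORC classes.
struct IdColumn   { std::vector<int64_t> data; };
struct NameColumn { std::vector<std::string> data; };

// A batch holds one array per column, each with numElements entries.
struct Batch {
  uint64_t numElements = 0;
  IdColumn ids;
  NameColumn names;
};

// The row shape the caller wants; it has to be built by hand.
struct Row {
  int64_t id;
  std::string name;
};

// Build row `i` by reading position `i` of every column.
// at() throws std::out_of_range if `i` is past the end of a column.
Row rowAt(const Batch& batch, uint64_t i) {
  return Row{batch.ids.data.at(i), batch.names.data.at(i)};
}

// Materialize every row of the batch, in order.
std::vector<Row> toRows(const Batch& batch) {
  std::vector<Row> rows;
  rows.reserve(batch.numElements);
  for (uint64_t i = 0; i < batch.numElements; ++i) {
    rows.push_back(rowAt(batch, i));
  }
  return rows;
}
```

After seeking to a row and fetching a batch, the row of interest is at index 0 of that batch, so under these assumptions a single call to rowAt(batch, 0) would be enough.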
You can actually create
To read the desired type of each column, you just need to cast the base
orc::ColumnVectorBatch, which you get from rowReader->next(), to its
desired type. You can dynamic_cast to orc::LongVectorBatch for int64 and
orc::StringVectorBatch for char *, check the API here:
Yes, you can find the example in https://orc.apache.org/docs/core-cpp.html
Calling orc::RowReader::next() fills an orc::ColumnVectorBatch, which has a
specific batch subclass for each type. All the public APIs available to
you are listed here:
Hi Zhiyuan,
Yes, you can see the following example which prints the names of the top
level fields.
orc::Reader * reader = ...
const orc::Type& type = reader->getType();
for (uint64_t i = 0; i != type.getSubtypeCount(); ++i)
{
std::cout << type.getFieldName(i) << std::endl;
}
Best,
Gang
On
Hi Freddy,
A simple answer is NO.
IIUC, ORC is designed to work on HDFS, which is append-only. Therefore an
ORC file is immutable once the writing is done.
Hive provides an advanced feature for update and delete with ORC:
https://orc.apache.org/docs/acid.html. Not sure if you are looking for