Re: String stats requirements?

2017-06-06 Thread Owen O'Malley
On Tue, Jun 6, 2017 at 3:02 PM, Dain Sundstrom wrote: > Is it required that the StringStatistics min and max be the actual min and > max value for the column? I ask for two reasons, I’d like to be able to > “trim” values if the min or max is very large. Also, as a work around of > for the UTF-1

Re: "For dictionary encodings the dictionary is sorted"

2017-06-06 Thread Owen O'Malley
I'm confused. TimestampStatistics uses integers not strings. .. Owen On Mon, Jun 5, 2017 at 9:53 PM, Dain Sundstrom wrote: > > > On Dec 12, 2016, at 4:48 PM, Dain Sundstrom wrote: > > On Dec 12, 2016, at 4:36 PM, Owen O'Malley wrote: > >>> I think

Re: String stats requirements?

2017-06-06 Thread Owen O'Malley
"1 = HIVE-8732 fixed” after 0.14? If so I can update my > reader to detect this. > > -dain > > > On Jun 6, 2017, at 3:36 PM, Owen O'Malley > wrote: > > > > On Tue, Jun 6, 2017 at 3:02 PM, Dain Sundstrom wrote: > > > >> Is it required that

Re: What did ORC writer v3 fix?

2017-06-06 Thread Owen O'Malley
The writer version was bumped because it was a major change to the writer. We were concerned that if there was a bug, we would need to detect it. .. Owen > On Jun 6, 2017, at 19:24, Dain Sundstrom wrote: > > On reading the HIVE-12055 referenced from the docs for writer version 3, I’m > not

Re: Documentations issues

2017-06-19 Thread Owen O'Malley
Ok, I just put in a pull request for this: https://github.com/apache/orc/pull/133 Let me know if anything is still unclear. Thanks, Owen On Fri, Jun 16, 2017 at 12:19 PM, Dain Sundstrom wrote: > Recently I have been working on a custom writer for Presto and during this > I kept notes on s

[DRAFT][REPORT] Apache ORC - July 2017

2017-07-14 Thread Owen O'Malley
## Description: - A high-performance columnar file format for Hadoop workloads. ## Issues: - There are no issues requiring the board's attention. ## Activity: - A presentation on "ORC File - Optimizing Your Big Data" was given at the Dataworks Summit in San Jose. - Alibaba is contributing

Re: Upgrading ORC to hadoop-2.7.4

2017-07-21 Thread Owen O'Malley
It is a good idea to upgrade. We should probably configure the travis-ci so that one of the builds compiles against hadoop 2.6 to ensure that we continue to compile against it. .. Owen On Fri, Jul 21, 2017 at 3:17 PM, Gopal Vijayaraghavan wrote: > Hi, > > > > ORC files are currently 5% larger t

[DISCUSS] ORC 2.0

2017-08-04 Thread Owen O'Malley
All, We've started the process of updating the encodings for ORC. These changes are going to extend the format in ways that aren't forward compatible. (E.g., the ORC 1.4 readers won't be able to read the new format.) The changes that I've heard about are: * Decimal encoding - this will likely be sep

Re: [DISCUSS] ORC 2.0

2017-08-04 Thread Owen O'Malley
so, I agree we should do this. The list of potential benefits for > performance and space efficiency is compelling. And the long lag for users > with many old tools to upgrade will never get better. > > Alan. > > On Fri, Aug 4, 2017 at 9:29 AM, Owen O'Malley > wrote: >

Re: [DISCUSS] ORC 2.0

2017-08-11 Thread Owen O'Malley
Ok, I created ORC-229 https://issues.apache.org/jira/browse/ORC-229 so that we'll have a new OrcFile.Version of UNSTABLE-PRE-2.0. If you look at the associated pull request, you can see the comments in the code are pretty clear that users should stay away. I also added a logged warning when the wri

Re: Bringing ACID into ORC

2017-09-13 Thread Owen O'Malley
I agree strongly with Alan. Having forked copies of the code is really error-prone and causes lots of user confusion. It also means that each feature has to be added twice. ORC is a much more nimble project than Hive and we can make releases when needed. .. Owen

Re: Thoughts on Acid reader

2017-09-15 Thread Owen O'Malley
Yeah, I'd suggest adding to: OrcFile.ReaderOptions: exposeAcidRowId(boolean); -- so that the returned schema includes the ACID row id Reader.Options: setValidTransactions(TransactionList); -- apply transaction filtering Then it will read a single file (or range using Reader.Options.range(l

[DISCUSS] Making bug fix releases of 1.3 and 1.4

2017-10-11 Thread Owen O'Malley
All, We've fixed some important bugs and I'd like to make a release of 1.3 and 1.4. Does anyone have anything they'd like to include? .. Owen

Draft of Apache ORC board report

2017-10-11 Thread Owen O'Malley
All, Every three months our project needs to update the Apache Board with our current status. Please provide any feedback. .. Owen ## Description: - A high-performance columnar file format for Hadoop workloads. ## Issues: - There are no issues requiring the board's attention. ## Activity:

Re: [DISCUSS] Making bug fix releases of 1.3 and 1.4

2017-10-11 Thread Owen O'Malley
gjoon Hyun > > wrote: > > > >> It's a great news, especially 1.4.1! :) > >> > >> Bests, > >> Dongjoon. > >> > >> On 2017-10-11 08:39, "Owen O'Malley" wrote: > >> > All, > >> >We've fixed some important bugs and I'd like to make a release of > 1.3 > >> > and 1.4. Does anyone have anything they'd like to include? > >> > > >> > .. Owen > >> > > >> > > > > >

Re: [VOTE] Should we release ORC 1.4.1rc0?

2017-10-12 Thread Owen O'Malley
The KEYS file is in the dist svn directory: https://dist.apache.org/repos/dist/release/orc/KEYS Prasanth's key wasn't there, although I have it on my local keychain. I've gone ahead and added it to the KEYS file. Please verify the key Prasanth! .. Owen On Thu, Oct 12, 2017 at 12:29 PM, Alan G

Re: [VOTE] Should we release ORC 1.4.1rc0?

2017-10-12 Thread Owen O'Malley
that I should upload > the keys to dist. Will probably update the release doc to make sure the > KEYS are checked in. I can confirm that the keys are correct. > > > Thanks and Regards, > Prasanth Jayachandran > > > On Thu, Oct 12, 2017 at 1:08 PM, Owen O'Malley

Re: [VOTE] Should we release ORC 1.4.1rc0?

2017-10-12 Thread Owen O'Malley
t; > > the keys to dist. Will probably update the release doc to make sure > the > > > > KEYS are checkin in. I can confirm that the keys are correct. > > > > > > > > > > > > Thanks and Regards, > > > > Prasanth Jayachandran > > >

Re: [VOTE] Should we release ORC 1.3.4rc0?

2017-10-12 Thread Owen O'Malley
+1 I checked: * gpg signature * sha256 signature * build on 7 linux variants and MacOS * reviewed git log .. Owen On Thu, Oct 12, 2017 at 1:55 AM, Prasanth Jayachandran wrote: > All, > > Should we release the following artifacts as ORC 1.3.4? > Please refer jiras section for list of fixes that

Re: [VOTE] Should we release ORC 1.4.1rc0?

2017-10-17 Thread Owen O'Malley
Ok, I'll backport ORC-235, which excludes it from the transitive dependency. .. Owen On Tue, Oct 17, 2017 at 11:25 AM, Dongjoon Hyun wrote: > Ur, I checked Maven central today and found that ORC 1.4.1 seems to bring > unexpected dependency accidentally. This is not a feature issue. > > [INFO] +

Remove orc 1.2 from apache distribution mirrors

2017-10-19 Thread Owen O'Malley
All, Now that we've had 1.3 and 1.4 out for a while, I don't think there is any need for the 1.2 release to remain on the Apache distribution site and mirrors. If no one objects, I'll remove it at the end of the week. The bits remain available; this just moves them from "stable" to "archived" in our

[PROPOSAL] Creating security list for ORC

2017-12-01 Thread Owen O'Malley
All, I think as we add column encryption in ORC-14, we'll need to start handling security issues. I think it would be good to create a security list and policy before we need it. Thoughts? .. Owen

Re: ORC magic

2017-12-15 Thread Owen O'Malley
On a side note, I put a patch in to the Linux 'file' command that makes it recognize ORC files. If you've got file 5.31 or later, you'll get: owen@laptop> file examples/*.orc examples/TestOrcFile.columnProjection.orc: Apache ORC examples/TestOrcFile.emptyFile.orc:
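For readers without a recent 'file' build, a rough equivalent check can be sketched in Python. This is only an illustration and does not reproduce the detection rule in the 'file' patch: it assumes the file starts with the three-byte "ORC" header, and the names ORC_MAGIC, has_orc_magic and looks_like_orc are made up for this example.

```python
ORC_MAGIC = b"ORC"  # assumption: the file begins with these three bytes


def has_orc_magic(header):
    """Return True if the given leading bytes start with the ORC magic."""
    return header[:len(ORC_MAGIC)] == ORC_MAGIC


def looks_like_orc(path):
    """Read only the first few bytes of the file and test them."""
    with open(path, "rb") as f:
        return has_orc_magic(f.read(len(ORC_MAGIC)))
```

A call such as looks_like_orc("examples/TestOrcFile.emptyFile.orc") would then give a quick yes/no answer. A file that passes this check can still be corrupt, so it is a hint rather than a validation.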

The linux clang build is currently broken

2017-12-18 Thread Owen O'Malley
Travis CI updated the clang compiler to use version 5.0 from 4.0. Because it added some new warnings, our builds on that variant are failing. I have been working on a fix and I have all of the warnings resolved, and now I'm down to a single test case failing. As a result of the failure, I'm also a

Re: getting read past EOF for Double column

2017-12-18 Thread Owen O'Malley
Actually, the metadata is reasonable, it is just that there is an array above that column that doesn't have any elements. So the tree down to column 36 looks like: column 0: (struct) count: 42692 column 1: data (struct) count: 42692 column 21: listingAssociated (array) count: 42692 column 22: (st

Re: getting read past EOF for Double column

2017-12-18 Thread Owen O'Malley
This is a bug. Please file a jira. It looks like a change went in that made the DoubleTreeReader fail if it is called on a batch of size 0. Thanks, Owen On Mon, Dec 18, 2017 at 10:19 AM, Owen O'Malley wrote: > Actually, the metadata is reasonable, it is just that there is an array

Re: getting read past EOF for Double column

2017-12-27 Thread Owen O'Malley
I've filed this as https://issues.apache.org/jira/browse/ORC-285 . Sorry for the delay in getting the fix out. .. Owen On Mon, Dec 18, 2017 at 10:27 AM, Owen O'Malley wrote: > This is a bug. Please file a jira. It looks like a change went in that > made the DoubleTreeRead

[REPORT] Apache ORC board report

2018-01-16 Thread Owen O'Malley
All, I posted to the board, but I wanted to make sure you guys saw it too. .. Owen ## Description: - A high-performance columnar file format for Hadoop workloads. ## Issues: - There are no issues requiring the board's attention. ## Activity: - We made bug fix releases for both the 1.3 and

[VOTE] Should we release ORC 1.4.2rc0?

2018-01-19 Thread Owen O'Malley
All, Should we release the following artifacts as ORC 1.4.2? tar: http://home.apache.org/~omalley/orc-1.4.2/ tag: https://github.com/apache/orc/releases/tag/release-1.4.2-rc0 Thanks!

Re: [VOTE] Should we release ORC 1.4.2rc0?

2018-01-23 Thread Owen O'Malley
t 12:32 AM, Prasanth Jayachandran < > j.prasant...@gmail.com> wrote: > > > +1 > > > > - Built from src > > - Verified signature, checksums and licenses from site report > > - Rat check > > - Ran unit tests > > > > > > > >

[VOTE] Should we release ORC 1.4.3rc0?

2018-02-06 Thread Owen O'Malley
All, There are some important bugs that have come up since 1.4.2. Therefore, I'd like to make a new release. Should we release the following artifacts as ORC 1.4.3? tar: http://home.apache.org/~omalley/orc-1.4.3/ tag: https://github.com/apache/orc/releases/tag/release-1.4.3rc0 jiras: https://i

Re: [VOTE] Should we release ORC 1.4.3rc0?

2018-02-09 Thread Owen O'Malley
With four +1's and no -1's the vote passes. Thanks for voting. I'll publish the release. .. Owen On Thu, Feb 8, 2018 at 2:57 PM, Gopal Vijayaraghavan wrote: > Hi, > > > Should we release the following artifacts as ORC 1.4.3? > > +1 - built tarball, ran tests > > and verified signatures. > >

Re: Including Apache ORC as a library

2018-02-12 Thread Owen O'Malley
I'm assuming that you are using the Java rather than C++ side of the project. What you want is org.apache.orc:orc-core, which includes the protobuf class as org.apache.orc.OrcProto. That jar depends on org.apache.hive:hive-storage-api, which comes from Hive and defines the vectorized API. The

Re: ORC double encoding optimization proposal

2018-03-26 Thread Owen O'Malley
This is a really interesting conversation. Of course, the original use case for ORC was that you were never reading less than a stripe. So putting all of the data streams for a column back to back, which isn't in the spec, but should be, was optimal in terms of seeks. There are two cases that viol

Re: ORC double encoding optimization proposal

2018-03-27 Thread Owen O'Malley
Going back to the point of double split encoding, it would make sense to try a variant where we combine the sign and the mantissa. That should remove the sign stream at a relatively little cost of making the mantissa stream signed. Thinking more about the layout options... Another consideration i
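To make the sign-plus-mantissa idea concrete, here is a small Python sketch. It is not the encoding from the proposal: the field widths are just the IEEE 754 double layout, and the mapping of a negative sign to the bitwise complement of the mantissa is an assumption chosen for this example so that values with a zero mantissa (such as -1.0 or -0.0) keep their sign. All function names are hypothetical.

```python
import struct


def split_double(value):
    """Split an IEEE 754 double into (sign, exponent, mantissa) fields."""
    bits = struct.unpack(">Q", struct.pack(">d", value))[0]
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF
    mantissa = bits & ((1 << 52) - 1)
    return sign, exponent, mantissa


def split_signed_mantissa(value):
    """Fold the sign into the mantissa, leaving two fields instead of three.

    Assumption for this sketch: negative values store ~mantissa, which is
    always below zero, so a zero mantissa does not lose its sign.
    """
    sign, exponent, mantissa = split_double(value)
    return exponent, (~mantissa if sign else mantissa)


def join_signed_mantissa(exponent, signed_mantissa):
    """Inverse of split_signed_mantissa."""
    if signed_mantissa < 0:
        sign, mantissa = 1, ~signed_mantissa
    else:
        sign, mantissa = 0, signed_mantissa
    bits = (sign << 63) | (exponent << 52) | mantissa
    return struct.unpack(">d", struct.pack(">Q", bits))[0]
```

The round trip is exact at the bit level, so the separate sign stream carries no information that the signed mantissa stream does not already hold. Whether the signed mantissa compresses as well as the unsigned one is the cost the message refers to, and this sketch says nothing about that.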

Re: ORC double encoding optimization proposal

2018-03-30 Thread Owen O'Malley
On Wed, Mar 28, 2018 at 1:01 AM, Xiening Dai wrote: This modification will increase the complexity of implementation, and I am > not sure how much we will gain by not closing compression and rle chunks. > You probably have some data when you firstly designed row group and index. > Actually, I di

Re: [PROPOSAL] Creating security list for ORC

2018-04-10 Thread Owen O'Malley
I'd like to move forward on this. Any comments? On Fri, Dec 1, 2017 at 1:40 PM, Owen O'Malley wrote: > All, >I think as we add column encryption in ORC-14, we'll need to start > handling security issues. I think it would be good to create a security > list a

Re: [PROPOSAL] Creating security list for ORC

2018-04-11 Thread Owen O'Malley
Gates wrote: > > What's the benefit of having this separate from the private list? > > > > Alan. > > > > On Tue, Apr 10, 2018 at 11:15 AM, Owen O'Malley > > wrote: > > > >> I'd like to move forward on this. Any comments? > >

Update the developing page on our site

2018-04-11 Thread Owen O'Malley
All, I updated the developer documentation to try to help the new committers. Take a look and suggest improvements. https://orc.apache.org/develop/ Thanks, Owen

Re: [PROPOSAL] Creating security list for ORC

2018-04-13 Thread Owen O'Malley
> > On Wed, Apr 11, 2018 at 7:40 AM, Owen O'Malley > wrote: > > > The biggest advantage is that email to secur...@tlp.apache.org is > > automatically forwarded to secur...@apache.org allowing them to track > the > > issues. > > > > It also allows the pro

[REPORT] Apache board report for ORC

2018-04-13 Thread Owen O'Malley
Feedback welcome. .. Owen ## Description: - A high-performance columnar file format for Hadoop workloads. ## Issues: - There are no issues requiring the board's attention. ## Activity: - We made two bug fix releases for the 1.4 branch and will make another soon. - We also need to finalize

Re: [REPORT] Apache board report for ORC

2018-04-16 Thread Owen O'Malley
..."? > 2. Vertical whitespace inconsistencies: > - blank lines after headings starting with PMC changes, but not > before that > - two blank lines before u...@orc.apache.org > 3. Colon not needed at end of line for all 3 subscribers tallies > > -- Lefty >

Alternatives to JMH for the benchmarking code

2018-04-18 Thread Owen O'Malley
All, In my board report, I included that we removed the benchmarking code because of the dependency on JMH, which is unfortunately released under the GPL. Isabel, who is on the board, asked if there were alternatives to JMH that aren't GPL. In my looking, I didn't find any that integrate as

Re: Alternatives to JMH for the benchmarking code

2018-04-20 Thread Owen O'Malley
On Thu, Apr 19, 2018 at 1:20 PM, Dain Sundstrom wrote: > I don’t think there is anything like JMH, or any team that understands > Java micro benchmarking as well. Have you tried asking them to open source > the APIs under a better license so you can code against it. > That is a good suggestion.

Re: Alternatives to JMH for the benchmarking code

2018-04-27 Thread Owen O'Malley
Ok, after more discussions and a thread over on Apache's legal-discuss, it looks like we are generally fine for category X dependencies for optional features as long as we don't distribute binaries that include the feature. Since the benchmarks are non-user facing, it isn't a problem to not distrib

Re: Zstd decoder support

2018-05-04 Thread Owen O'Malley
I just upgraded ORC to use aircompressor 0.10. I assume we'll want to move to 0.11 before we use zstd? .. Owen On Fri, May 4, 2018 at 12:49 PM, Dain Sundstrom wrote: > The maintained location (and the version we use in prod) is: > https://github.com/airlift/aircompressor/tree/master/src/ > main

[VOTE] Should we release ORC 1.5.0rc0?

2018-05-07 Thread Owen O'Malley
All, It has been a while since ORC 1.4.0 was released. We have a lot of new features and I think we should release them. Should we release the following artifacts as ORC 1.5.0? tar: http://home.apache.org/~omalley/orc-1.5.0/ tag: https://github.com/apache/orc/releases/tag/release-1.5.0rc0 jira

[VOTE] Should we release ORC 1.4.4rc0?

2018-05-07 Thread Owen O'Malley
Although I just started the ORC 1.5.0 vote, we have some users that want a bug fix release for the ORC 1.4 line. Should we release the following artifacts as ORC 1.4.4? tar: http://home.apache.org/~omalley/orc-1.4.4/ tag: https://github.com/apache/orc/releases/tag/release-1.4.4rc0 jiras: https://

Re: [VOTE] Should we release ORC 1.4.4rc0?

2018-05-09 Thread Owen O'Malley
d help on this. > > > > On May 7, 2018, at 11:47 AM, Owen O'Malley > wrote: > > > > Although I just started the ORC 1.5.0 vote, we have some users that want > a > > bug fix release for the ORC 1.4 line. > > > > Should we release the following artifa

Re: [VOTE] Should we release ORC 1.5.0rc0?

2018-05-14 Thread Owen O'Malley
, Deepak Majeti > wrote: > > > > +1 > > > > - built from tar > > - checked checksum and signature > > - ran unit tests > > - ran rat checks > > > > On Mon, May 7, 2018 at 2:44 PM, Owen O'Malley > > wrote: > > > >> All,

Re: [VOTE] Should we release ORC 1.5.0rc0?

2018-05-14 Thread Owen O'Malley
With four +1's and no -1's, the vote passes. Thanks, everyone! On Mon, May 14, 2018 at 12:29 PM, Gopal Vijayaraghavan wrote: > > Hi, > > +1 > > Package builds clean & tested against HIVE-19465. > > Cheers, > Gopal > > > On 5/14/18, 9:54 AM, "

Re: [VOTE] Should we release ORC 1.4.4rc0?

2018-05-14 Thread Owen O'Malley
ests and rat check > > Thanks and Regards, > Prasanth Jayachandran > > > > On Thu, May 10, 2018 at 12:03 PM Deepak Majeti > wrote: > > > +1 > > > > - built from tar > > - checked checksum and signature > > - ran unit tests > > - ran rat

Re: Apache Orc doc links are broken

2018-05-18 Thread Owen O'Malley
Sorry, we were using an old version of jekyll and github-pages. When I updated the version yesterday, it broke stuff and I missed those. They should be fixed now. If you find something else broken, please let me know. .. Owen On Fri, May 18, 2018 at 4:15 PM, Xiening Dai wrote: > See https://or

[VOTE] Should we release ORC 1.5.1rc0?

2018-05-22 Thread Owen O'Malley
All, A couple of important bug fixes have come up in the 1.5.0 release. Should we release the following artifacts as ORC 1.5.1? tar: http://home.apache.org/~omalley/orc-1.5.1/ tag: https://github.com/apache/orc/releases/tag/release-1.5.1rc0 jiras: https://issues.apache.org/jira/projects

[VOTE] Should we release ORC 1.5.1rc1?

2018-05-22 Thread Owen O'Malley
All, As Prasanth noticed, I forgot to remove the SNAPSHOT from the version. Should we release the following artifacts as ORC 1.5.1? tar: http://home.apache.org/~omalley/orc-1.5.1/ tag: https://github.com/apache/orc/releases/tag/release-1.5.1rc1 jiras: https://issues.apache.org/jira/projects/

Re: [VOTE] Should we release ORC 1.5.1rc1?

2018-05-22 Thread Owen O'Malley
rc file unit tests in hive > > Thanks and Regards, > Prasanth Jayachandran > > > > On Tue, May 22, 2018 at 4:32 PM Owen O'Malley > wrote: > > > All, > >As Prasanth noticed, I forgot to remove the SNAPSHOT from the version. > > Sho

Re: [VOTE] Should we release ORC 1.5.1rc1?

2018-05-25 Thread Owen O'Malley
< > > j.prasant...@gmail.com> wrote: > > > > > +1 > > > - verified signature, checksum > > > - built from source > > > - ran unit tests > > > - ran orc file unit tests in hive > > > > > > Thanks and Regards, > >

Re: Building C++ API from source -- questions

2018-06-15 Thread Owen O'Malley
The 1.5 release does compile on windows, we just haven't updated the site yet. Sorry about that. It does work on Mac OS 10.13. I don't know about Debian 9. I know that the libhdfs needs a patch to compile on Ubuntu 18, but otherwise works. .. Owen > On Jun 14, 2018, at 22:39, Ellen Johnson w

Re: Arrow Support of Orc

2018-07-05 Thread Owen O'Malley
I think improved Arrow C++ integration would be great. I haven't looked at the current state of the work to see what could be better. I'd be against making Arrow the default C++ API, but changes to the API to make things faster for Arrow make sense. (Although as always, we need to worry about backw

[REPORT] Apache board report for ORC

2018-07-13 Thread Owen O'Malley
Here is my report to the Apache Board. Feedback welcome. .. Owen ## Description: - A high-performance columnar file format for Hadoop workloads. ## Issues: - There are no issues requiring the board's attention. ## Activity: - Released ORC 1.5.0 (and a few follow up bug fixes 1.5.1 and 1.5.2)

PGP keys for committers

2018-08-06 Thread Owen O'Malley
Hi all, If you are a committer on ORC, can you please update https://id.apache.org/ with your GPG fingerprint? That will ensure your key is automatically added to the project's GPG page - https://people.apache.org/keys/group/orc.asc Thanks, Owen

Move to Java 8

2018-09-01 Thread Owen O'Malley
Does anyone have any concerns about moving to Java 8? https://github.com/apache/orc/pull/305 .. Owen

Preparing for ORC 1.5.3

2018-09-17 Thread Owen O'Malley
All, I'm preparing for a 1.5.3 release. What bug fixes should we include? My current list is: - ORC-382: Apache rat exclusions + add rat check to travis - ORC-383: Parallel builds fails with ConcurrentModificationException - ORC-384: [C++] fix memory leak when loading non-ORC files

Re: [Discussion] Base 128 variable integer encoding is not always good

2018-09-18 Thread Owen O'Malley
Gang, As you correctly point out, some columns don't work well with RLE. Unfortunately, without being able to look at the data it is hard for me to guess what the right compression strategies are. Based on your description, I would guess that the data doesn't have a lot of patterns to it and cov
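For context on the thread title, base 128 variable-length integer encoding can be sketched in a few lines of Python. This is a generic illustration, not code from the ORC writer: it stores seven payload bits per byte with the high bit as a continuation flag, and uses zigzag mapping for signed values. The function names are made up for this example.

```python
def zigzag(n):
    """Map a signed 64-bit integer to an unsigned one (0, -1, 1, -2 -> 0, 1, 2, 3)."""
    return (n << 1) ^ (n >> 63)


def encode_varint(n):
    """Encode a non-negative integer, least significant 7-bit group first."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more groups follow
        else:
            out.append(byte)         # last group, high bit clear
            return bytes(out)


def decode_varint(data):
    """Decode one varint from the start of data and return the integer."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
    raise ValueError("truncated varint")
```

Small values take one byte, while a value that uses all 63 bits takes nine. When a column is dominated by large, patternless values, the variable width adds overhead instead of saving space, which is the situation the thread is concerned with.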

Re: [Discussion] Base 128 variable integer encoding is not always good

2018-09-19 Thread Owen O'Malley
encoding > and compression settings. > > Gopal > DIRECT_V2 is RLEv2 which can alleviate the issue but not resolve it. I > will take a look at the orc.encoding.strategy setting. > > Thanks! > Gang > > On Tue, Sep 18, 2018 at 4:08 PM Owen O'Malley > wrote: >

[VOTE] Should we release ORC 1.5.3rc0?

2018-09-20 Thread Owen O'Malley
All, Should we release the following artifacts as ORC 1.5.3? tar: http://home.apache.org/~omalley/orc-1.5.3/ tag: https://github.com/apache/orc/releases/tag/release-1.5.3rc0 jiras: https://issues.apache.org/jira/browse/ORC/fixforversion/12344122 Thanks! Owen

[VOTE] Should we release ORC 1.5.3rc1?

2018-09-21 Thread Owen O'Malley
https://issues.apache.org/jira/browse/ORC/fixforversion/12344122 On Thu, Sep 20, 2018 at 4:25 PM Owen O'Malley wrote: > All, > > Should we release the following artifacts as ORC 1.5.3? > > tar: http://home.apache.org/~omalley/orc-1.5.3/ > tag: https://github.com/apache/orc/releases/tag/rel

[RESULT][VOTE] Should we release ORC 1.5.3rc1?

2018-09-25 Thread Owen O'Malley
release the following artifacts as ORC 1.5.3? >>> >>> tar: http://home.apache.org/~omalley/orc-1.5.3/ >>> tag: https://github.com/apache/orc/releases/tag/release-1.5.3rc1 >>> jiras: https://issues.apache.org/jira/browse/ORC/fixforversion/12344122 >>>

[VOTE] Move our git repository to https://gitbox.apache.org/

2018-09-27 Thread Owen O'Malley
All, Apache projects can now move their git repositories to gitbox and Apache will grant the committers write access to the github repositories. That means we can use the github buttons to merge pull requests. :) I'm +1 for the move.

Re: [VOTE] Move our git repository to https://gitbox.apache.org/

2018-10-01 Thread Owen O'Malley
> > +1 That's cool! > > > > On Thu, Sep 27, 2018 at 7:18 AM Deepak Majeti > > wrote: > > > > > Definitely! +1 > > > > > > On Thu, Sep 27, 2018 at 8:11 AM Owen O'Malley > > > wrote: > > > > > > > All

Re: [VOTE] Move our git repository to https://gitbox.apache.org/

2018-10-01 Thread Owen O'Malley
The Apache INFRA jira is https://issues.apache.org/jira/browse/INFRA-17079 . On Mon, Oct 1, 2018 at 7:41 AM Owen O'Malley wrote: > With 5 +1's and no -1's the vote passes. > > Thanks! >Owen > > On Fri, Sep 28, 2018 at 11:49 AM Lefty Leverenz > wrote: >

Re: Orc v2 Ideas

2018-10-01 Thread Owen O'Malley
On Fri, Sep 28, 2018 at 2:40 PM Xiening Dai wrote: > Hi all, > > While we are working on the new Orc v2 spec, I want to bounce some ideas > in this group. If we can get something concrete, I will open JIRAs to > follow up. Some of these ideas were mentioned before in various discussion, > but I j

Re: Orc v2 Ideas

2018-10-06 Thread Owen O'Malley
On Mon, Oct 1, 2018 at 3:56 PM Dain Sundstrom wrote: > > Interesting idea. This could help some processors of the data. Also, if > the format has this, it would be good to support "clustered" and "unique" > as flags for data that isn’t strictly sorted, but has all of the same > values clustered

Setting up the new github integration

2018-10-08 Thread Owen O'Malley
All, If you are a committer on ORC, you can setup your github account by using the "Link Github and ASF account" option on https://gitbox.apache.org/ . .. Owen

Re: Orc v2 Ideas

2018-10-09 Thread Owen O'Malley
> On Oct 8, 2018, at 5:19 PM, Xiening Dai wrote: > I think my point here is we want to be able to config some of the encoding > features. For example, right now LEB128 is enforced for all integers, but it > works bad with zstd. And the meta doesn’t have a way to turn it off. I’d put it differ

[REPORT] ORC Board Report

2018-10-09 Thread Owen O'Malley
Feedback is welcome... ## Description: - A high-performance columnar file format for Hadoop workloads. ## Issues: - There are no issues requiring the board's attention. ## Activity: - We made the 1.5.3 bug fix release. - There were two presentations about ORC at ApacheCon in Montreal: +

Forced push to master

2018-10-18 Thread Owen O'Malley
Hi all, Let's make sure to rebase commits before they are pushed to master. I hope it is ok, but I think it was better to clean up history now than leave the tangled history for ORC-419. Original SHA: cc7810edbd368186021f393ec5349ba0abd0752b New SHA: 786fd3e117c5dfb5767a9e226da14fb80a50ffca Gi

Fixing the docker builds

2018-10-29 Thread Owen O'Malley
All, We've let the docker build scripts decay badly. I'm fixing them, but along the way, I'd like to update the set of supported Linux versions. In particular, debian 7 "wheezy" doesn't have an openjdk 8, even in backports, which means that it is really hard for the java side of the build. I'd

Re: Usage of ORC_UNIQUE_PTR

2018-10-31 Thread Owen O'Malley
Ok, we should have done a better job at documenting these. For most of the features like ORC_UNIQUE_PTR and ORC_NOEXCEPT, we take two different approaches depending on the context: In external header files, we always use the ORC_*, because the users have to be able to include our header files int

Re: [orc] branch branch-1.5.4 deleted (was d8c8637)

2018-12-17 Thread Owen O'Malley
I deleted the branch-1.5.4, because we keep branches at the minor release level (release-X.Y), but not at the patch level (release-X.Y.Z). Users should take the latest release in the same minor release and only get bug fixes. .. Owen On Mon, Dec 17, 2018 at 7:23 AM wrote: > This is an automated

Re: [orc] 01/01: Preparing for release 1.5.4

2018-12-17 Thread Owen O'Malley
Before making a release for the first time, it is good to discuss what you are doing on the dev list. :) Are you trying to make a bug fix 1.5.4 or a new 1.6 release? It looks like you meant to make a bug fix release, so you should have started from branch-1.5 instead of master. I've moved your patc

Re: [DISCUSS] Releasing ORC-1.5.4

2018-12-17 Thread Owen O'Malley
Vaibhav, It would be great to make the release. Thanks, Owen On Mon, Dec 17, 2018 at 1:08 PM Eugene Koifman wrote: > +1 > > On 12/17/18, 10:27 AM, "Dongjoon Hyun" wrote: > > +1 for releasing ORC-1.5.4. > > I've been waiting for ORC-416 and ORC-419, too. > > Thanks, > Don

Re: [DISCUSS] Releasing ORC-1.5.4

2018-12-17 Thread Owen O'Malley
One other note is that the next release candidate should be release-1.5.4rc1, since you made a previous tag with the rc0 name. .. Owen On Mon, Dec 17, 2018 at 2:16 PM Owen O'Malley wrote: > Vaibhav, >It would be great to make the release. > > Thanks, >Owen > >

Re: [DISCUSS] Releasing ORC-1.5.4

2018-12-17 Thread Owen O'Malley
Ok, I also added ORC-418, which fixed the compilation on centos 6. I'm still hitting problems with compiling branch-1.5 on mac os, but I'm digging into it. .. Owen On Mon, Dec 17, 2018 at 2:28 PM Owen O'Malley wrote: > One other note is that the next release candidate s

Re: [DISCUSS] Releasing ORC-1.5.4

2018-12-17 Thread Owen O'Malley
ere aren't any more issues. .. Owen On Mon, Dec 17, 2018 at 3:29 PM Vaibhav Gumashta wrote: > Thanks for the update Owen. Will wait for ORC-418 before moving forward. > > --Vaibhav > > On 12/17/18, 3:12 PM, "Owen O'Malley" wrote: > > Ok, I also add

Re: [DISCUSS] Releasing ORC-1.5.4

2018-12-17 Thread Owen O'Malley
Ok, after adding a backport of ORC-432, the docker tests passed. Please go ahead and roll a release candidate and start a release vote. .. Owen On Mon, Dec 17, 2018 at 3:34 PM Owen O'Malley wrote: > I've already done the backport of ORC-418 and included it in branch-1.5. > The l

Re: [VOTE] Should we release ORC 1.5.4rc1?

2018-12-18 Thread Owen O'Malley
+1 I checked: Signatures are correct The tarball matches the tag. The build and unit tests pass on Mac. The build and unit tests pass in all of the docker environments. .. Owen > On Dec 18, 2018, at 10:15 AM, Vaibhav Gumashta > wrote: > > All, > > Should we release the following artifacts a

Re: WriterOptions.writerVersion(version)?

2019-03-01 Thread Owen O'Malley
The goal of WriterVersion is to record changes to the writer software so that the readers can cope with unknown bugs. It is not intended to mark format changes. A good example of this is when we switched from the row-by-row writer to the vectorized writer in HIVE-12055. This changed the implementat

[VOTE] Should we release ORC 1.5.5RC1?

2019-03-11 Thread Owen O'Malley
Should we release the following artifacts as ORC 1.5.5? tar: http://home.apache.org/~omalley/orc-1.5.5/ tag: https://github.com/apache/orc/releases/tag/release-1.5.5rc1 jiras: https://issue

Re: [VOTE] Should we release ORC 1.5.5RC1?

2019-03-14 Thread Owen O'Malley
With four +1's and no -1's the vote passes. I'll promote the release. Thanks for voting! .. Owen On Wed, Mar 13, 2019 at 8:30 PM Vaibhav Gumashta wrote: > +1 > > - Built from source and ran tests > - Verified checksum + signature > - Ran some hive queries > > Thanks, > --Vaibhav > > On 3/13/19,

Re: [DISCUSS][C++] Add Support For INT/BYTE vector batch

2019-04-01 Thread Owen O'Malley
From the ORC library side, it isn't hard to support the additional vector types, although you'll need to make it API compatible for users that don't want it. For applications, I don't see a lot of advantages. For 1024 rows, the savings in memory between int64, int32, int16, and byte isn't that muc

Re: Re: [DISCUSS][C++] Add Support For INT/BYTE vector batch

2019-04-02 Thread Owen O'Malley
king. > > Thanks > Yurui > > from Alimail macOS <https://mail.alibaba-inc.com> > > -- > From: Owen O'Malley > Date: 2019-04-02 01:02:02 > To: > Subject: Re: [DISCUSS][C++] Add Support For INT/BYTE vector bat

Re: Status of column level encryption

2019-04-02 Thread Owen O'Malley
Sorry for the silence. I have the work mostly done, and over the next couple weeks I'll be breaking it up in parts to get it committed to master. Then we can roll an ORC 1.6.0 with the column encryption in it. I do have some of the documentation done, but you can also look at my slides from the re

Re: Type length, scale, and precision?

2019-04-02 Thread Owen O'Malley
Sorry, I managed to miss this message. On Tue, Mar 19, 2019 at 9:31 PM Dain Sundstrom wrote: > For the types in the ORC footer, we have the following: > > // the maximum length of the type for varchar or char in UTF-8 characters > optional uint32 maximumLength = 4; > // the precision and scal

[REPORT] ORC Board Report

2019-04-12 Thread Owen O'Malley
Feedback is welcome... ## Description: - A high-performance columnar file format for Hadoop workloads. ## Issues: - There are no issues requiring the board's attention. ## Activity: - We released the bug fix release 1.5.5. - The column encryption work is nearing completion. - We gave a p

Review of the column encryption format changes

2019-04-23 Thread Owen O'Malley
All, Please take a look at the format changes for column encryption. https://github.com/apache/orc/pull/385 .. Owen

Re: Sub-classing implementation classes(e.g., ReaderImpl)

2019-05-29 Thread Owen O'Malley
With inheritance, especially in C++, it is very hard to maintain compatibility across versions. Based on my experience on the Java side with LLAP, I’d suggest adding an optional cache manager that can be supplied to the reader. As Deepak says, there is already an interface for passing in the file tail.

Re: Sub-classing implementation classes(e.g., ReaderImpl)

2019-05-31 Thread Owen O'Malley
> proposal is approved. > > Thanks > Yurui > On 30 May 2019, 6:01 AM +0800, Owen O'Malley , > wrote: > > Inheritance, especially in C++ is very hard to maintain compatibility > across versions. > > > > Based on my experience on the Java side with LLAP, I’d suggest

Re: C++ API seekToRow() performance.

2019-06-02 Thread Owen O'Malley
> On Jun 2, 2019, at 5:43 AM, Gang Wu wrote: > > I can open a JIRA for the issue and port our fix back. That would be great. > > For the last suggestion, we can add the optimization as a writer option if > anyone is interested. It does significantly hurt compression to flush the streams ev

Re: The Orc magic string

2019-06-14 Thread Owen O'Malley
The hive acid format uses a side file that provides a sequence of the 8 byte file offsets for completed file footers. If the file is there, it passes the last offset to the reader and it will treat that as the end of the file. In the case where you don't have that, searching for the string “\00
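The side-file lookup described in this message could be sketched roughly as follows. This is an illustration only, not Hive's implementation: it assumes each entry is a big-endian signed 8-byte integer, that an incomplete trailing entry (for example from an interrupted write) should be ignored, and the function name is hypothetical.

```python
import struct

OFFSET_SIZE = 8  # each side-file entry is one 8-byte file offset


def last_footer_offset(side_file_bytes):
    """Return the last complete offset in the side file, or None if empty.

    Assumptions for this sketch: entries are big-endian signed 64-bit
    integers, and a partial entry at the end is skipped.
    """
    complete = len(side_file_bytes) // OFFSET_SIZE
    if complete == 0:
        return None
    start = (complete - 1) * OFFSET_SIZE
    (offset,) = struct.unpack(">q", side_file_bytes[start:start + OFFSET_SIZE])
    return offset
```

A reader would then treat the returned offset as the logical end of the ORC file and ignore any bytes after it.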
