To add one bit of context, we're looking at the handling of integers
other than INT32 and INT64 from the perspective of Apache Arrow. It
seems that in Parquet 1 files, you may not be able to recover the
original integer types from the file alone. The question is, should we
put this metadata in
y (Vertica) <stephen.walkaus...@hpe.com>
> Sent: Thursday, January 14, 2016 3:23 PM
> To: Sandryhaila, Aliaksei; dev@parquet.apache.org; Majeti, Deepak;
> non...@gmail.com; Wes McKinney
> Subject: Re: Parquet-cpp
>
> Yes, thanks for the introduction Julien.
>
> Nong and We
Wes
On Wed, Jan 27, 2016 at 10:22 PM, Wes McKinney <w...@cloudera.com> wrote:
> Yeah, if the Apache build queue is clogged up with other projects' builds,
> and you have a green build on your personal repo, I suggest posting that on
> the PR and the reviewer can accept the patch after che
o merge 1) very shortly but I'm going to be pretty busy in the
> coming
> week getting ready for Spark Summit. I probably won't have too much time to
> look at these until after but feel free to review and merge the patches. I
> can look
> after if that'd be helpful.
>
> On Wed
Dear friends,
I made a pass through the JIRAs and feature roadmap and listed out
essential tasks for reaching a milestone that would merit a versioned
code release, see:
https://docs.google.com/document/d/1WyquzupLc3UkErO2OhqLJNQ9a84Cccc8LVUSuLQz39o/edit#
I will be pressing for all of this to
hi folks,
Since there are so many moving pieces in creating a full-featured Parquet
reader-writer, I propose we start planning test fixtures and tools to
enable us to develop faster.
Specifically, we need to achieve maximum decoupling between functional
components. Every
builds are executing again.
On Tue, Jan 26, 2016 at 4:00 PM, Wes McKinney <w...@cloudera.com> wrote:
> There are 3 more patches outstanding that are causing blockage (418, 433,
> and 451/453), so I think if we get them merged today or tomorrow we
> should be able to proceed wit
sure everything is on the same page.
> CC'ing some folks who should probably chime in.
>
>
> On Fri, Jan 29, 2016 at 10:21 AM, Wes McKinney <w...@cloudera.com> wrote:
>
> > hi folks,
> >
> > Since there's so many moving pieces with creating a full-featured Parquet
>
great doc Wes. Could you add me as a commenter?
>
> On Sun, Jan 31, 2016 at 12:11 PM, Wes McKinney <w...@cloudera.com> wrote:
>
> > Dear all,
> >
> > I created a publicly available document where we can organize the
> > parquet-cpp roadmap and outstanding JIRAs
Feb 26, 2016 at 8:33 AM, Wes McKinney <w...@cloudera.com> wrote:
>
>> If someone could kindly merge this patch (PARQUET-494):
>>
>> https://github.com/apache/parquet-cpp/pull/64
>>
>> we'll then be able to close out the remaining JIRAs and hopefully tag
>>
unless there are any other major functional
requirements that we are missing?
- Wes
On Sat, Feb 20, 2016 at 8:40 AM, Wes McKinney <w...@cloudera.com> wrote:
> I'll be available most of today and tomorrow as needed for code
> reviews. We need to try to get the outstanding patch que
It should
> be a straightforward extension to FLBA with the additional requirement of
> swapping the bytes.
>
> On 02/23/2016 11:35 AM, Wes McKinney wrote:
> It looks like we should be able to clear our current patch queue today which
> puts us in very good shape for the 0.1 release.
>
his point, the patches need to be reviewed and approved by
>>> Parquet committers in order to be committed to master.
>>>
>>> Unfortunately, there is not much activity on this side of the project.
>>> The lack of response from current committers is holding us b
>>>>>>
>>>>>> I don't think any of those are that hard to demonstrate, but I'd be
>>>>>> uncomfortable not validating committers like we normally do.
>>>>>> Especially in this situation, where I could easily see the amount of
>>>>
this time next week, and spend a few more days on code tidying, adding
some example scripts and general use / hardening before we cut the
release. Sound good?
Thanks
Wes
On Tue, Feb 16, 2016 at 9:46 AM, Wes McKinney <w...@cloudera.com> wrote:
> Thanks all.
>
> I'm gonna try to
hello all,
We're close to being back in the patch queue red zone. Let me try to
make sense of what needs to be reviewed and merged, and in what order
(from merge first to merge last):
1. PARQUET-167: Nong needs to sign off and merge.
https://github.com/apache/parquet-cpp/pull/30
2. PARQUET-505: If
hi Andrew,
I can make some specific comments about parquet-cpp. Note that it's
still very much at an alpha stage of development, so you may need to
submit some patches for your needs, but such is the price of progress,
right? =) On the bright side, there's a number of us here on the list
who need
hi Uwe,
Thanks for bringing this up -- I haven't done any work with nested
data yet so I didn't add any helper functions like you're describing
yet!
I had several thoughts about this while working on the schema tree
building code in parquet/schema. One solution is that you can add a
parent()
Dear all,
Since the JIRA burndown started to stabilize, I tagged the first 0.1
release candidate of parquet-cpp:
https://github.com/apache/parquet-cpp/archive/release-0.1-rc0.tar.gz
SHA1: 9aefdbd6c14adc141d8cd7a1af681ef9e1c9e8f4
Thank you everyone for the patches and advice/support on this
hello,
responses inline
On Mon, Mar 7, 2016 at 8:22 AM, Aliaksei Sandryhaila
wrote:
> Hi Wes and Julien,
>
> At this point, parquet-cpp is heavily reliant on C++11 features and
> semantics. Believe it or not :), there are plenty of companies still
> running older versions
I'm sorry that I'm not able to join either due to international travel
(also due to European time zone), but my interests are much in line
with Uwe's and I look forward to continuing to work together with him
and Deepak and Aliaksei on parquet-cpp. We should engage in a
conversation on the ML
I'm sorry I wasn't able to join today again (traveling). We could
choose an early time Pacific time to make the meeting accessible to
both Asia and Europe -- I would suggest 8 or 9 AM Pacific
Thanks Julien -- is it possible to arrange for some advance notice of
the date and time of the sync up (or a shared google calendar
perhaps)?
On Thu, May 12, 2016 at 5:33 PM, Julien Le Dem wrote:
> The next sync up will be around Strata London early June, where I'll happen
>
I am fine with Doxygen style comments. I will make an effort to adopt
this style as well (especially when we set up auto-generated HTML API
documentation pages).
- Wes
On Wed, Apr 20, 2016 at 8:37 AM, Uwe Korn wrote:
> Hello,
>
> I would start to make some API documentation
>>>>>> there around 10am
>>>>>> There will be people to open the door earlier.
>>>>>>
>>>>>> Agenda/things that have been mentioned on the thread:
>>>>>> - Parquet <-> Arrow
>>>>>> - Parquet-c
hi Jim
Cool to hear about this use case. My gut feeling is that we should not
expand the scope of the parquet-cpp library itself too much beyond the
computational details of constructing the encoded streams / metadata
and writing to a file stream or decoding a file into the raw values
stored in
I may be available a good portion of the 14th (I will be on the road)
and will try to participate remotely (can we set up a slack or
something?). I am most interested in strategies / algorithms around
batch conversion of nested data to and from Apache Arrow data
structures.
On Sat, Jul 2, 2016 at
Do we yet have a Slack / IRC for Parquet? I will be joining remotely
throughout the day. Anyone who is interested in algorithms for Arrow
nested data <-> Parquet disassembly/reassembly, we should start a
shared Google document to detail algorithms and various test cases
we'll need to address in
, too
>>
>> > Am 30.01.2017 um 04:15 schrieb Wes McKinney <wesmck...@gmail.com>:
>> >
>> > Does Monday 2/6 work? We could also do this coming Friday 2/3
>> >
>> >> On Sat, Jan 28, 2017 at 1:30 AM, Julien Le Dem <jul...@ledem.net>
>
hi Pradeep -- you can use Thrift 0.7 or higher (the instructions say
"0.7+", perhaps we should call this out more explicitly). I recommend
building Thrift 0.9.3 or 0.10 -- let us know if you have issues with
these
Thanks
Wes
On Wed, Feb 8, 2017 at 2:19 PM, Pradeep Gollakota
project.
> I’m not against starting at 0.5 but we should try not to convey too much
> meaning in the version number related to the progress/increase in features.
>
> Julien
>
>> On Jan 24, 2017, at 6:42 AM, Wes McKinney <wesmck...@gmail.com> wrote:
>>
>> h
This falls during Spark Summit East -- not sure if anyone else has a
conflict with this
On Thu, Jan 26, 2017 at 7:02 PM, Julien Le Dem wrote:
> Next parquet sync will happen Thursday February 9th at 10am PT on google
> hangout
>
e:106: recipe for target
> 'release/parquet-dump-schema' failed
> make[2]: *** [release/parquet-dump-schema] Error 1
> CMakeFiles/Makefile2:716: recipe for target
> 'tools/CMakeFiles/parquet-dump-schema.dir/all' failed
> make[1]: *** [tools/CMakeFiles/parquet-dump-schema.dir/all] Error
>>
>> I actually had 2 versions of boost, I built 1.54 once for a different
>> project (which I do not need anymore), I got rid of it and now it picks up
>> 1.58 but still gives the same error. Will upload a complete shell
>> transcript.
>>
>> Regards,
>> Keit
@Uwe, I suggest we prefix the RC directory names with apache-parquet-cpp- in
https://dist.apache.org/repos/dist/dev/parquet/
to help disambiguate the RCs of the different subcomponents.
On Ubuntu 14.04:
- Debug build and ran tests with valgrind --tool=memcheck with gcc 4.8.5
- Release build
Moving this to the Parquet mailing list. Other days of the week work
OK for me generally.
On Fri, Feb 24, 2017 at 5:48 PM, Julien Le Dem wrote:
> Currently the Parquet sync-up is scheduled on Thursday 10 am PT every other
> week.
> Marcel mentioned that another day (same time)
Dear Apache Kudu and Apache Impala (incubating) communities,
(I'm not sure the best way to have a cross-list discussion, so I
apologize if this does not work well)
On the recent Apache Parquet sync call, we discussed C++ code sharing
between the codebases in Apache Arrow and Apache Parquet, and
value
collect2: error: ld returned 1 exit status
Patch forthcoming
On Wed, Feb 22, 2017 at 1:29 PM, Keith Chapman <keithgchap...@gmail.com> wrote:
> Hi Wes,
>
> No I don't have SNAPPY_HOME set. Yes this seems similar to 885
>
> On Feb 22, 2017 10:25 AM, "Wes McKinney"
st::match_results<__gnu_cxx::__normal_iterator const*, std::string>,
> std::allocator<boost::sub_match<__gnu_cxx::__normal_iterator std::string> > > > const&)'
> ../release/libparquet.a(metadata.cc.o): In function `perl_matcher':
> .
> And a lot more
>
n.com
>
> On Wed, Feb 22, 2017 at 10:30 AM, Wes McKinney <wesmck...@gmail.com> wrote:
>>
>> I'm able to reproduce the issue on Ubuntu 14.04
>>
>> Linking CXX shared library debug/libparquet.so
>> /usr/bin/ld: /usr/lib/libsnappy.a(snappy.o): relocation R_X8
+1 (binding)
- Verified signature
- Built with -DPARQUET_ARROW=on and ran unit tests
- Wes
On Sun, Feb 19, 2017 at 1:01 PM, Uwe L. Korn wrote:
> Small amendment to the previous mail:
>
> The vote will be open for the next ~72 hours ending at 18:45 CET,
> February 22, 2017.
>
hi folks,
Since Uwe has set up the release-making bits recently, and the API is
reasonably stable after the refactor to depend on libarrow, I propose
we go ahead and make a first official parquet-cpp source release.
I propose that we call this release 0.5.0 instead of 0.1.0 to reflect
the
’t run tests.
>
> rb
>
> On Sun, Feb 26, 2017 at 11:10 AM, Wes McKinney wesmck...@gmail.com wrote:
>
> hi Deepak,
>
> Thank you very much for catching this.
>
> It appears that Travis CI silently upgraded our build image to Xcode
> 7.3 last fall — we should have pegge
on Ubuntu 16.04 with GCC 4.9.4
>
> +1 (non-binding)
>
> Thanks, Uwe.
>
>
> On Fri, Feb 24, 2017 at 5:21 PM, Wes McKinney <wesmck...@gmail.com> wrote:
>
> > @Uwe, I suggest we prefix the RC directory names with apache-parquet-cpp-
> > in
> >
>
e opinions of others, and possible next steps.
Thanks
Wes
On Sun, Feb 26, 2017 at 2:12 AM, Henry Robinson <he...@apache.org> wrote:
> Thanks for bringing this up, Wes.
>
> On 25 February 2017 at 14:18, Wes McKinney <wesmck...@gmail.com> wrote:
>
>> Dear Apache Kudu an
sharing code, we should figure out how exactly we'll manage the
>> cases where we want to make some change in a common library that breaks an
>> API used by other projects, given there's no way to make an atomic commit
>> across many repositories. One option is that each "user"
.9.4 and 5.4.0 on OSX.
> Looks like the option '-stdlib=libc++' works only with Clang.
>
> On Sun, Feb 26, 2017 at 9:34 AM, Wes McKinney <wesmck...@gmail.com> wrote:
>
>> @Deepak: which version of XCode is the clang 3.6.0 from? I'd like to look
>> into it
>>
>
ome (most) of it be added to APR <https://apr.apache.org/>?
>
> On Sun, Feb 26, 2017 at 8:12 PM, Wes McKinney <wesmck...@gmail.com> wrote:
>
>> hi Henry,
>>
>> Thank you for these comments.
>>
>> I think having a kind of "Apache Commons for [Modern] C++"
hi Julien,
I'm very sorry about the inconvenience with this and the delay in
getting it sorted out. I will triage this evening by disabling the
Parquet tests in Arrow until we get the current problems under
control. When we re-enable the Parquet tests in Travis CI I agree we
should pin the
he other way around.
>> Arrow provides the API and each storage layer (Parquet, Kudu, Cassandra,
>> ...) provides a way to produce Arrow Record Batches.
>> thoughts?
>>
>>> On Tue, Sep 6, 2016 at 3:37 PM, Wes McKinney <wesmck...@gmail.com> wrote:
>>>
I don't agree with this approach right now. Here are my reasons:
1. The Parquet Python integration will need to depend both on PyArrow
and the Arrow C++ libraries, so these libraries would generally need
to be developed together
2. PyArrow would need to define and maintain a C++ or Cython API so
for me. I will then continue to implement the missing
> interfaces for Parquet in pyarrow.parquet.
>
> @wesm Can you take care that we easily depend on a pinned version of
> parquet-cpp in pyarrow’s travis builds?
>
> Uwe
>
>> Am 21.09.2016 um 20:07 schrieb Wes McKinney
+1
On Thu, Sep 22, 2016 at 8:18 PM, Julien Le Dem wrote:
> The sync next week collides with strata Conf in NY.
> I propose to move it to the following week.
>
>
> --
> Julien
The type of comparison used here strikes me as dependent on the
ConvertedType of the column. Adding explicit signed/unsigned min/max
of course gives you both options after the fact. So another option is
(if I'm understanding correctly) to change parquet-mr's BYTE_ARRAY
comparison used for UTF8
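[Illustrative sketch, not from the thread: why the ConvertedType matters for min/max statistics. The same stored byte orders differently depending on whether the column is declared signed or unsigned.]

```python
# The byte 0x80 is 128 when read as unsigned but -128 when read as
# signed, so a signed comparator and an unsigned (bytewise) comparator
# disagree about which of 0x7f and 0x80 is the column's "max".
import struct

a, b = b"\x7f", b"\x80"

unsigned_order = a < b               # bytewise comparison: 127 < 128
signed_a = struct.unpack("b", a)[0]  # 127
signed_b = struct.unpack("b", b)[0]  # -128
signed_order = signed_a < signed_b   # False: -128 < 127

print(unsigned_order, signed_order)  # True False
```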
Since googlecode project hosting seems to have completely shut down
(they had claimed that these downloads would be available the "rest of
2016"), you can use the download links from GitHub:
https://github.com/google/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.bz2
cf
I think we are ready to make a release once PARQUET-702 is merged. Is
there any more licensing / NOTICE review work to do?
On Fri, Nov 4, 2016 at 10:29 AM, Deepak Majeti wrote:
> I would like to get PARQUET-764 and PARQUET-702 into the release as
> well. Both of them
Same. Thanks
On Fri, Oct 28, 2016 at 2:36 PM, Deepak Majeti wrote:
> Julien,
>
> Can you please add me to the calendar invite for the sync-up meetings ?
> Thanks.
>
> On Thu, Oct 27, 2016 at 2:33 PM, Julien Le Dem wrote:
>> Attendees/Agenda
>> Julien
>>> The parquet-cpp repo has reached a stable state and should release soon.
>>> Integration with arrow-cpp is now in the parquet-cpp repo.
>>>
>>> ## Health report:
>>> The PMC and committer list are growing. Discussion is happening on the
>>
hi James,
You have to pass "-fPIC" in your $CXXFLAGS when you are building
Thrift. See how we have things set up in our external project
https://github.com/apache/parquet-cpp/blob/master/cmake_modules/ThirdpartyToolchain.cmake#L54
As an example for Thrift 0.9.3 (which uses CMake now instead of
hi Keith,
It seems perfectly reasonable to add configurable read buffering, or an
option to buffer the entire row group if your environment permits it.
Can you create a JIRA about this? We would welcome contributions
around IO tuning for different hardware / network environments.
Note that in
hi folks,
Spurred by the discussion and bugfix for PARQUET-799, I'd like to do
something about the IO interfaces that we currently have implemented
in parquet-cpp.
For C++ at least, the Parquet project is not an ideal place to be
maintaining cross-platform IO and memory management. There are
These are available now (Thanks Uwe!):
conda install parquet-cpp -c conda-forge
Support for date and time types is incomplete in the Arrow adapter --
after Arrow 0.3 comes out we'll want to push for a 1.1.0 release
including more complete support.
If anyone reading has the skills and time to
/master/be/src/exec/hdfs-parquet-scanner.h#L78
On Thu, Mar 16, 2017 at 3:51 PM, Wes McKinney <wesmck...@gmail.com> wrote:
> hi Grant,
>
> The value [1, 2, 3] is only 1 value, not 3. The "Number of rows"
> passed to the row group is with respect to top level records, *not
hi Grant,
The value [1, 2, 3] is only 1 value, not 3. The "Number of rows"
passed to the row group is with respect to top level records, *not*
counting repeated fields.
From https://blog.twitter.com/2013/dremel-made-simple-with-parquet, I
believe the correct data to write is:
rep level | def
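[A small sketch of mine, not parquet-cpp code: the row count for a repeated column follows from the repetition levels, since level 0 marks the start of a new top-level record.]

```python
# The three values [1, 2, 3] stored as one repeated record carry
# repetition levels [0, 1, 1]; only the level-0 entry starts a new
# top-level row, so this is one row, not three.
values = [1, 2, 3]
rep_levels = [0, 1, 1]

num_rows = sum(1 for r in rep_levels if r == 0)
print(num_rows)  # 1
```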
hing
>> looks good.
>>
>> On Mon, Mar 13, 2017 at 3:10 PM, Wes McKinney <wesmck...@gmail.com> wrote:
>>
>> > This is in the README
>> >
>> > "The test suite relies on an environment variable PARQUET_TEST_DATA
>> > pointing to the
hi Uwe,
Thanks for bringing this up.
I have a somewhat different opinion, which is that I don't think
categorical metadata belongs _formally_ in the Parquet format. The
reason is that database systems generally address storage of
categorical data using fact and dimension tables -- if you store
usy on Mondays and Tuesdays, the rest of the week is fine by me.
>> >> >>
>> >> >> Zoltan
>> >> >>
>> >> >> On Mon, Feb 27, 2017 at 8:28 AM Uwe L. Korn <uw...@xhochy.com>
>> wrote:
>> >> >>
>> >> >>
6c9afc/src/parquet/column/writer.cc#L337
> Now the Parquet Writer destructor tries to write close the file and
> encounters https://github.com/apache/parquet-cpp/blob/5e59bc5c6491a7505
> 585c08fd62aa52f9a6c9afc/src/parquet/column/writer.cc#L159
>
>
> On Mon, Mar 1
> I’m doing wrong, my vote is +0.
>
> rb
>
>
> On Mon, Mar 13, 2017 at 2:32 PM, Ryan Blue <rb...@netflix.com> wrote:
>
>> Will do, sorry for the delay.
>>
>> On Mon, Mar 13, 2017 at 2:31 PM, Wes McKinney <wesmck...@gmail.com> wrote:
>>
>>
hi Uwe,
Thank you for making the release candidate.
I have
* Built and run the unit tests (Ubuntu 14.04, gcc 4.8.5)
* Verified the MD5 signature
* Verified the GPG signature
My vote: +1 (binding)
@Ryan or @Julien, since we're running a bit short on the voting window
would you mind taking a
hi Grant,
the exception is coming from
if (num_rows_ != expected_rows_) {
  throw ParquetException(
      "Less than the number of expected rows written in"
      " the current column chunk");
}
See https://issues.apache.org/jira/browse/PARQUET-914
On Mon, Mar 13, 2017 at 6:01 PM, Wes McKinney <wesmck...@gmail.com> wrote:
> hi Grant,
>
> the exception is coming from
>
> if (num_rows_ != expected_rows_) {
> throw ParquetException(
> "Less
hi Keith -- we have focused so far on columnar reads (i.e. Arrow) vs.
row/record reads. We would welcome contributions to add a record
reader interface
Thanks
Wes
On Tue, Apr 4, 2017 at 8:21 PM, Keith Chapman wrote:
> Hi,
>
> I'm trying to read a parquet file which has
+1. In doing so we may want to rename the repository to apache/parquet
to reflect the expanded scope.
We could also discuss merging in the C++ implementation, though the
main reservation I would have would be version numbers as we will
likely be releasing parquet-cpp more frequently than
the
>> > Java
>> > and C++ interoperability. Currently, Java treats parquet files written by
>> > C++ differently.
>> >
>> > On Wed, Aug 2, 2017 at 7:59 PM, Wes McKinney <wesmck...@gmail.com>
>> wrote:
>> >
>> > > +1. In do
hi Joerg,
Our developer community did not author that tutorial -- I recommend
following the documentation in the Arrow and Parquet codebases; if the
documentation is inaccurate or incomplete, we should work together to
improve it.
It looks like you may have -DPARQUET_BOOST_USE_SHARED=OFF set
hi Anna -- I just added you to the Contributor list (your apache.org
login), so you should be able to assign issues now.
- Wes
On Fri, Jul 14, 2017 at 6:36 PM, Anna Szonyi wrote:
> Hi,
>
> Could I get access to the parquet project Jira? I'd like to assign a few
> newbie
hi Mike,
You can use
import pyarrow.parquet as pq
pf = pq.ParquetFile(path)
pf.metadata
or
pf.schema
This does not read the whole file, only the metadata. Note that we
have a function write_metadata:
https://github.com/apache/arrow/blob/master/python/pyarrow/parquet.py#L777
It would be nice
This is not easy to do right now while the file is being read (rather,
ex-post), but you are welcome to look at extending the Parquet read
API to support selecting a particular row subset.
- Wes
On Tue, Jul 25, 2017 at 4:10 PM, Katelman, Michael
wrote:
>
adata had through pq.ParquetFile(path).metadata (or
> .schema) include user metadata? I only see num rows, num row groups, column
> names and types. Maybe I'm not looking in the right place.
>
> -Mike
>
> -Original Message-
> From: Wes McKinney [mailto:wesmck...@gmail.com]
metadata. I should be able to add it myself as well.
>
> -Mike
>
> -Original Message-
> From: Wes McKinney [mailto:wesmck...@gmail.com]
> Sent: Wednesday, July 26, 2017 9:34
> To: dev@parquet.apache.org
> Subject: Re: metadata reading
>
> The Arrow use
s on more datatypes? Would adding more compressible physical_types
> even be useful?
>
> Felipe
>
>
>
> On Wed, Jul 26, 2017 at 1:31 PM, Wes McKinney <wesmck...@gmail.com> wrote:
>
>> We are using std::copy to cast the values on the write side (from
>>
hi Felipe,
In C++ it is the equivalent of
uint64_t val = ...;
int64_t encoded_val = *reinterpret_cast<int64_t*>(&val);
So no alteration of the bit pattern
- Wes
On Wed, Jul 26, 2017 at 12:18 PM, Felipe Aramburu wrote:
>
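[A Python analogue of that C++ reinterpretation, my sketch using the struct module: pack the unsigned value's bytes and unpack them as signed, leaving the bit pattern untouched.]

```python
import struct

val = 2**64 - 1  # largest uint64: all 64 bits set
# Reinterpret the same 8 bytes as a signed int64 -- no bits change.
encoded_val = struct.unpack("<q", struct.pack("<Q", val))[0]
print(encoded_val)  # -1

# The round trip back to unsigned recovers the original value.
restored = struct.unpack("<Q", struct.pack("<q", encoded_val))[0]
print(restored == val)  # True
```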
-- Forwarded message --
From: Chris Lambertus
Date: Fri, Apr 28, 2017 at 3:22 PM
Subject: Github's disappearing mirrors
To: committers
Hello committers,
We have received quite a few reports of github mirrors gone missing. We’ve
tracked
hi Mike
No, it's a TODO: https://issues.apache.org/jira/browse/PARQUET-929
- Wes
On Sun, Jul 30, 2017 at 11:00 AM, Katelman, Michael
wrote:
> Hi,
>
> I was trying to write out a really long column of strings where it makes
> sense to use a dictionary
hi Joerg,
It sounds like you are referring to the record-based writer API that's
found in parquet-mr, which was originally designed for use in Hadoop
MapReduce (if I understand correctly).
There is no requirement to write Parquet files in this fashion. The
Parquet C++ writer and reader API
+1 (binding)
* Verified signature
* Build from minimal env and run unit tests on Linux (Ubuntu 14.04),
built against Arrow 0.4.0 RC0 and ran Python unit tests
* Built RC with Visual Studio 2015 against Apache Arrow 0.4.0 rc0,
built Python extension and ran unit tests. The Visual Studio build is
hi Mike,
I think you want to use WriteBatch on TypedColumnWriter:
https://github.com/apache/parquet-cpp/blob/master/src/parquet/column/writer.h#L166
For a flat table with an optional repetition type, the definition
levels are a sequence of 1's and 0's, where 1 is for non-null values.
The array
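[To sketch what those definition levels look like for a flat OPTIONAL column (my illustration, not the parquet-cpp API):]

```python
# For a flat OPTIONAL column: definition level 1 marks a present value,
# 0 marks a null. The writer then receives only the non-null values
# alongside the full definition-level sequence.
data = [7, None, 9, None, 11]

def_levels = [0 if v is None else 1 for v in data]
non_null_values = [v for v in data if v is not None]

print(def_levels)       # [1, 0, 1, 0, 1]
print(non_null_values)  # [7, 9, 11]
```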
Seems that function could use some documentation. It is not intended to be
able to clear bits, but rather to set a bit to 1 only if is_set is true.
Another way would be
if (is_set) {
bits[i / 8] |= 1 << (i % 8);
}
In theory the branch-free version may be faster, but I have not run any
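[A sketch of mine showing the two equivalent formulations: the branch-free form ORs `is_set` shifted into position, so like the C++ version it can only set bits, never clear them.]

```python
def set_bit_branchy(bits: bytearray, i: int, is_set: bool) -> None:
    # Branching version: touch the byte only when is_set is true.
    if is_set:
        bits[i // 8] |= 1 << (i % 8)

def set_bit_branch_free(bits: bytearray, i: int, is_set: bool) -> None:
    # Branch-free version: OR in 0 or 1 shifted into place. This cannot
    # clear a bit that is already 1, matching the behavior described above.
    bits[i // 8] |= int(is_set) << (i % 8)

a, b = bytearray(2), bytearray(2)
for i, flag in enumerate([True, False, True, True, False] * 3):
    set_bit_branchy(a, i, flag)
    set_bit_branch_free(b, i, flag)
print(a == b)  # True
```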
sted using
> `./dev/release/verify-release-candidate 1.1.0 0` on macOS
>
> I had to clean up an old version of arrow in /usr/local :
>
> /usr/local//lib/pkgconfig/arrow.pc
>
> /usr/local/include/arrow
>
>
>
> On Wed, May 17, 2017 at 10:03 PM, Wes McKinney <wesmck.
st on macos
> >
> > On Fri, May 19, 2017 at 9:15 AM, Ryan Blue <rb...@netflix.com.invalid>
> > wrote:
> >
> > > +1 (binding)
> > >
> > > * Checked signatures, checksums
> > > * Built on Ubuntu 16.04 LTS
> > > * Ran unit tests
+1 (binding)
* Verified the blocker PARQUET-995 has been fixed
* Ran unit tests on Linux + Arrow/Python integration
* Ran unit tests on Windows/Visual Studio 2015
On Thu, May 18, 2017 at 4:09 PM, Uwe L. Korn wrote:
> +1 (binding)
>
> Build tests on Linux and macOS and verified
hi Vaishal,
I already replied to you about this on the mailing list on June 1, can
you reply to that thread?
I see that you opened ARROW-1097 about the tensor issue. If you could
add a standalone reproduction of the problem that would help us debug
it and fix faster
Thanks
Wes
On Wed, Jun 7,
hi Vaishal,
You can certainly use NumPy arrays to create Parquet files, but you
will have to do a bit of work to adapt the NumPy arrays to Parquet's
(and Arrow's) columnar data model. pandas DataFrame contains NumPy
arrays internally.
import pyarrow as pa
import pyarrow.parquet as pq
import
hi Felipe,
Yes, that's right. For primitive types it is typical for the
LogicalType to be not set in the Thrift metadata. The particular
integer logical types were added relatively late to the Parquet format
and are not used in all implementations (for example, some databases
like Hive and Impala
hi Young,
It looks like your Boost was compiled with a different version of gcc. If
you're targeting gcc 4.8 you need to compile all the dependencies with the
same compiler, otherwise you will have a conflict with the libstdc++ ABI.
Redhat provides the devtoolset which helps with deploying on a
I would like to have MSVC / Windows support in 1.1.0, I will add a blocker.
If it isn't done by next week sometime we can move forward with the RC.
On Wed, May 3, 2017 at 2:21 PM Uwe L. Korn wrote:
> Hello Parquet devs,
>
> as Apache Arrow 1.1.0 comes close to a release, it is
I opened https://issues.apache.org/jira/browse/PARQUET-1021 about
adding a more helpful failure message, though you would need to re-run
with ctest -VV in order to see any error output (unless you run the
unit test executables directly)
On Tue, Jun 6, 2017 at 2:22 PM, Artem
hi all,
We have one last patch PARQUET-1037 pending, but the only other thing
in progress is support for Arrow decimal read/write. While I would
like to see decimal support go into 1.3.0, there are a number of open
questions being discussed on the Arrow mailing list:
hi Rahul,
the key value metadata is only supported at the file/schema level and
at the column chunk (i.e. each column in a row group) level:
https://github.com/apache/parquet-cpp/blob/master/src/parquet/parquet.thrift#L530
We should add an accessor for the column chunk key-value metadata to