Re: [VOTE] Migration of parquet-cpp issues to Arrow's issue tracker

2024-06-03 Thread Julien Le Dem
A bit late but just for the record +1 from me

On Mon, Jun 3, 2024 at 17:23 Rok Mihevc  wrote:

> Thanks all for voting. I tallied the votes (assuming simple +1 votes were
> meant as +1 Parquet, +1 Arrow) and the vote succeeded with the following
> results:
>
> Parquet:
> 3x +1 binding (Gang Wu, Antoine Pitrou, Wes McKinney)
> 9x +1 non-binding (Micah Kornfield, Felipe Oliveira Carvalho, Fokko
> Driesprong, Alenka Frim, Andy Grove, Raúl Cumplido, Sutou Kouhei, Jiashen
> Zhang, Rok Mihevc)
>
> Arrow:
> 6x +1 binding (Micah Kornfield, Antoine Pitrou, Andy Grove, Raúl Cumplido,
> Wes McKinney, Sutou Kouhei)
> 6x +1 non-binding (Felipe Oliveira Carvalho, Fokko Driesprong, Gang Wu,
> Alenka Frim, Jiashen Zhang, Rok Mihevc)
>
> I'm not sure about formalities here, but perhaps one PMC member per project
> could confirm my count?
>
> I'll start making preparations for the move and hopefully execute it
> later this week.
>
> Best,
> Rok
>
> On Tue, Jun 4, 2024 at 1:55 AM Rok Mihevc  wrote:
>
> > +1 (non-binding)
> >
> > On Thu, May 30, 2024 at 6:13 PM Jiashen Zhang 
> > wrote:
> >
> >> +1 (non-binding)
> >>
> >> On Wed, May 29, 2024 at 3:29 PM Sutou Kouhei 
> wrote:
> >>
> >> > +1 (binding for Arrow)
> >> >
> >> > In <
> cag6ackwdjv09oab2k+cxzx8gika6rhrstgwadwgos9zast0...@mail.gmail.com>
> >> >   "[VOTE] Migration of parquet-cpp issues to Arrow's issue tracker" on
> >> > Wed, 29 May 2024 16:14:44 +0200,
> >> >   Rok Mihevc  wrote:
> >> >
> >> > > # sending this to both dev@arrow and dev@parquet
> >> > >
> >> > > Hi all,
> >> > >
> >> > > Following the ML discussion [1] I would like to propose a vote for
> >> > > parquet-cpp issues to be moved from Parquet Jira [2] to Arrow's
> issue
> >> > > tracker [3].
> >> > >
> >> > > [1]
> https://lists.apache.org/thread/zklp0lwcbcsdzgxoxy6wqjwrvt6y4s9p
> >> > > [2] https://issues.apache.org/jira/projects/PARQUET/issues/
> >> > > [3] https://github.com/apache/arrow/issues/
> >> > >
> >> > > The vote will be open for at least 72 hours.
> >> > >
> >> > > [ ] +1 Migrate parquet-cpp issues
> >> > > [ ] +0
> >> > > [ ] -1 Do not migrate parquet-cpp issues because...
> >> > >
> >> > >
> >> > > Rok
> >> >
> >>
> >>
> >> --
> >> Thanks,
> >> Jiashen
> >>
> >
>


Re: [DRAFT] Apache Arrow board report October 2018

2018-10-15 Thread Julien Le Dem
What's the plan for the parquet-cpp repo now that ARROW-3075 has been
merged?


On Thu, Oct 11, 2018 at 9:12 AM Wes McKinney  wrote:

> OK, I have updated. If others could comment on the .NET thread, we can
> start a vote soon there
>
> ## Description:
>
> Apache Arrow is a cross-language development platform for in-memory data.
> It specifies a standardized language-independent columnar memory format for
> flat and hierarchical data, organized for efficient analytic operations on
> modern hardware. It also provides computational libraries and zero-copy
> streaming messaging and interprocess communication. Languages currently
> supported include C, C++, Go, Java, JavaScript, MATLAB, Python, R, Ruby, and
> Rust.
>
> ## Issues:
>
> - There are no issues requiring board attention at this time
>
> ## Activity:
>
> - The Arrow and Parquet communities resolved by vote to merge their
>   respective C++ codebases in the Apache Arrow repository. This work was
>   completed this quarter
> - The project received two code donations via IP clearance: a GLib
>   interface to the Parquet C++ libraries, and the Gandiva LLVM vectorized
>   Arrow expression compiler
> - Work has commenced on R language integration with the C++ libraries
> - An initial MATLAB binding to the C++ libraries was contributed
> - The community is discussing receiving a proposed native implementation of
>   Arrow in C# .NET
>
> ## Health report:
> - The project is very healthy, though rapid user and contributor growth has
>   stressed the limits of our developer tooling and put a great deal of
>   burden on the active project maintainers
>
> ## PMC changes:
>
>  - Currently 24 PMC members.
>  - Antoine Pitrou was added to the PMC on Mon Aug 20 2018
>
> ## Committer base changes:
>
>  - Currently 33 committers.
>  - New committers:
>    - Andrew Grove was added as a committer on Tue Aug 07 2018
>    - Krisztian Szucs was added as a committer on Thu Aug 16 2018
>
> ## Releases:
>
>  - 0.10.0 was released on Sun Aug 05 2018
>  - 0.11.0 was released on Sun Oct 07 2018
>
> ## JIRA activity:
>
>  - 649 JIRA tickets created in the last 3 months
>  - 476 JIRA tickets closed/resolved in the last 3 months
> On Thu, Oct 11, 2018 at 12:08 PM Uwe L. Korn  wrote:
> >
> > You could also mention that we are about to receive a C# donation.
> Otherwise this looks good.
> >
> > Uwe
> >
> > On Thu, Oct 11, 2018, at 6:05 PM, Wes McKinney wrote:
> > > ## Description:
> > >
> > > Apache Arrow is a cross-language development platform for in-memory
> data. It
> > > specifies a standardized language-independent columnar memory format
> for flat
> > > and hierarchical data, organized for efficient analytic operations on
> modern
> > > hardware. It also provides computational libraries and zero-copy
> streaming
> > > messaging and interprocess communication. Languages currently
> supported include
> > > C, C++, Go, Java, JavaScript, MATLAB, Python, R, Ruby, and Rust.
> > >
> > > ## Issues:
> > >
> > > - There are no issues requiring board attention at this time
> > >
> > > ## Activity:
> > >
> > > - The Arrow and Parquet communities resolved by vote to merge their
> respective
> > >   C++ codebases in the Apache Arrow repository. This work was
> completed this
> > >   quarter
> > > - The project received two code donations via IP clearance: a GLib
> interface to
> > >   the Parquet C++ libraries, and the Gandiva LLVM vectorized Arrow
> expression
> > >   compiler
> > > - Work has commenced on R language integration with the C++ libraries
> > > - An initial MATLAB binding to the C++ libraries was contributed
> > >
> > > ## Health report:
> > > - The project is very healthy, though rapid user and contributor
> growth has
> > >   stressed the limits of our developer tooling and put a great deal of
> burden
> > >   on the active project maintainers
> > >
> > > ## PMC changes:
> > >
> > >  - Currently 24 PMC members.
> > >  - Antoine Pitrou was added to the PMC on Mon Aug 20 2018
> > >
> > > ## Committer base changes:
> > >
> > >  - Currently 33 committers.
> > >  - New committers:
> > >    - Andrew Grove was added as a committer on Tue Aug 07 2018
> > >    - Krisztian Szucs was added as a committer on Thu Aug 16 2018
> > >
> > > ## Releases:
> > >
> > >  - 0.10.0 was released on Sun Aug 05 2018
> > >  - 0.11.0 was released on Sun Oct 07 2018
> > >
> > > ## JIRA activity:
> > >
> > >  - 649 JIRA tickets created in the last 3 months
> > >  - 476 JIRA tickets closed/resolved in the last 3 months
>


Re: Parquet to arrow java converter

2018-03-06 Thread Julien Le Dem
I would put it in the parquet-mr codebase. I have contributed the schema
conversion code there. I’m happy to provide feedback on PRs in this area.

Julien

> On Mar 6, 2018, at 12:18, Wes McKinney  wrote:
> 
> When it had been discussed in the past, the thinking had been to
> implement it in the Parquet Java codebase. I'd be interested in
> others' opinions about this (since I'm not an expert on Java matters)
> 
> - Wes
> 
>> On Tue, Mar 6, 2018 at 2:27 PM, Wenbo Zhao  wrote:
>> Hi,
>> 
>> Sorry if someone has asked the same question before. We are
>> interested in providing a Java converter from Parquet to Arrow. Should I
>> implement this converter in Parquet-mr/Parquet-arrow or under the Arrow
>> project? I have the feeling that putting the implementation in
>> Parquet-mr/Parquet-arrow would be preferable (see
>> https://www.mail-archive.com/dev@arrow.apache.org/msg02606.html)?
>> 
>> Thanks,
>> 
>> Wenbo


Re: [VOTE] Release Apache Arrow 0.7.0 - RC0

2017-09-17 Thread Julien Le Dem
+1 (binding)
On macOS:
- verified signature
- ran C++ build and tests
- ran Java build and tests

On Thu, Sep 14, 2017 at 9:14 PM, Wes McKinney  wrote:

> +1 (binding)
>
> I created a release verification script for Linux:
> https://github.com/apache/arrow/pull/1102
> I also made a small fix to the Windows verification script:
> https://github.com/apache/arrow/pull/1101
>
> On Linux (Ubuntu 14.04) I:
>
> * Verified GPG signature, checksums
> * Ran C++ unit tests (including Python and GPU extensions)
> * Ran Python unit tests with Parquet and Plasma extensions
> * Ran C GLib unit tests
> * Ran Java unit tests
> * Ran integration tests
> * Ran JS unit tests
>
> On Windows / Visual Studio 2015, I ran the C++ unit tests and Python
> tests with Parquet extensions
>
> I noted a race condition in the build dependency graph with
> libarrow_gpu (ARROW-1541) and some GPU-related Valgrind warnings
> (ARROW-1540)
>
> - Wes
>
> On Wed, Sep 13, 2017 at 9:34 PM, Li Jin  wrote:
> > +1 (not binding)
> >
> > Unfortunately I currently don't have good internet access to download and
> > check the RC.
> > On Wed, Sep 13, 2017 at 12:48 PM Uwe L. Korn  wrote:
> >
> >> +1 (binding)
> >>
> >>  * Verified signature
> >>  * Build and tested C++/Python on OSX, Ubuntu 14.04 and Debian 7 & 8
> >>  * Build and tested Java on OSX
> >>
> >> --
> >>   Uwe L. Korn
> >>   uw...@xhochy.com
> >>
> >> On Wed, Sep 13, 2017, at 06:30 PM, Gang(Gary) Wang wrote:
> >> > +1 looks good to me.
> >> >
> >> > Gary
> >> >
> >> >
> >> > On Wed, Sep 13, 2017 at 8:10 AM, hei...@mojotech.com
> >> > 
> >> > wrote:
> >> >
> >> > >
> >> > >
> >> > > On 2017-09-12 16:23, Wes McKinney  wrote:
> >> > > > Hello all,
> >> > > >
> >> > > > I'd like to propose the 1st release candidate (rc0) of Apache
> >> > > > Arrow version 0.7.0.  This is a major release consisting of 131
> >> > > > resolved JIRAs [1].
> >> > > >
> >> > > > The source release rc0 is hosted at [2].
> >> > > >
> >> > > > This release candidate is based on commit
> >> > > > 97f9029ce835dfc2655ca91b9820a2e6aed89107 [3]
> >> > > >
> >> > > > The changelog is located at [4].
> >> > > >
> >> > > > Please download, verify checksums and signatures, run the unit
> tests,
> >> > > > and vote on the release.
> >> > > >
> >> > > > The vote will be open for at least 72 hours.
> >> > > >
> >> > > > [ ] +1 Release this as Apache Arrow 0.7.0
> >> > > > [ ] +0
> >> > > > [ ] -1 Do not release this as Apache Arrow 0.7.0 because...
> >> > > >
> >> > > > Thanks,
> >> > > > Wes
> >> > > >
> >> > > > How to validate a release signature:
> >> > > > https://httpd.apache.org/dev/verification.html
> >> > > >
> >> > > > [1]:
> >> > > > https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.7.0
> >> > > > [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.7.0-rc0/
> >> > > > [3]: https://github.com/apache/arrow/tree/97f9029ce835dfc2655ca91b9820a2e6aed89107
> >> > > > [4]: https://git-wip-us.apache.org/repos/asf?p=arrow.git;a=blob_plain;f=CHANGELOG.md;hb=b671dccbffd69f4c2177ec469e3cd1369ede2af5
> >> > > >
> >> > > +1
> >> > >
> >>
>
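
Editorial note: the checksum half of the "verify checksums and signatures" step requested in this thread can be sketched in Python. This is a minimal sketch using only the standard library. The function names and the throwaway demo file are hypothetical and are not part of the Arrow release verification scripts, and GPG signature verification is not covered.

```python
import hashlib
import hmac
import os
import tempfile


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, expected_hex, algorithm="sha256"):
    """Compare a file's digest with a published hex digest (case-insensitive)."""
    actual = file_digest(path, algorithm)
    return hmac.compare_digest(actual, expected_hex.strip().lower())


def demo():
    """Write a small throwaway file and check it against two digests."""
    payload = b"stand-in for a release tarball"  # hypothetical content
    with tempfile.NamedTemporaryFile(delete=False) as handle:
        handle.write(payload)
        path = handle.name
    try:
        good = hashlib.sha256(payload).hexdigest()
        return checksum_matches(path, good), checksum_matches(path, "0" * 64)
    finally:
        os.remove(path)
```

In practice the expected digest would come from the checksum file published next to the release candidate, and the algorithm argument would be chosen to match that file.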


Re: spark error with reading parquet file created via pandas/pyarrow

2017-09-08 Thread Julien Le Dem
The int96 deprecation is slowly bubbling up the stack. There are still 
discussions in spark on how to make the change. So for now even though it's 
deprecated it is still used in some places. This should get resolved in the 
near future. 

Julien

> On Sep 8, 2017, at 14:12, Wes McKinney  wrote:
> 
> Turning on int96 timestamps is the solution right now. To save
> yourself some typing, you could declare
> 
> parquet_options = {
>     'compression': ...,
>     'use_deprecated_int96_timestamps': True
> }
> 
> pq.write_table(..., **parquet_options)
> 
>> On Fri, Sep 8, 2017 at 5:08 PM, Brian Wylie  wrote:
>> So, this is certainly good for future versions of Arrow. Do you have any
>> specific recommendations for a workaround currently?
>> 
>> Saving a parquet file with datetimes will obviously be a common use case
>> and if I'm understanding it correctly, right now saving a Parquet file with
>> PyArrow that file will not be readable by Spark at this point. Yes?  (I'm
>> asking this as opposed to stating this).
>> 
>> -Brian
>> 
>>> On Fri, Sep 8, 2017 at 2:58 PM, Wes McKinney  wrote:
>>> 
>>> Indeed, INT96 is deprecated in the Parquet format. There are other
>>> issues with Spark (it places restrictions on table field names, for
>>> example), so it may be worth adding an option like
>>> 
>>> pq.write_table(table, where, flavor='spark')
>>> 
>>> or maybe better
>>> 
>>> pq.write_table(table, where, flavor='spark-2.2')
>>> 
>>> and this would set the correct options for that version of Spark.
>>> 
>>> I created https://issues.apache.org/jira/browse/ARROW-1499 as a place
>>> to discuss further
>>> 
>>> - Wes
>>> 
>>> 
>>> On Fri, Sep 8, 2017 at 4:28 PM, Brian Wylie 
>>> wrote:
>>>> Okay,
>>>>
>>>> So after some additional debugging, I can get around this if I set
>>>>
>>>> use_deprecated_int96_timestamps=True
>>>>
>>>> on the pq.write_table(arrow_table, filename, compression=compression,
>>>> use_deprecated_int96_timestamps=True) call.
>>>>
>>>> But that just feels SO wrong, as I'm sure it's deprecated for a reason
>>>> (i.e. this will bite me later and badly)
>>>>
>>>>
>>>> I also see this issue (or at least a related issue) referenced in this Jeff
>>>> Knupp blog...
>>>>
>>>> https://www.enigma.com/blog/moving-to-parquet-files-as-a-system-of-record
>>>>
>>>> So shrug... any suggestions are greatly appreciated :)
>>>>
>>>> -Brian
>>>>
>>>> On Fri, Sep 8, 2017 at 12:36 PM, Brian Wylie wrote:
>>>>
>>>>> Apologies if this isn't quite the right place to ask this question, but I
>>>>> figured Wes/others might know right off the bat :)
>>>>>
>>>>>
>>>>> Context:
>>>>> - Mac OSX Laptop
>>>>> - PySpark: 2.2.0
>>>>> - PyArrow: 0.6.0
>>>>> - Pandas: 0.19.2
>>>>>
>>>>> Issue Explanation:
>>>>> - I'm converting my Pandas dataframe to a Parquet file with code very
>>>>>   similar to
>>>>>   - http://wesmckinney.com/blog/python-parquet-update/
>>>>> - My Pandas DataFrame has a datetime index:  http_df.index.dtype =
>>>>>   dtype('<M8[ns]')
>>>>> - When loading the saved parquet file I get the error below
>>>>> - If I remove that index everything works fine
>>>>>
>>>>> ERROR:
>>>>> - Py4JJavaError: An error occurred while calling o34.parquet.
>>>>> : org.apache.spark.SparkException: Job aborted due to stage failure:
>>>>> Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0
>>>>> in stage 0.0 (TID 0, localhost, executor driver):
>>>>> org.apache.spark.sql.AnalysisException: Illegal Parquet type: INT64
>>>>> (TIMESTAMP_MICROS);
>>>>>
>>>>> Full Code to reproduce:
>>>>> - https://github.com/Kitware/bat/blob/master/notebooks/Bro_to_Parquet.ipynb
>>>>>
>>>>>
>>>>> Thanks in advance, also big fan of all this stuff... "be the chicken" :)
>>>>>
>>>>> -Brian
>>>>>
>>>>>
>>>>>
>>>>>
>>> 
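
Editorial note: the workaround discussed in this thread (writing timestamps as INT96 so that Spark 2.2 can read the file) can be sketched as below. This is a minimal sketch, not the thread authors' code. The helper names are hypothetical, the "snappy" default is an assumption because the thread leaves the compression value open, and write_for_spark needs pyarrow installed.

```python
def spark_compatible_options(compression="snappy"):
    """Keyword arguments for pq.write_table, following the thread's workaround.

    The "snappy" default is an assumption; the thread does not name a codec.
    """
    return {
        "compression": compression,
        "use_deprecated_int96_timestamps": True,
    }


def write_for_spark(table, path, compression="snappy"):
    """Write a pyarrow Table with INT96 timestamps (requires pyarrow)."""
    # Imported lazily so that building the options dict has no dependency.
    import pyarrow.parquet as pq

    pq.write_table(table, path, **spark_compatible_options(compression))
```

Keeping the options in one place means the INT96 flag can be dropped in a single spot once the readers in use no longer need it.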


Re: [VOTE] Accept contribution of Plasma Object Store

2017-07-24 Thread Julien Le Dem
+1

On Sun, Jul 23, 2017 at 8:00 AM, Arun K. Subramaniyan  wrote:

> +1
>
> On Sun, Jul 23, 2017 at 1:16 AM Uwe L. Korn  wrote:
>
> > +1
> >
> > On Fri, Jul 21, 2017, at 01:37 AM, Julian Hyde wrote:
> > > +1
> > >
> > > > On Jul 20, 2017, at 3:07 PM, Bryan Cutler  wrote:
> > > >
> > > > +1 sounds great!
> > > >
> > > > On Thu, Jul 20, 2017 at 11:14 AM, Wes McKinney 
> > wrote:
> > > >
> > > >> Dear all,
> > > >>
> > > >> The Plasma Object Store provides a server process, reference C++ client,
> > > >> and Python binding for managing a collection of binary "objects" in POSIX
> > > >> shared memory. Applications use a lightweight messaging protocol to create
> > > >> and delete memory blocks in the object store, evict objects to make room
> > > >> for new objects, and increment and decrement reference counts to indicate
> > > >> shared ownership of memory. It also provides for subscribing to
> > > >> notifications about object activity. The system helps simplify ownership
> > > >> transfer and memory lifetime of shared memory blocks, which can be much
> > > >> more complicated in a peer-to-peer architecture.
> > > >>
> > > >> The object store has been used in conjunction with the Apache Arrow
> > > >> libraries to provide for zero-copy access to collections of large objects
> > > >> stored in shared memory. Incorporating this project into Apache Arrow will
> > > >> help the community continue to develop and innovate technology for
> > > >> low-overhead sharing of complex datasets across multiple processes.
> > > >>
> > > >> Plasma Object Store was developed by the Ray project with the UC Berkeley
> > > >> RISELab. There have been 8 contributors, with about 3.1 KLOC of C++ code
> > > >> and an additional 5.1 KLOC of thirdparty C and C++ code which we have
> > > >> reviewed for compatibility with the Apache Software Foundation's policies
> > > >> on license compatibility.
> > > >>
> > > >> This code was split off from the Ray project from commit id
> > > >> `b94b4a35e04d8d2c0af4420518a4e9a94c1c9b9f` [1] and modified by the authors
> > > >> for inclusion in Apache Arrow in a GitHub pull request [2]. This code has
> > > >> been staged in a separate repository for review by the community and ASF
> > > >> IP Clearance:
> > > >>
> > > >> - https://github.com/ray-project/arrow-plasma-object-store/tree/11795753b0850cf5ad50d640067a8517ad8629a2#diff-69e56fcedf1b794992b790684902dcd4
> > > >>
> > > >> This vote is to determine whether the Arrow PMC is in favor of accepting
> > > >> the code contribution. If the vote passes, the PMC and the authors of the
> > > >> code will work together to complete the ASF IP Clearance process and
> > > >> import the Plasma Object Store into Apache Arrow for inclusion in a future
> > > >> release:
> > > >>
> > > >>    [ ] +1 : Accept contribution of Plasma Object Store
> > > >>    [ ]  0 : No opinion
> > > >>    [ ] -1 : Reject contribution because...
> > > >>
> > > >> The vote is open for 72 hours and will close at 18:15 UTC on Sunday 23 July
> > > >> 2017 and the results will be announced on this list.
> > > >>
> > > >> Thanks,
> > > >> Wes
> > > >>
> > > >> [1]: https://github.com/ray-project/ray/commit/b94b4a35e04d8d2c0af4420518a4e9a94c1c9b9f
> > > >> [2]: https://github.com/apache/arrow/pull/742
> > > >>
> > >
> >
>
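
Editorial note: the ownership model described in the proposal above (create objects, take and release references, evict unreferenced objects to make room) can be illustrated with a toy in-process model. This is a sketch under loud assumptions: the class and method names are hypothetical, it uses an ordinary Python dict rather than POSIX shared memory, and it is not Plasma's API or eviction policy.

```python
from collections import OrderedDict


class ToyObjectStore:
    """In-process stand-in: byte blobs with reference counts and eviction."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self._objects = OrderedDict()  # object_id -> [data, ref_count]

    def create(self, object_id, data):
        """Store data, evicting unreferenced objects if space is needed."""
        if object_id in self._objects:
            raise KeyError(f"{object_id!r} already exists")
        self._evict_until_fits(len(data))
        self._objects[object_id] = [bytes(data), 0]
        self.used_bytes += len(data)

    def get(self, object_id):
        """Return the data and increment its reference count."""
        entry = self._objects[object_id]
        entry[1] += 1
        self._objects.move_to_end(object_id)  # mark as most recently used
        return entry[0]

    def release(self, object_id):
        """Decrement a reference count previously taken by get()."""
        entry = self._objects[object_id]
        if entry[1] == 0:
            raise ValueError(f"{object_id!r} has no outstanding references")
        entry[1] -= 1

    def _evict_until_fits(self, needed):
        # Walk from least to most recently used, skipping referenced objects.
        for object_id in list(self._objects):
            if self.used_bytes + needed <= self.capacity_bytes:
                return
            data, ref_count = self._objects[object_id]
            if ref_count == 0:
                del self._objects[object_id]
                self.used_bytes -= len(data)
        if self.used_bytes + needed > self.capacity_bytes:
            raise MemoryError("not enough unreferenced objects to evict")

    def __contains__(self, object_id):
        return object_id in self._objects
```

The point of the reference count is visible in the eviction loop: an object that some client still holds is never a candidate for eviction, so a reader cannot have memory pulled out from under it.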


Re: [VOTE] Release Apache Arrow 0.5.0 - RC2

2017-07-23 Thread Julien Le Dem
+1 (binding)
on macOS:
* Verified signature
* ran Java build, unit tests, packages
* built and ran tests for C++
1 note:
 - missing from the build notes: new jemalloc dependency (I had to brew
install jemalloc)

On Sun, Jul 23, 2017 at 6:20 AM, Uwe L. Korn  wrote:

> +1 (binding)
>
> * Verified signature and checksum
> * Ran Java unit tests, packaged to JARs
> * Build and run C++ & Python unit tests on Debian 7 / Debian 8 with
> gcc5.4
> * Build and run C++ & Python unit tests in manylinux1 container on
> Centos 5
> * Ran Python unit tests including --with-parquet
>
> Uwe
>
> On Fri, Jul 21, 2017, at 08:52 PM, Wes McKinney wrote:
> > +1 (binding)
> >
> > * Verified signature and checksum
> > * Ran Java unit tests, packaged to JARs
> > * Build and run C++ unit tests on Ubuntu 14.04 / gcc 4.8
> > * Built Parquet C++ against Arrow 0.5.0 and ran its unit tests
> > * Ran Python unit tests including --with-parquet
> > * Ran GLib C Ruby unit tests
> > * Ran integration tests (after manually re-enabling the Java tester
> > class per Bryan's note, I don't think this is a blocker)
> >
> > * Ran C++ unit tests for Arrow, Parquet tests against Arrow 0.5.0, and
> > Python tests with Visual Studio 2017
> >
> > - Wes
> >
> > On Thu, Jul 20, 2017 at 7:36 PM, Bryan Cutler  wrote:
> > > I ran tests for Java, C++, Python
> > > Integration tests had Java disabled, not sure if that would be a
> blocker or
> > > not.  After re-enabling, all tests pass.  Here is the PR to re-enable
> > > https://github.com/apache/arrow/pull/875
> > >
> > > +0 (non-binding) since I'm not sure if the above is a blocker
> > >
> > > On Thu, Jul 20, 2017 at 10:17 AM, Wes McKinney 
> wrote:
> > >
> > >> Hello all,
> > >>
> > >> I'd like to propose the 2nd release candidate (rc2) of Apache Arrow
> version
> > >> 0.5.0.  This is a major release consisting of 130 resolved JIRAs [1].
> > >>
> > >> The source release rc2 is hosted at [2]. It is the same as the rc1
> > >> release with the omission of the cpp/src/plasma directory, pending IP
> > >> clearance.
> > >>
> > >> This release candidate is based on commit
> > >> e9f76e125b836d0fdc0a533e2fee3fca8bf4c1a1 [3]
> > >>
> > >> The vote will be open for ~72 hours, ending 17:30 UTC on Sunday
> > >> July 23, 2017.
> > >>
> > >> [ ] +1 Release this as Apache Arrow 0.5.0
> > >> [ ] +0
> > >> [ ] -1 Do not release this as Apache Arrow 0.5.0 because...
> > >>
> > >> Thanks,
> > >> Wes
> > >>
> > >> How to validate a release signature:
> > >> https://httpd.apache.org/dev/verification.html
> > >>
> > >> [1]:
> > >> https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.5.0
> > >> [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.5.0-rc2/
> > >> [3]: https://github.com/apache/arrow/commit/e9f76e125b836d0fdc0a533e2fee3fca8bf4c1a1
> > >>
>


Re: Arrow sync tomorrow?

2017-07-18 Thread Julien Le Dem
15 utc / 8am PT tomorrow works for me. 
(Assuming I did the conversion right)
Julien

> On Jul 18, 2017, at 10:21, Uwe L. Korn  wrote:
> 
> Hello Wes,
> 
> I would have time for an Arrow sync before the Parquet sync (open to any
> time there). Sadly, I don't have time after it. 
> 
> Uwe
> 
>> On Tue, Jul 18, 2017, at 03:16 PM, Wes McKinney wrote:
>> We haven't had a sync call in a few weeks -- it would be good to touch
>> base and see if there's anything to discuss with the 0.5.0 release or
>> otherwise.
>> 
>> The Parquet sync is at 16:30 UTC tomorrow -- we could have a brief
>> call at 15:00 UTC tomorrow if that is a good time. Or we can meet at
>> 17:30 UTC right after the Parquet sync. Let me know if anyone has any
>> strong preferences and I will send out a hangout link later to the
>> mailing list
>> 
>> http://timesched.pocoo.org/
>> 
>> Thanks
>> Wes


Arrow sync starting now

2017-06-21 Thread Julien Le Dem
https://hangouts.google.com/hangouts/_/calendar/anVsaWVuLmxlZGVtQGdtYWlsLmNvbQ.us971oqfjhdj1b1m3esm4a9q5s


Re: Moving Arrow to gitbox.apache.org?

2017-06-17 Thread Julien Le Dem
+1

Julien

> On Jun 17, 2017, at 15:08, Wes McKinney  wrote:
> 
> None here. I think it's a good idea.
> 
>> On Sat, Jun 17, 2017 at 3:37 PM Uwe L. Korn  wrote:
>> 
>> Picking this up again, as we would like to create a secondary repository
>> to build Python releases: https://issues.apache.org/jira/browse/ARROW-1116
>> I would really like to try out gitbox and this seems like a nice place to
>> get more experiences with it. Is there any opposition to that?
>> 
>>> On 20.05.2017 at 15:52, Justin Erenkrantz wrote:
>>>
>>>> On Fri, May 19, 2017 at 6:59 PM, Jacques Nadeau wrote:
>>>>
>>>> My main question would be: can we disable merges in the UI and only
>>>> allow rebase/fast-forward merges? I've found on other GitHub
>>>> projects that the default behavior of merge makes an unintelligible
>>>> history.
>>> 
>>> 
>>> Once the repository is created, we should be able to set it as:
>>> 
>>> https://help.github.com/articles/configuring-pull-request-merges/
>>> 
>>> I'm guessing that we would want it to be "Allow Rebase Merging" only?
>> That
>>> seems reasonable to me.
>>> 
>>> This *might* have to be configured by Infra if we're not given the
>>> Owner/Maintainer privileges.  From what I can tell, I don't believe that
>>> regular Contributors (who can push changes) can modify this setting
>>> themselves.  (The infra team just confirmed that they can alter that
>> value.)
>>> 
>>> Cheers.  -- justin
>> 


[jira] [Created] (ARROW-1102) Make MessageSerializer.serializeMessage() public

2017-06-07 Thread Julien Le Dem (JIRA)
Julien Le Dem created ARROW-1102:


 Summary: Make MessageSerializer.serializeMessage() public
 Key: ARROW-1102
 URL: https://issues.apache.org/jira/browse/ARROW-1102
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java - Vectors
Reporter: Julien Le Dem
Assignee: Julien Le Dem


These methods are useful to serialize/deserialize messages.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [VOTE] Release Apache Arrow 0.4.1 - rc0

2017-06-07 Thread Julien Le Dem
+1
I validated the signature, then built and ran tests for Java and C++


On Tue, Jun 6, 2017 at 7:27 PM, Wes McKinney  wrote:

> Hello all,
>
> I'd like to propose the 1st release candidate (rc0) of Apache
> Arrow version 0.4.1.  This is a bug fix release consisting of 30
> resolved JIRAs [1].
>
> The source release rc0 is hosted at [2].
>
> This release candidate is based on commit
> 46315431aeda3b6968b3ac4c1087f6d41052b99d
>
> The vote will be open for ~72 hours, ending 22:30 Eastern US Time on Friday
> June 9, 2017.
>
> [ ] +1 Release this as Apache Arrow 0.4.1
> [ ] +0
> [ ] -1 Do not release this as Apache Arrow 0.4.1 because...
>
> Thanks,
> Wes
>
> How to validate a release signature:
> https://httpd.apache.org/dev/verification.html
>
> [1]:
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.4.1
> [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.4.1-rc0/
> [3]: https://github.com/apache/arrow/tree/46315431aeda3b6968b3ac4c1087f6d41052b99d
>



-- 
Julien


[jira] [Created] (ARROW-1092) More Decimal and scale flipped follow-up

2017-06-05 Thread Julien Le Dem (JIRA)
Julien Le Dem created ARROW-1092:


 Summary: More Decimal and scale flipped follow-up
 Key: ARROW-1092
 URL: https://issues.apache.org/jira/browse/ARROW-1092
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java - Vectors
Affects Versions: 0.4.0
Reporter: Julien Le Dem
Assignee: Julien Le Dem








[jira] [Created] (ARROW-1091) Decimal scale and precision are flipped

2017-06-05 Thread Julien Le Dem (JIRA)
Julien Le Dem created ARROW-1091:


 Summary: Decimal scale and precision are flipped
 Key: ARROW-1091
 URL: https://issues.apache.org/jira/browse/ARROW-1091
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java - Vectors
Affects Versions: 0.4.0
Reporter: Julien Le Dem
Assignee: Julien Le Dem








[jira] [Created] (ARROW-1085) [java] Follow up on template cleanup. Missing method for IntervalYear

2017-06-02 Thread Julien Le Dem (JIRA)
Julien Le Dem created ARROW-1085:


 Summary: [java] Follow up on template cleanup. Missing method for 
IntervalYear
 Key: ARROW-1085
 URL: https://issues.apache.org/jira/browse/ARROW-1085
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java - Vectors
Affects Versions: 0.4.0
Reporter: Julien Le Dem
Assignee: Julien Le Dem








Re: Arrow sync in 15 min

2017-05-31 Thread Julien Le Dem
 Notes:

Attendees/agenda building
Wes (TwoSigma):
 - Rest API
 - Roadmap
 - communicate with community
Uwe (Blue Yonder):
 - git tag for versioning
Julien (Dremio):
 - Timestamp:
 - REST API
 - Roadmap

Discussion:
 - git tag for versioning
   - development package version names are based on the latest tag in
     history from master + commit count since then.
   - since the release tag is in a branch, it goes from an older version
     and is misleading
   - options:
     - add a tag {release version}.post on the first commit after the
       release to get a better dev version string
     - rebase master on top of the last release (0.4)
   - we decided to rebase master (the only change is adding the commit
     that updates the version number in pom files)
 - Timestamp in Arrow and Parquet:
   - Both support "Timezone Naive" timestamps (aka "timestamp without
     timezone" in SQL)
     - in Arrow when timezone field is missing in Timestamp type:
       https://github.com/apache/arrow/blob/5899800f53f3c3fffc0db95294c4f0eb0e556228/format/Schema.fbs#L117
     - in Parquet (proposed PR) when isAdjustedToUTC is false:
       https://github.com/apache/parquet-format/pull/51/files#diff-0f9d1b5347959e15259da7ba8f4b6252R242
   - They also both support a "Timezone aware" timestamp (aka "timestamp
     with timezone" in SQL)
     - in Arrow when the timezone field is present with the original
       timezone.
     - in Parquet when isAdjustedToUTC is true
   - So there is more information in Arrow and it requires this extra
     information since its absence means "timezone naive"
   - conclusion:
     - when writing to parquet we should use isAdjustedToUTC = false
       only if there is no knowledge of the timezone
     - when reading from parquet we will populate timezone with UTC
       when isAdjustedToUTC == true (and leave it missing otherwise)
 - REST API:
   - review doc here:
     https://docs.google.com/document/d/1N4TP6zARRs2c4_h-4WqCqIFVPQwmxOmXel1V3AxpGok/edit#
 - Roadmap:
   - todo: blog post to describe the direction of arrow
   - among those:
     - REST API and generalizing messaging
     - C++ analytics library for interacting with ARROW memory. Tools for
       wrapping existing data structure (array of doubles)
     - arrow for GPU
     - Arrow ODBC interface: turbodbc
     - Spark integration improvements: group UDFs etc
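
The timestamp conclusion in the notes above can be sketched as a pair of mapping functions. This is a minimal sketch of the rule as recorded in the notes, not code from either project. The function names are hypothetical, and the Arrow timezone field is modelled as an optional string.

```python
from typing import Optional


def is_adjusted_to_utc_for_write(arrow_timezone: Optional[str]) -> bool:
    """Writing to Parquet: the flag is false only when no timezone is known."""
    return arrow_timezone is not None


def arrow_timezone_for_read(is_adjusted_to_utc: bool) -> Optional[str]:
    """Reading from Parquet: UTC when the flag is set, otherwise left missing."""
    return "UTC" if is_adjusted_to_utc else None
```

Under this rule a round trip is lossy in one direction: a column written with timezone "America/New_York" comes back with timezone "UTC", which reflects the point in the notes that Arrow carries more information than the Parquet flag.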

On Wed, May 31, 2017 at 9:16 AM, Julien Le Dem <jul...@dremio.com> wrote:

> The arrow sync is at 9:30 am PT today on google hangout
> https://hangouts.google.com/hangouts/_/dremio.com/arrow
>
> --
> Julien
>



-- 
Julien


[jira] [Created] (ARROW-1077) Define Arrow "REST API"

2017-05-31 Thread Julien Le Dem (JIRA)
Julien Le Dem created ARROW-1077:


 Summary: Define Arrow "REST API"
 Key: ARROW-1077
 URL: https://issues.apache.org/jira/browse/ARROW-1077
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Format
Reporter: Julien Le Dem
Assignee: Julien Le Dem


Design doc for comments here:
https://docs.google.com/document/d/1N4TP6zARRs2c4_h-4WqCqIFVPQwmxOmXel1V3AxpGok/edit#heading=h.mvfrm68y999s





Arrow sync in 15 min

2017-05-31 Thread Julien Le Dem
The arrow sync is at 9:30 am PT today on google hangout
https://hangouts.google.com/hangouts/_/dremio.com/arrow

-- 
Julien


Re: [ANNOUNCE] Apache Arrow 0.4.0 released

2017-05-25 Thread Julien Le Dem
I just published them.
They should propagate to mirrors shortly.

On Thu, May 25, 2017 at 6:16 AM, Emilio Lahr-Vivaz 
wrote:

> Congrats on the release! Is there a time frame for java artifacts being
> available on maven central?
>
> Thanks,
>
> Emilio
>
>
> On 05/23/2017 01:06 PM, Wes McKinney wrote:
>
>> The Apache Arrow community is pleased to announce the 0.4.0 release. It
>> includes 77 resolved issues ([1]) since the 0.3.0 release.
>>
>> The release is available now from our website and [2]:
>>  http://arrow.apache.org/install/
>>
>> Read about what's new in the release
>>  http://arrow.apache.org/blog/2017/05/23/0.4.0-release/
>>
>> Changelog
>>  http://arrow.apache.org/release/0.4.0.html
>>
>> What is Apache Arrow?
>> ---------------------
>>
>> Apache Arrow is a columnar in-memory analytics layer designed to
>> accelerate big data. It houses a set of canonical in-memory
>> representations of flat and hierarchical data along with multiple
>> language-bindings for structure manipulation. It also provides
>> low-overhead streaming and batch messaging, zero-copy interprocess
>> communication (IPC), and common algorithm implementations.
>>
>> Please report any feedback to the mailing lists ([3])
>>
>> Regards,
>> The Apache Arrow community
>>
>> [1]: https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20fixVersion%20%3D%200.4.0%20ORDER%20BY%20priority%20DESC
>> [2]: https://dist.apache.org/repos/dist/release/arrow/arrow-0.4.0/
>> [3]: https://lists.apache.org/list.html?dev@arrow.apache.org
>>
>
>


-- 
Julien


[jira] [Created] (ARROW-1069) Add instructions for publishing maven artifacts

2017-05-25 Thread Julien Le Dem (JIRA)
Julien Le Dem created ARROW-1069:


 Summary: Add instructions for publishing maven artifacts
 Key: ARROW-1069
 URL: https://issues.apache.org/jira/browse/ARROW-1069
 Project: Apache Arrow
  Issue Type: Task
  Components: Java - Vectors
Reporter: Julien Le Dem
Assignee: Julien Le Dem








Re: [VOTE] Release Apache Arrow 0.4.0 - rc1

2017-05-22 Thread Julien Le Dem
+1 (binding)
* verified signature
* ran java, c++ build and unit tests on macos
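
For anyone repeating these checks, the checksum part of the verification can be
sketched in Python as below. This is only an illustration, not the project's
release tooling: the helper names (file_digest, checksum_matches) are made up
for this example, and the signature check itself is a separate GPG step that is
not shown here.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in fixed-size chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns an empty bytes object
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def checksum_matches(path, expected_hex, algorithm="sha256"):
    """Compare a file's digest with a published one, ignoring case and whitespace."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

The expected digest would come from the checksum file published next to the
release candidate tarball.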

On Mon, May 22, 2017 at 8:32 AM, Wes McKinney  wrote:

> Hi folks -- please remember to test the release candidate before the vote
> closes later today. Thank you!
>
> On Sat, May 20, 2017 at 10:22 AM Wes McKinney  wrote:
>
> > +1 (binding)
> >
> > * Verified signature, checksums. Ran Rat
> > * Ran Java, C++, GLib, Python unit tests on Ubuntu 14.04, including
> > HDFS C++ and Python IO tests (I ran into
> > https://issues.apache.org/jira/browse/ARROW-1056, but this is a
> > test-only bug), and Parquet unit tests
> > * Verified that memory leak ARROW-1053 is fixed
> >
> > On Sat, May 20, 2017 at 10:05 AM, Justin Erenkrantz
> >  wrote:
> > > On Fri, May 19, 2017 at 1:50 PM, Wes McKinney 
> > wrote:
> > >
> > >> [X] +1 Release this as Apache Arrow 0.4.0
> > >>
> > >
> > > I verified GPG, MD5, and SHA sums.
> > >
> > > > And, I confirmed that Java and C++ still look good on Ubuntu
> > > > 17.04/x86_64.  (Nothing really changed there since rc0.)
> > >
> > > Cheers.  -- justin
> >
>



-- 
Julien


Re: Moving Arrow to gitbox.apache.org?

2017-05-19 Thread Julien Le Dem
Thank you Justin for the investigation!

I'd be very interested in being able to manage PRs
(rename/close/...) through the GitHub UI, which I understand this would
make possible.
We are already pushing to only one source since GitHub is a read-only
mirror. I think it would not be a problem to update our process and
possibly decide to push only to GitHub.
Uwe expressed interest in using the github UI to merge PRs. We would have
to figure out some other things we do like closing jiras in the merge
script.
Is there a JIRA integration that closes the ticket when the branch is
merged?

I'm curious to hear others' thoughts on this.


On Wed, May 17, 2017 at 9:47 AM, Justin Erenkrantz 
wrote:

> Hi dev@,
>
> In response to some questions from Julien on Slack, I chatted with some
> Infra folks here at ApacheCon NA in Miami about the current state of
> affairs around Git support at ASF.  There is a relatively new initiative
> called GitBox that is beginning some pilot tests with interested projects.
> If you have not yet seen it, GitBox is up at:
>
> https://gitbox.apache.org/
>
> It allows synchronization between Apache IDs and GitHub accounts.  Once
> accounts are linked and an eligible project is in GitBox, then you will
> receive an invitation from the Apache GitHub organization to join that
> project on GitHub and be a "Collaborator" (e.g. you can push changes).  The
> canonical list of committers would be kept on the ASF LDAP infrastructure -
> so as new committers are added to the project ACLs, they can then receive
> invites from GitBox if they have linked accounts.
>
> At that point, those with eligible and linked accounts can then technically
> push to either gitbox.apache.org *or* github.com.  However, Daniel (aka
> humbedooh) strongly suggested that the community pick one place to act as a
> canonical location as Git can often get confused.  That said, the backing
> infrastructure behind GitBox will do its best if we go to both places.
>
> This would allow integration with the GitHub PR workflow with the changes
> being mirrored over to gitbox.apache.org.  This would replace the existing
> git.apache.org workflow - it may be a bit disruptive to anyone using the
> existing git.apache.org repository or GitHub read-only mirror.
>
> If we're interested in going down this route, we'd need to file a JIRA
> ticket with Infra and get approval from either Daniel or Greg.  They would
> then schedule a time to do the transition.
>
> Thoughts?
>
> Cheers.  -- justin
>



-- 
Julien


Re: [VOTE] Release Apache Arrow 0.4.0 - rc0

2017-05-17 Thread Julien Le Dem
I validated the signature,
built and ran tests for
 - cpp
 - java
+1


On Wed, May 17, 2017 at 1:07 PM, Wes McKinney  wrote:

> Hello all,
>
> I'd like to propose the 1st release candidate (rc0) of Apache Arrow version
> 0.4.0.  It covers a total of 75 resolved JIRAs [1]. Thanks to everyone who
> contributed to this release!
>
> The source release rc0 is hosted at [2].
>
> This release candidate is based on commit
> fea6b71468618d22ece16250ff75f23ba2f18914
>
> The vote will be open for the next ~72 hours ending at 16:15 Eastern US
> Time,
> May 20, 2017.
>
> [ ] +1 Release this as Apache Arrow 0.4.0
> [ ] +0
> [ ] -1 Do not release this as Apache Arrow 0.4.0 because...
>
> Thanks,
> Wes
>
> How to validate a release signature:
> https://httpd.apache.org/dev/verification.html
>
> [1]:
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.4.0
> [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.4.0-rc0/
> [3]: https://github.com/apache/arrow/tree/fea6b71468618d22ece16250ff75f23ba2f18914
>



-- 
Julien


[jira] [Created] (ARROW-1049) [java] vector template cleanup

2017-05-17 Thread Julien Le Dem (JIRA)
Julien Le Dem created ARROW-1049:


 Summary: [java] vector template cleanup
 Key: ARROW-1049
 URL: https://issues.apache.org/jira/browse/ARROW-1049
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java - Vectors
Reporter: Julien Le Dem
Assignee: Julien Le Dem


Reduce the checks for a specific type in templates.
Remove old code related to removed types.





Re: Making Arrow 0.3.1 minor release

2017-05-15 Thread Julien Le Dem
SGTM

On Sun, May 14, 2017 at 12:36 PM, Wes McKinney  wrote:

> OK, I've thought a bit more about it and I'm +1 on doing a 0.4.0. There's
> enough meaningful new features and improvements since 0.3.0 already (almost
> 50 patches) to merit a small announcement blog. I will plan to cut the RC
> on Tuesday to give enough time to get in some last minute cleanup patches.
>
> - Wes
>
> On Sun, May 14, 2017 at 12:09 PM, Uwe L. Korn  wrote:
>
> > Hello,
> >
> > I would also be in favour of a bugfix release. Although, as we had quite
> > some API changes in https://github.com/apache/arrow/pull/680, I would
> > vote for a 0.4.0 release.
> >
> > Uwe
> >
> > --
> >   Uwe L. Korn
> >   uw...@xhochy.com
> >
> > On Sun, May 14, 2017, at 06:02 PM, Wes McKinney wrote:
> > > hi folks,
> > >
> > > I just fixed a quite serious memory leak in the Arrow Python bindings:
> > >
> > > https://github.com/apache/arrow/pull/685
> > >
> > > There have been some other bugs fixed since 0.3.0, and since I don't
> > > think we have any API changes, I think we could release master as is.
> > > We could also do a major version bump to 0.4.0.
> > >
> > > Let me know what others think.
> > >
> > > Thanks
> > > Wes
> >
>



-- 
Julien


Re: Branching for Arrow releases

2017-05-05 Thread Julien Le Dem
Alternatively, we can rebase master on the release if patches have been merged
concurrently with the release vote.
I think it is fine to rebase commits that have not been released yet
(the release sha, however, must stay the same).
I find it useful to have the release tag in the master history, to know by
looking at the git log whether a given patch landed before or after a release.
Even if this info is duplicated somewhere else (JIRA), this one is the source of
truth.
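
To make the "before or after a release" point concrete, here is a toy Python
sketch. It does not call git; the commit names and the parents mapping are
hypothetical, and the only idea shown is that a patch is part of a release
exactly when it is reachable from the tagged commit.

```python
def ancestors(parents, commit):
    """Return every commit reachable from `commit` through parent links (inclusive)."""
    seen = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c in seen:
            continue
        seen.add(c)
        stack.extend(parents.get(c, []))
    return seen


def released_in(parents, patch, release_tag):
    """True if `patch` is part of the history of the tagged release commit."""
    return patch in ancestors(parents, release_tag)


# Hypothetical history after rebasing master on the release:
# a -> b -> release -> c (tip of master)
parents = {"b": ["a"], "release": ["b"], "c": ["release"]}
```

With the tag in the master history, patch "a" is reported as released and "c"
as landing after the release. If the tag lived only on a side branch, the
tagged commit would not be reachable from the tip of master at all.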

On Fri, May 5, 2017 at 7:18 AM, Wes McKinney  wrote:

> For the first few releases, we've been holding off merging patches to
> master while the release vote is in progress, partially because of the
> commits that the maven-release-plugin commits.
>
> I would propose that in the future we continue to merge patches and perform
> the release tag in a branch (so the release tag itself won't appear in the
> master timeline) so that development flow is not interrupted. I'm not
> familiar with what other projects having Java libraries do, so let me know
> if there's a preferred workflow.
>
> Thanks
> Wes
>



-- 
Julien


Re: [VOTE] Release Apache Arrow 0.3.0 - rc1

2017-05-03 Thread Julien Le Dem
+1 (binding)

I ran the following:
- verified signature
- Built and tested C++ on macOS
- Built and tested Java on macOS

On Wed, May 3, 2017 at 10:54 AM, Gary Wong  wrote:

> Oh, Got it, Thank you for the clarification. :)
>
> On Wed, May 3, 2017 at 9:07 AM, Uwe L. Korn  wrote:
>
> > It's rc1 because we started with rc0. 0-indexing ;)
> >
> >
> > On Wed, May 3, 2017, at 06:00 PM, Gang(Gary) Wang wrote:
> > > +1 (non-binding) LGTM, wonder why still rc1 ? Thanks.
> > >
> > > Gary
> > >
> > > On Tue, May 2, 2017 at 1:46 PM, Wes McKinney 
> > wrote:
> > >
> > > > Hello all,
> > > >
> > > > > I'd like to propose the 2nd release candidate (rc1) of Apache Arrow
> > > > > version 0.3.0.  It covers a total of 263 resolved JIRAs [1]. Thanks to
> > > > > everyone who contributed to this release.
> > > >
> > > > The source release rc1 is hosted at [2].
> > > > This release candidate is based on commit
> > > > d8db8f8a11a6c45645b2d7370610311731bd located at [3].
> > > >
> > > > > The vote will be open for the next ~72 hours ending at 16:45 Eastern
> > > > > US Time, May 5, 2017.
> > > >
> > > > [ ] +1
> > > > [ ] +0
> > > > [ ] -1
> > > >
> > > > I have
> > > >
> > > > > * Built Java libraries and run unit tests
> > > > > * Built and installed C++ and Python libraries on Ubuntu 14.04 (gcc
> > > > >   4.8), and run unit tests
> > > > > * Built and installed C++ and Python libraries on Visual Studio 2015,
> > > > >   and run unit tests
> > > > > * Run the Java<->C++ integration tests for the stream and file formats
> > > > > * Built and installed C GLib bindings. The tarball does not include the
> > > > >   test suite; it would be good to provide a way to validate the build
> > > > >   from a source release
> > > >
> > > > Here's my vote: +1 (binding)
> > > >
> > > > Thanks,
> > > > Wes
> > > >
> > > > How to validate a release signature:
> > > > https://httpd.apache.org/dev/verification.html
> > > >
> > > > [1]:
> > > > https://issues.apache.org/jira/issues/?jql=project%20%
> > > > 3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%
> > > > 20AND%20fixVersion%20%3D%200.3.0
> > > > [2] *https://dist.apache.org/repos/dist/dev/arrow/apache-
> > arrow-0.3.0-rc1/
> > > >  arrow-0.3.0-rc1/
> > >*
> > > > [3]
> > > > *
> > > > https://github.com/apache/arrow/tree/d8db8f8a11a6c45645b2d73706
> > > > 10311731bd
> > > > <
> > > > https://github.com/apache/arrow/tree/d8db8f8a11a6c45645b2d73706
> > > > 10311731bd
> > > > >*
> > > >
> >
>



-- 
Julien


[jira] [Created] (ARROW-936) Fix release README

2017-05-02 Thread Julien Le Dem (JIRA)
Julien Le Dem created ARROW-936:
---

 Summary: Fix release README
 Key: ARROW-936
 URL: https://issues.apache.org/jira/browse/ARROW-936
 Project: Apache Arrow
  Issue Type: Bug
  Components: Documentation
Reporter: Julien Le Dem
Assignee: Julien Le Dem








Re: [VOTE] Release Apache Arrow 0.3.0 - rc0

2017-05-01 Thread Julien Le Dem
So I sent this a little fast and used Uwe's email for 0.2.0 as a template.
What I actually ran:
- java build and unit tests
- c++ build and unit tests
still my +1

On Mon, May 1, 2017 at 3:35 PM, Julien Le Dem <jul...@dremio.com> wrote:

> Of course I meant:
> +1 (binding)
>
> On Mon, May 1, 2017 at 3:27 PM, Julien Le Dem <jul...@dremio.com> wrote:
>
>> Hello all,
>>
>> I'd like to propose a release candidate (rc0) of Apache Arrow
>> version 0.3.0.
>> It covers a total of 265 resolved JIRAs [1]. Thanks to everyone who
>> contributed to this release.
>>
>> The source release rc0 is hosted at [2].
>> This release candidate is based on commit
>> 0341a336e2174c7c89628696864a6427e12c10c6 located at [3].
>>
>> The vote will be open for the next ~72 hours ending at 15:30 PT,
>> May 4, 2017.
>>
>> [ ] +1
>> [ ] +0
>> [ ] -1
>>
>> I have run the java build + tests
>> I have run cpp build + tests
>> I have built the Python manylinux1 package + run the tests
>>
>> Here's my vote: +1 (non-binding)
>>
>> Thanks,
>> Julien
>>
>> How to validate a release signature:
>> https://httpd.apache.org/dev/verification.html
>>
>> [1] https://issues.apache.org/jira/browse/ARROW-753?jql=project%20%3D%20ARROW%20AND%20fixVersion%20%3D%200.3.0%20ORDER%20BY%20priority%20DESC
>> [2] https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.3.0-rc0/
>> [3] https://github.com/apache/arrow/tree/0341a336e2174c7c89628696864a6427e12c10c6
>>
>> --
>> Julien
>>
>
>
>
> --
> Julien
>



-- 
Julien


Re: [VOTE] Release Apache Arrow 0.3.0 - rc0

2017-05-01 Thread Julien Le Dem
Of course I meant:
+1 (binding)

On Mon, May 1, 2017 at 3:27 PM, Julien Le Dem <jul...@dremio.com> wrote:

> Hello all,
>
> I'd like to propose a release candidate (rc0) of Apache Arrow
> version 0.3.0.
> It covers a total of 265 resolved JIRAs [1]. Thanks to everyone who
> contributed to this release.
>
> The source release rc0 is hosted at [2].
> This release candidate is based on commit
> 0341a336e2174c7c89628696864a6427e12c10c6 located at [3].
>
> The vote will be open for the next ~72 hours ending at 15:30 PT,
> May 4, 2017.
>
> [ ] +1
> [ ] +0
> [ ] -1
>
> I have run the java build + tests
> I have run cpp build + tests
> I have built the Python manylinux1 package + run the tests
>
> Here's my vote: +1 (non-binding)
>
> Thanks,
> Julien
>
> How to validate a release signature:
> https://httpd.apache.org/dev/verification.html
>
> [1] https://issues.apache.org/jira/browse/ARROW-753?jql=project%20%3D%20ARROW%20AND%20fixVersion%20%3D%200.3.0%20ORDER%20BY%20priority%20DESC
> [2] https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.3.0-rc0/
> [3] https://github.com/apache/arrow/tree/0341a336e2174c7c89628696864a6427e12c10c6
>
> --
> Julien
>



-- 
Julien


[jira] [Created] (ARROW-930) javadoc generation fails with java 8

2017-05-01 Thread Julien Le Dem (JIRA)
Julien Le Dem created ARROW-930:
---

 Summary: javadoc generation fails with java 8 
 Key: ARROW-930
 URL: https://issues.apache.org/jira/browse/ARROW-930
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java - Memory
Reporter: Julien Le Dem
Assignee: Julien Le Dem


Workaround: use java 7





Re: Serialize/deserialize ArrowRecordBatch to/from bytes?

2017-04-26 Thread Julien Le Dem
Example of writing to and reading from a file:
https://github.com/apache/arrow/blob/master/java/vector/src/test/java/org/apache/arrow/vector/file/TestArrowFile.java
Similarly, in case you don't want to go through a file:
Unloading a vector into buffers and loading from buffers:
https://github.com/apache/arrow/blob/master/java/vector/src/test/java/org/apache/arrow/vector/TestVectorUnloadLoad.java
The VectorLoader/Unloader are used to read/write files.

On Wed, Apr 26, 2017 at 10:31 AM, Li Jin  wrote:

> Thanks for the various pointers. I was looking at ArrowFileWriter/Reader
> and got a little bit confused.
>
> So what I am trying to do is to convert a list of spark rows into some
> arrow format in java ( I will probably go with the file format for now),
> send the bytes to python, deserialize it into a pyarrow table.
>
> This is what I currently plan to do:
> (1) convert the rows to one or more arrow record batches (use the
> ValueVectors)
> (2) serialize the arrow record batches and send them over to python (not sure
> what to use here, ArrowFileWriter?)
> (3) deserialize the bytes into a pyarrow.Table using pyarrow.FileReader
>
> I *think* ArrowFileWriter is what I should use to send data over in (2),
> but:
> (1) I would need to turn the arrow record batches into a VectorSchemaRoot
> by doing something like this:
> https://github.com/icexelloss/spark/blob/pandas-udf/sql/core/src/test/scala/org/apache/spark/sql/ArrowConvertersSuite.scala#L226
> (2) I am not sure how to write all the data in a vector schema root using
> ArrowFileWriter.
>
> Does this sound like the right thing to do?
>
> Thanks,
> Li
>
> On Tue, Apr 25, 2017 at 8:52 PM, Wes McKinney  wrote:
>
> > Also, now that we have a website that is easier to write content for (in
> > Markdown), it would be great if some Java developers could volunteer some
> > time to write user-facing documentation to go with the Javadocs.
> >
> > On Tue, Apr 25, 2017 at 8:51 PM, Wes McKinney 
> wrote:
> >
> > > There is also
> > > https://github.com/apache/arrow/blob/master/java/vector/src/test/java/org/apache/arrow/vector/file/TestArrowStreamPipe.java
> > >
> > > On Tue, Apr 25, 2017 at 8:46 PM, Li Jin  wrote:
> > >
> > >> Thanks Julien. I will follow
> > >> https://github.com/apache/arrow/blob/990e2bde758ac8bc6e4497ae1bc37f89b71bb5cf/java/vector/src/test/java/org/apache/arrow/vector/stream/MessageSerializerTest.java#L91
> > >>
> > >
> > >
> >
>



-- 
Julien


arrow sync happening now

2017-04-19 Thread Julien Le Dem
on google hangout:
https://hangouts.google.com/hangouts/_/dremio.com/arrow

-- 
Julien


[jira] [Created] (ARROW-824) Date and Time Vectors should reflect timezone-less semantics

2017-04-14 Thread Julien Le Dem (JIRA)
Julien Le Dem created ARROW-824:
---

 Summary: Date and Time Vectors should reflect timezone-less 
semantics
 Key: ARROW-824
 URL: https://issues.apache.org/jira/browse/ARROW-824
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java - Vectors
Reporter: Julien Le Dem


Currently getObject returns a Date|Time|DateTime and should return the Local* 
equivalent:
https://github.com/apache/arrow/blob/b6033378c2533ed7b396f111cc5aed10450907fb/java/vector/src/main/codegen/templates/FixedValueVectors.java#L520
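
As a rough illustration of what "timezone-less" means here, the Python sketch
below interprets a stored millisecond count as a plain wall-clock value with no
zone attached. It is an analogy only, not the Java vector code; the function
name to_local_datetime is made up for this example.

```python
from datetime import datetime, timedelta

# Naive epoch: deliberately carries no tzinfo, so nothing is shifted
# by the timezone of the machine running the code.
EPOCH = datetime(1970, 1, 1)


def to_local_datetime(millis):
    """Interpret a millisecond count as a timezone-less wall-clock value."""
    return EPOCH + timedelta(milliseconds=millis)


value = to_local_datetime(86400000)  # exactly one day of milliseconds
```

The result is the same on every machine, which is the property the Local*
return types are meant to express.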






Re: Arrow 0.3 release timeline

2017-04-14 Thread Julien Le Dem
I reviewed the currently pending PRs on the Java side.
I opened 2 PRs for the open Java JIRAs from the list: ARROW-777, ARROW-720

On Fri, Apr 14, 2017 at 12:55 PM, Julien Le Dem <jul...@dremio.com> wrote:

> I'm looking through them
>
> On Fri, Apr 14, 2017 at 9:26 AM, Wes McKinney <wesmck...@gmail.com> wrote:
>
>> hi all,
>>
>> I'm working to close out the remaining Python and C++ stuff we wanted
>> to get in to 0.3 for the sake of other Python projects that want to
>> use Arrow.
>>
>> There are 8 patches up that touch the Java codebase. If we can get all
>> these closed out then I think we should be able to cut a release
>> candidate sometime next week.
>>
>> Thanks
>> Wes
>>
>> On Mon, Apr 10, 2017 at 12:38 PM, Wes McKinney <wesmck...@gmail.com>
>> wrote:
>> > With the Easter holiday and long weekend, we probably will not be able
>> > to do an RC until next week at earliest.
>> >
>> > In the meantime, there are a lot of patches in review. Please keep the
>> > release JIRA (https://issues.apache.org/jira/browse/ARROW-670) updated
>> > with blocking issues so we can track what patches are in progress or
>> > still TODO.
>> >
>> > Thanks!
>> > Wes
>> >
>> > On Wed, Apr 5, 2017 at 10:38 AM, Julien Le Dem <jul...@dremio.com>
>> wrote:
>> >> The current 0.3 goal is good for me.
>> >>
>> >> On Mon, Apr 3, 2017 at 11:25 AM, Wes McKinney <wesmck...@gmail.com>
>> wrote:
>> >>
>> >>> hi folks,
>> >>>
>> >>> We've been making excellent progress toward the 0.3 release -- 123
>> >>> patches in since 0.2, which was released a little over 6 weeks ago.
>> >>>
>> >>> Here is the JIRA to help track related issues:
>> >>>
>> >>> https://issues.apache.org/jira/browse/ARROW-670
>> >>>
>> >>> Once ARROW-510 (https://github.com/apache/arrow/pull/475, thanks
>> >>> Leif!) has been resolved, we'll have all of the date and time types
>> >>> reconciled which was one of the major goals for the release. It would
>> >>> be nice to get the fixed size binary working end to end (ARROW-634),
>> >>> but if that slips to 0.4, I don't think it's a big deal.
>> >>>
>> >>> I know there are a number of other Java patches in flight; I will let
>> >>> others comment on their status.
>> >>>
>> >>> What remains on the C++ and Python side
>> >>>
>> >>> * Decimal patch (ARROW-655)
>> >>> * Some date and time compatibility with pandas
>> >>> * Packaging issues on Linux / OS X, e.g. wheels for pyarrow that can
>> >>> be compiled against / linked to by thirdparties
>> >>> * Windows support for Python
>> >>> * Some miscellaneous parquet-cpp integration improvements
>> >>>
>> >>> If all goes well, we may be able to cut an RC next week or the week
>> after.
>> >>>
>> >>> Is there anything else that others would like to see go into the 0.3
>> >>> release? I think we may be able to make the 0.4 release more quickly
>> >>> (~1 month or less after 0.3) as we finish out some of the additional
>> >>> compatibility issues (dictionaries in IPC stream/file format is a
>> >>> significant one).
>> >>>
>> >>> Thanks
>> >>> Wes
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Julien
>>
>
>
>
> --
> Julien
>



-- 
Julien


[jira] [Created] (ARROW-775) [Java] add simple constructors to value vectors

2017-04-06 Thread Julien Le Dem (JIRA)
Julien Le Dem created ARROW-775:
---

 Summary: [Java] add simple constructors to value vectors
 Key: ARROW-775
 URL: https://issues.apache.org/jira/browse/ARROW-775
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java - Vectors
Reporter: Julien Le Dem
Assignee: Julien Le Dem








Re: Arrow 0.3 release timeline

2017-04-05 Thread Julien Le Dem
The current 0.3 goal is good for me.

On Mon, Apr 3, 2017 at 11:25 AM, Wes McKinney  wrote:

> hi folks,
>
> We've been making excellent progress toward the 0.3 release -- 123
> patches in since 0.2, which was released a little over 6 weeks ago.
>
> Here is the JIRA to help track related issues:
>
> https://issues.apache.org/jira/browse/ARROW-670
>
> Once ARROW-510 (https://github.com/apache/arrow/pull/475, thanks
> Leif!) has been resolved, we'll have all of the date and time types
> reconciled which was one of the major goals for the release. It would
> be nice to get the fixed size binary working end to end (ARROW-634),
> but if that slips to 0.4, I don't think it's a big deal.
>
> I know there are a number of other Java patches in flight; I will let
> others comment on their status.
>
> What remains on the C++ and Python side
>
> * Decimal patch (ARROW-655)
> * Some date and time compatibility with pandas
> * Packaging issues on Linux / OS X, e.g. wheels for pyarrow that can
> be compiled against / linked to by thirdparties
> * Windows support for Python
> * Some miscellaneous parquet-cpp integration improvements
>
> If all goes well, we may be able to cut an RC next week or the week after.
>
> Is there anything else that others would like to see go into the 0.3
> release? I think we may be able to make the 0.4 release more quickly
> (~1 month or less after 0.3) as we finish out some of the additional
> compatibility issues (dictionaries in IPC stream/file format is a
> significant one).
>
> Thanks
> Wes
>



-- 
Julien


[jira] [Created] (ARROW-720) [java] arrow should not have a dependency on slf4j bridges in compile

2017-03-27 Thread Julien Le Dem (JIRA)
Julien Le Dem created ARROW-720:
---

 Summary: [java] arrow should not have a dependency on slf4j 
bridges in compile
 Key: ARROW-720
 URL: https://issues.apache.org/jira/browse/ARROW-720
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java - Vectors
Reporter: Julien Le Dem
Assignee: Julien Le Dem


See: 
https://github.com/apache/arrow/blame/d2d27555b4b2f3f0ba26539211bfe8b4d1b52481/java/pom.xml#L472
as a library, arrow should not pick the direction of the bridges.

We should move those to test scope





Re: ACTION NEEDED: JIRA notifications

2017-03-23 Thread Julien Le Dem
Thank you!

On Thu, Mar 23, 2017 at 6:52 AM, Wes McKinney  wrote:

> To reduce e-mail traffic to dev@arrow.apache.org, we've changed the
> JIRA notification schema to only send issues CREATION e-mails to this
> mailing list.
>
> All issue creation and comments/updates to JIRAs will be sent to
> iss...@arrow.apache.org. So I recommend subscribing
> (issues-subscr...@arrow.apache.org) to this mailing list to keep
> receiving all the updates (and possibly setting up a separate mail
> filter).
>
> You can also "watch" specific issues on JIRA to subscribe to all
> notifications.
>
> Please let us know if there are any questions.
>
> Thanks!
> Wes
>



-- 
Julien


[jira] [Created] (ARROW-704) Fix bad import caused by conflicting changes

2017-03-23 Thread Julien Le Dem (JIRA)
Julien Le Dem created ARROW-704:
---

 Summary: Fix bad import caused by conflicting changes
 Key: ARROW-704
 URL: https://issues.apache.org/jira/browse/ARROW-704
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java - Vectors
Reporter: Julien Le Dem
Assignee: Julien Le Dem








[jira] [Created] (ARROW-702) Fix BitVector.copyFromSafe to reAllocate instead of returning false

2017-03-22 Thread Julien Le Dem (JIRA)
Julien Le Dem created ARROW-702:
---

 Summary: Fix BitVector.copyFromSafe to reAllocate instead of 
returning false
 Key: ARROW-702
 URL: https://issues.apache.org/jira/browse/ARROW-702
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java - Vectors
Reporter: Julien Le Dem
Assignee: Julien Le Dem








[jira] [Assigned] (ARROW-347) Add method to pass CallBack when creating a transfer pair

2017-03-22 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem reassigned ARROW-347:
---

Assignee: Julien Le Dem  (was: Steven Phillips)

> Add method to pass CallBack when creating a transfer pair
> -
>
> Key: ARROW-347
> URL: https://issues.apache.org/jira/browse/ARROW-347
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java - Vectors
>Reporter: Steven Phillips
>        Assignee: Julien Le Dem
>
> When calling the getTransferPair method of a NullableMapVector, we pass the
> current vector's callback to the newly created vector. This is wrong, as the
> new vector needs to have its own callback. Whoever is using the target vector
> should have a handle on the callBack to deal with schema changes.
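
The intent can be sketched with a toy model in Python. None of these names are
the Arrow Java API: ToyVector and make_transfer_target are hypothetical
stand-ins, and the only point shown is that the caller supplies a separate
callback for the target instead of the target inheriting the source's one.

```python
class ToyVector:
    """Hypothetical stand-in for a vector that reports schema changes via a callback."""

    def __init__(self, name, callback):
        self.name = name
        self.callback = callback

    def add_child(self, child_name):
        # A schema change: notify whoever owns this particular vector.
        self.callback(self.name, child_name)

    def make_transfer_target(self, callback):
        """Create the target of a transfer; the caller passes the target's own callback."""
        return ToyVector(self.name, callback)


source_events, target_events = [], []
source = ToyVector("map", lambda v, c: source_events.append((v, c)))
target = source.make_transfer_target(lambda v, c: target_events.append((v, c)))
target.add_child("x")
```

After the last line only target_events has an entry, so the owner of the
target vector sees its schema change and the owner of the source is not
notified by mistake.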





Re: Moving JIRA updates from dev@ to issues@ only?

2017-03-22 Thread Julien Le Dem
It looks like this is configurable here:
https://issues.apache.org/jira/plugins/servlet/project-config/ARROW/notifications
Do we need infra to do it?

On Wed, Mar 22, 2017 at 9:16 AM, Julian Hyde  wrote:

> And will the initial email, indicating the creation of a JIRA case,
> continue to be sent to dev@?
>
> If so, +1.
>
> On Wed, Mar 22, 2017 at 8:55 AM, Wes McKinney  wrote:
> > I created:
> >
> > https://issues.apache.org/jira/browse/ARROW-690
> >
> > Since issue traffic is picking up, it may be better to keep the e-mail
> > traffic on dev@ lighter to facilitate community discussions. Let me
> > know what others think -- we can discuss on the sync call today also
>



-- 
Julien


[jira] [Resolved] (ARROW-677) [java] Fix checkstyle jcl-over-slf4j conflict issue

2017-03-21 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem resolved ARROW-677.
-
Resolution: Fixed

Issue resolved by pull request 412
[https://github.com/apache/arrow/pull/412]

> [java] Fix checkstyle jcl-over-slf4j conflict issue
> ---
>
> Key: ARROW-677
> URL: https://issues.apache.org/jira/browse/ARROW-677
> Project: Apache Arrow
>  Issue Type: Bug
>        Reporter: Julien Le Dem
>        Assignee: Julien Le Dem
>
> The build failed in master for me because of this.





[jira] [Created] (ARROW-677) [java] Fix checkstyle jcl-over-slf4j conflict issue

2017-03-21 Thread Julien Le Dem (JIRA)
Julien Le Dem created ARROW-677:
---

 Summary: [java] Fix checkstyle jcl-over-slf4j conflict issue
 Key: ARROW-677
 URL: https://issues.apache.org/jira/browse/ARROW-677
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Julien Le Dem
Assignee: Julien Le Dem


The build failed in master for me because of this.





[jira] [Assigned] (ARROW-208) Add checkstyle policy to java project

2017-03-21 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem reassigned ARROW-208:
---

Assignee: Tsuyoshi Ozawa

> Add checkstyle policy to java project
> -
>
> Key: ARROW-208
> URL: https://issues.apache.org/jira/browse/ARROW-208
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java - Memory, Java - Vectors
>Reporter: Laurent Goujon
>Assignee: Tsuyoshi Ozawa
>Priority: Minor
>
> As suggested by [~jnadeau] in https://github.com/apache/arrow/pull/65, a set 
> of checkstyle policies should be added to Arrow java project so we can get 
> best practices and style checks enforced at build time.





[jira] [Resolved] (ARROW-208) Add checkstyle policy to java project

2017-03-21 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem resolved ARROW-208.
-
Resolution: Fixed

Issue resolved by pull request 96
[https://github.com/apache/arrow/pull/96]

> Add checkstyle policy to java project
> -
>
> Key: ARROW-208
> URL: https://issues.apache.org/jira/browse/ARROW-208
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java - Memory, Java - Vectors
>Reporter: Laurent Goujon
>Priority: Minor
>
> As suggested by [~jnadeau] in https://github.com/apache/arrow/pull/65, a set 
> of checkstyle policies should be added to Arrow java project so we can get 
> best practices and style checks enforced at build time.





[jira] [Resolved] (ARROW-673) [Java] Support additional Time metadata

2017-03-21 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem resolved ARROW-673.
-
Resolution: Fixed

Issue resolved by pull request 407
[https://github.com/apache/arrow/pull/407]

> [Java] Support additional Time metadata
> ---
>
> Key: ARROW-673
> URL: https://issues.apache.org/jira/browse/ARROW-673
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java - Vectors
>        Reporter: Julien Le Dem
>        Assignee: Julien Le Dem
>






[jira] [Created] (ARROW-676) [java] move from MinorType to FieldType in ValueVectors to carry all the relevant type bits

2017-03-21 Thread Julien Le Dem (JIRA)
Julien Le Dem created ARROW-676:
---

 Summary: [java] move from MinorType to FieldType in ValueVectors 
to carry all the relevant type bits
 Key: ARROW-676
 URL: https://issues.apache.org/jira/browse/ARROW-676
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java - Vectors
Reporter: Julien Le Dem
Assignee: Julien Le Dem








[jira] [Created] (ARROW-674) [Java] Support additional Timestamp timezone metadata

2017-03-20 Thread Julien Le Dem (JIRA)
Julien Le Dem created ARROW-674:
---

 Summary: [Java] Support additional Timestamp timezone metadata
 Key: ARROW-674
 URL: https://issues.apache.org/jira/browse/ARROW-674
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java - Vectors
Reporter: Julien Le Dem








[jira] [Resolved] (ARROW-316) Finalize Date type

2017-03-20 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem resolved ARROW-316.
-
Resolution: Fixed

Issue resolved by pull request 390
[https://github.com/apache/arrow/pull/390]

> Finalize Date type
> --
>
> Key: ARROW-316
> URL: https://issues.apache.org/jira/browse/ARROW-316
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Format
>        Reporter: Julien Le Dem
>Assignee: Wes McKinney
> Fix For: 0.3.0
>
>
> Parquet defines it as: "number of days from the Unix epoch, 1 January 1970."
> https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#datetime-types
> We should make it the same.





Re: Making some decisions about date and time types

2017-03-20 Thread Julien Le Dem
+1

On Mon, Mar 20, 2017 at 7:53 AM, Wes McKinney <wesmck...@gmail.com> wrote:

> Thanks all for the reviews so far.
>
> ARROW-316 is the last item under consideration for Date support.
>
> I created ARROW-670 for the 0.3 release. I propose we push for a 0.3
> release as soon as we incorporate these changes and complete
> integration tests for the date and time types
>
> On Fri, Mar 17, 2017 at 7:26 PM, Julien Le Dem <jul...@dremio.com> wrote:
> >> At some point someone will want MONTH as a time unit (to support SQL’s
> > year-to-month interval type)
> > There is an interval type here for that:
> > https://github.com/apache/arrow/blob/39c7274fc36b5f405f1dbfa48067dd
> e52abec5ce/format/Message.fbs#L98
> >
> > On Fri, Mar 17, 2017 at 1:23 PM, Julian Hyde <jh...@apache.org> wrote:
> >
> >> Am I correct that timestamp is a 64 bit signed integer representing
> >> microseconds since 1970? If so, it would be helpful to state the minimum
> >> and maximum values in the spec.
> >>
> >> I can’t quite imagine a use case for microsecond time, given that it
> takes
> >> the same number of bits as a timestamp. But still, no harm in including
> it.
> >>
> >> At some point someone will want MONTH as a time unit (to support SQL’s
> >> year-to-month interval type) and someone will want nanosecond timestamp
> >> (problematic, because it needs more than 64 bits for a useful range to
> >> dates). But these can wait until version 2.
> >>
> >> Julian
> >>
> >>
> >> > On Mar 17, 2017, at 9:51 AM, Wes McKinney <wesmck...@gmail.com>
> wrote:
> >> >
> >> > hi folks,
> >> >
> >> > We have some format decisions to make about all 3 of the primary
> >> > temporal types in Arrow:
> >> >
> >> > ARROW-617 - Time type
> >> > - It is proposed to add the type bit width to the metadata for
> >> > clarity, and using the smallest type that can accommodate a particular
> >> > time unit
> >> > - PATCH: https://github.com/apache/arrow/pull/385
> >> >
> >> > ARROW-316: Date type
> >> > - It is proposed to add a DateUnit to indicate day-based date (a la
> >> > PostgreSQL and other systems) as int32 vs. millisecond-based date as
> >> > int64 (a la Joda, and current Arrow Java)
> >> > - PATCH: https://github.com/apache/arrow/pull/390
> >> >
> >> > ARROW-637: Timestamp type
> >> > - It is proposed to add a timezone string to the metadata as to
> >> > disambiguate TZ-naive vs. TZ-aware data, but otherwise display only
> >> > (changing the time zone does not alter the physical int64 timestamp
> >> > values)
> >> > - PATCH: https://github.com/apache/arrow/pull/388
> >> >
> >> > There seems to be some degree of consensus on all 3 of these, but it
> >> > would be good to reach a final decision and merge patches so that we
> >> > can do the corresponding dev work in Java and C++, and hopefully get
> >> > integration tests working in time for the Arrow 0.3 release.
> >> >
> >> > Thanks!
> >> > Wes
> >>
> >>
> >
> >
> > --
> > Julien
>



-- 
Julien


[jira] [Resolved] (ARROW-637) [Format] Add time zone metadata to Timestamp type

2017-03-17 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem resolved ARROW-637.
-
Resolution: Fixed

Issue resolved by pull request 388
[https://github.com/apache/arrow/pull/388]

> [Format] Add time zone metadata to Timestamp type
> -
>
> Key: ARROW-637
> URL: https://issues.apache.org/jira/browse/ARROW-637
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Format
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>
> As a metadata-only convenience, it would be useful to have an optional Olson 
> time zone name or absolute time offset (e.g. {{+07:30}}) in the {{Timestamp}} 
> flatbuffers type: 
> https://github.com/apache/arrow/blob/master/format/Message.fbs#L94
> Null or length-0 string would indicate that the data is time zone naive, and 
> shall not be considered to be localized. 
> https://github.com/apache/arrow/pull/388





Re: no arrow sync today

2017-03-16 Thread Julien Le Dem
Wednesday before the Parquet sync works.

On Thu, Mar 16, 2017 at 12:05 PM, Uwe L. Korn <uw...@xhochy.com> wrote:

> Wednesday before the Parquet sync would be ok for me, after is too late.
>
> Uwe
>
> On Thu, Mar 16, 2017, at 07:10 PM, Wes McKinney wrote:
> > I'm conflicted tomorrow. How about next Wednesday, either before or
> > after the Parquet sync?
> >
> > On Thu, Mar 16, 2017 at 10:57 AM, Julien Le Dem <jul...@dremio.com>
> > wrote:
> > > This is conflicting with Strata San Jose.
> > > I propose we do tomorrow instead.
> > > --
> > > Julien
>



-- 
Julien


[jira] [Commented] (ARROW-637) [Format] Add time zone metadata to Timestamp type

2017-03-16 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928411#comment-15928411
 ] 

Julien Le Dem commented on ARROW-637:
-

Sounds good to me with the following tweak:
 - timezone field present means the data is stored in that timezone (including 
UTC), which means that when displayed to the user it can be converted to the 
display timezone.
 - timezone field missing means the data is timezone-less, similar to the SQL 
behavior. A Timestamp without timezone should be printed exactly the same (as 
if we had stored the string representing it) independently of the "client" 
timezone. It is stored as UTC but does not get converted to the user timezone.
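
The two display rules above can be sketched as follows. This is only an illustration under stated assumptions: `render`, `timezone_field` and `display_tz` are hypothetical names (not Arrow APIs), values are taken to be microseconds since the Unix epoch, and only the presence of the timezone field is used.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

EPOCH = datetime(1970, 1, 1)  # naive on purpose: the stored value carries no zone


def render(value_micros: int, timezone_field: Optional[str], display_tz: timezone) -> str:
    """Format a stored timestamp according to the two rules in the comment."""
    stored = EPOCH + timedelta(microseconds=value_micros)
    if not timezone_field:
        # Timezone-less: print the stored fields unchanged, ignoring display_tz.
        return stored.isoformat(sep=" ")
    # Timezone present: the value is an instant and may be converted for display.
    instant = stored.replace(tzinfo=timezone.utc)
    return instant.astimezone(display_tz).isoformat(sep=" ")
```

With a display zone of +07:30, a stored value of 0 prints as "1970-01-01 00:00:00" when the field is missing and as "1970-01-01 07:30:00+07:30" when the field is set.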

> [Format] Add time zone metadata to Timestamp type
> -
>
> Key: ARROW-637
> URL: https://issues.apache.org/jira/browse/ARROW-637
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Format
>Reporter: Wes McKinney
>
> As a metadata-only convenience, it would be useful to have an optional Olson 
> time zone name in the {{Timestamp}} flatbuffers type: 
> https://github.com/apache/arrow/blob/master/format/Message.fbs#L94
> Null or length-0 string would indicate that the data has no time zone (UTC)





[jira] [Commented] (ARROW-316) Finalize Date type

2017-03-16 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928377#comment-15928377
 ] 

Julien Le Dem commented on ARROW-316:
-

What representations do we need?
 - milliseconds since the UNIX epoch on 64 bits: does this mean the timestamp at 
midnight? This is not the same as number of days * 24 * 3600 * 1000 because of 
leap seconds.
 - number of days since the UNIX epoch on int32. This one seems straightforward.

Do we want to make the Date type more explicit, similar to the current Time type 
discussion? Meaning it has a unit and a bitWidth? This seems like overkill since 
the precision is always day.
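
A small sketch of the two candidate representations, assuming POSIX time semantics (which is what Python's datetime uses). The helper names are hypothetical. One point worth double-checking against the first bullet: POSIX time does not count leap seconds, so under that convention the millisecond value at midnight UTC is exactly days * 24 * 3600 * 1000.

```python
from datetime import date, datetime, timedelta, timezone

EPOCH_DATE = date(1970, 1, 1)
EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)
MILLIS_PER_DAY = 24 * 3600 * 1000


def date_to_days(d: date) -> int:
    """Day-based representation: days since the Unix epoch (fits in int32)."""
    return (d - EPOCH_DATE).days


def date_to_millis(d: date) -> int:
    """Millisecond-based representation: the instant at midnight UTC (int64)."""
    midnight = datetime(d.year, d.month, d.day, tzinfo=timezone.utc)
    return (midnight - EPOCH_UTC) // timedelta(milliseconds=1)


def representations_agree(d: date) -> bool:
    """True when the millisecond value is the day count scaled by a constant."""
    return date_to_millis(d) == date_to_days(d) * MILLIS_PER_DAY
```

If the millisecond values were instead derived from a clock that counts leap seconds, the two would differ, so the convention would need to be stated in the spec.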




> Finalize Date type
> --
>
> Key: ARROW-316
> URL: https://issues.apache.org/jira/browse/ARROW-316
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Format
>        Reporter: Julien Le Dem
>        Assignee: Julien Le Dem
> Fix For: 0.3.0
>
>
> Parquet defines it as: "number of days from the Unix epoch, 1 January 1970."
> https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#datetime-types
> We should make it the same.





no arrow sync today

2017-03-16 Thread Julien Le Dem
This is conflicting with Strata San Jose.
I propose we do tomorrow instead.
-- 
Julien


Arrow based data access

2017-03-15 Thread Julien Le Dem
We’re working on finalizing a few types and writing the integration tests
that go with them.

At this point we have a solid foundation in the Arrow project.

As a next step I’m going to look into adding an Arrow RPC/REST interface
dedicated to data retrieval.

We had several discussions about this and I’m going to formalize a spec and
ask for review.

This Arrow based data access interface is intended to be used by systems
that need access to data for processing (SQL engines, processing
frameworks, …) and implemented by storage layers or really anything that
can produce data (including processing frameworks returning result sets, for
example). That will greatly simplify integration between the many actors in
each category.

The basic premise is to be able to fetch data in Arrow format while
benefiting from the no-overhead serialization/deserialization and getting
the data in columnar format.

Some obvious topics that come to mind:

- How do we identify a dataset?

- How do we specify projections?

- What about predicate push downs or in general parameters?

- What underlying protocol to use? HTTP2?

- push vs pull?

- build a reference implementation (Suggestions?)

Potential candidates for using this:

- to consume data or to expose result sets: Drill, Hive, Presto, Impala,
Spark, RecordService...
- as a server: Kudu, HBase, Cassandra, …

-- 
Julien


[jira] [Commented] (ARROW-491) [C++] Add FixedWidthBinary type

2017-03-15 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927204#comment-15927204
 ] 

Julien Le Dem commented on ARROW-491:
-

So, from the layout perspective, there's just one vector with fixed width 
values of `byteWidth` and a validityVector.
Sounds good to me.

> [C++] Add FixedWidthBinary type
> ---
>
> Key: ARROW-491
> URL: https://issues.apache.org/jira/browse/ARROW-491
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>
> This would be very useful for interacting with Parquet and other data sources 
> that use fixed-byte-width binary types, as a more efficient version of 
> List. We can resolve adding this type to the format spec later.
> [~julienledem] [~nongli] do you have an opinion about this type? Would be 
> equivalent to FIXED_LEN_BYTE_ARRAY in Parquet. It would be convenient to 
> implement Decimal types on top of this





[jira] [Comment Edited] (ARROW-617) Time type is not specified clearly

2017-03-15 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926998#comment-15926998
 ] 

Julien Le Dem edited comment on ARROW-617 at 3/15/17 9:11 PM:
--

 [~wesmckinn] that would work but the type needs to fully specify the physical 
layout. If the unit is used to infer the value bit width (32 or 64 bits) then 
we can't change in the future. So the type would be something like this?
{noformat}
table Time {
  unit: TimeUnit;
  bitWidth: int; // restricted to 32 for sec and millis, and 64 for micros and 
nanos in v1
}
{noformat}
Possibly we can further limit to millis and nanos only in v1 since they are 
respectively the most precise for each bitWidth (millis: 32 bits, nanos: 64 bits).
This is a little verbose but explicit and future proof.


was (Author: julienledem):
 [~wesmckinn] that would work but the type needs to fully specify the physical 
layout. If the unit is used to infer the value bit width (32 or 64 bits) then 
we can't change in the future. So the type would be something like this?
```
table Time {
  unit: TimeUnit;
  bitWidth: int; // restricted to 32 for sec and millis, and 64 for micros and 
nanos in v1
}
```
Possibly we can further limit to millis and nanos only in v1 since they are 
respectively the most precise for each bitWIdth (millis: 32bits, nanos: 64 bits)
This is a little verbose but explicit and future proof.

> Time type is not specified clearly
> --
>
> Key: ARROW-617
> URL: https://issues.apache.org/jira/browse/ARROW-617
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Format
>        Reporter: Julien Le Dem
>
> 2 options:
> - Use 64 bits for microseconds and nanoseconds, 32 bits for other units
> - Use 64 bits for everything
> The latter is simpler to implement, the former saves space.





[jira] [Commented] (ARROW-617) Time type is not specified clearly

2017-03-15 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926998#comment-15926998
 ] 

Julien Le Dem commented on ARROW-617:
-

 [~wesmckinn] that would work but the type needs to fully specify the physical 
layout. If the unit is used to infer the value bit width (32 or 64 bits) then 
we can't change in the future. So the type would be something like this?
```
table Time {
  unit: TimeUnit;
  bitWidth: int; // restricted to 32 for sec and millis, and 64 for micros and 
nanos in v1
}
```
Possibly we can further limit to millis and nanos only in v1 since they are 
respectively the most precise for each bitWidth (millis: 32 bits, nanos: 64 bits).
This is a little verbose but explicit and future proof.
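
To make the proposed restriction concrete, here is a minimal validation sketch. It is written in Python rather than flatbuffers, and `TimeType`, `ALLOWED_BIT_WIDTH` and `is_valid` are hypothetical names for illustration only; the unit-to-width mapping is the v1 rule stated in the comment above.

```python
from dataclasses import dataclass

# Assumed v1 rule from the comment: 32 bits for seconds and milliseconds,
# 64 bits for microseconds and nanoseconds.
ALLOWED_BIT_WIDTH = {
    "SECOND": 32,
    "MILLISECOND": 32,
    "MICROSECOND": 64,
    "NANOSECOND": 64,
}


@dataclass(frozen=True)
class TimeType:
    """Mirror of the proposed `table Time { unit; bitWidth; }` metadata."""

    unit: str
    bit_width: int

    def __post_init__(self) -> None:
        if self.unit not in ALLOWED_BIT_WIDTH:
            raise ValueError(f"unknown time unit: {self.unit}")
        expected = ALLOWED_BIT_WIDTH[self.unit]
        if self.bit_width != expected:
            raise ValueError(
                f"{self.unit} requires bitWidth {expected}, got {self.bit_width}"
            )


def is_valid(unit: str, bit_width: int) -> bool:
    """Return True when the (unit, bitWidth) pair is allowed under the v1 rule."""
    try:
        TimeType(unit, bit_width)
        return True
    except ValueError:
        return False
```

Keeping the width explicit in the metadata means a later version could relax the table (for example, allow 64-bit milliseconds) without changing how readers determine the physical layout.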

> Time type is not specified clearly
> --
>
> Key: ARROW-617
> URL: https://issues.apache.org/jira/browse/ARROW-617
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Format
>        Reporter: Julien Le Dem
>
> 2 options:
> - Use 64 bits for microseconds and nanoseconds, 32 bits for other units
> - Use 64 bits for everything
> The latter is simpler to implement, the former saves space.





[jira] [Commented] (ARROW-617) Time type is not specified clearly

2017-03-15 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926601#comment-15926601
 ] 

Julien Le Dem commented on ARROW-617:
-

The layout is entirely defined by the Type definition, which in this instance 
would include the unit. That's what TypeLayout is for on the Java side. So that 
would be fixed and clear to the user.

> Time type is not specified clearly
> --
>
> Key: ARROW-617
> URL: https://issues.apache.org/jira/browse/ARROW-617
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Format
>        Reporter: Julien Le Dem
>
> 2 options:
> - Use 64 bits for microseconds and nanoseconds, 32 bits for other units
> - Use 64 bits for everything
> The latter is simpler to implement, the former saves space.





[jira] [Comment Edited] (ARROW-617) Time type is not specified clearly

2017-03-15 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926547#comment-15926547
 ] 

Julien Le Dem edited comment on ARROW-617 at 3/15/17 4:52 PM:
--

log2(1000 * 1000 * 3600 *24) = 36.3... so micros don't fit in 32 bits.
that'd be:
 - 32 bits for second and millisecond precision
 - 64 bits for microsecond and nanosecond precision
I would rather make it a hard requirement and not allow 64 bits for sec/millis 
for simplicity. That way we don't have to specify 2 types (Time32 and Time64). 
We just use the smallest precision that works for Time based on the unit.


was (Author: julienledem):
log2(1000 * 1000 * 3600 *24) = 36.3... so micros don't fit in 32 bits.
that's be:
 - 32 bits for second and millisecond precision
 - 64 bits for microsecond and nanosecond precision
I would rather make it hard requirement and not allow 64 bits for sec/millis 
for simplicity. That way we don't have to specify 2 types (Time32 and Time64). 
We just use the smallest precision that works for Time based on the unit.

> Time type is not specified clearly
> --
>
> Key: ARROW-617
> URL: https://issues.apache.org/jira/browse/ARROW-617
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Format
>        Reporter: Julien Le Dem
>
> 2 options:
> - Use 64 bits for microseconds and nanoseconds, 32 bits for other units
> - Use 64 bits for everything
> The latter is simpler to implement, the former saves space.





[jira] [Comment Edited] (ARROW-617) Time type is not specified clearly

2017-03-15 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926547#comment-15926547
 ] 

Julien Le Dem edited comment on ARROW-617 at 3/15/17 4:50 PM:
--

log2(1000 * 1000 * 3600 *24) = 36.3... so micros don't fit in 32 bits.
that'd be:
 - 32 bits for second and millisecond precision
 - 64 bits for microsecond and nanosecond precision
I would rather make it a hard requirement and not allow 64 bits for sec/millis 
for simplicity. That way we don't have to specify 2 types (Time32 and Time64). 
We just use the smallest precision that works for Time based on the unit.


was (Author: julienledem):
```log2(1000 * 1000 * 3600 *24) = 36.3...``` so micros don't fit in 32 bits.
that's be:
 - 32 bits for second and millisecond precision
 - 64 bits for microsecond and nanosecond precision
I would rather make it hard requirement and not allow 64 bits for sec/millis 
for simplicity. That way we don't have to specify 2 types (Time32 and Time64). 
We just use the smallest precision that works for Time based on the unit.

> Time type is not specified clearly
> --
>
> Key: ARROW-617
> URL: https://issues.apache.org/jira/browse/ARROW-617
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Format
>        Reporter: Julien Le Dem
>
> 2 options:
> - Use 64 bits for microseconds and nanoseconds, 32 bits for other units
> - Use 64 bits for everything
> The latter is simpler to implement, the former saves space.





[jira] [Commented] (ARROW-617) Time type is not specified clearly

2017-03-15 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926547#comment-15926547
 ] 

Julien Le Dem commented on ARROW-617:
-

```log2(1000 * 1000 * 3600 *24) = 36.3...``` so micros don't fit in 32 bits.
that'd be:
 - 32 bits for second and millisecond precision
 - 64 bits for microsecond and nanosecond precision
I would rather make it a hard requirement and not allow 64 bits for sec/millis 
for simplicity. That way we don't have to specify 2 types (Time32 and Time64). 
We just use the smallest precision that works for Time based on the unit.
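
The arithmetic above can be checked with a short sketch. The names `TICKS_PER_SECOND`, `bits_needed` and `smallest_width` are illustrative helpers, not Arrow APIs, and the check assumes a signed integer holding a non-negative time of day.

```python
# Illustrative check: how many bits does a time-of-day value need per unit?
SECONDS_PER_DAY = 24 * 3600

TICKS_PER_SECOND = {
    "second": 1,
    "millisecond": 1_000,
    "microsecond": 1_000_000,
    "nanosecond": 1_000_000_000,
}


def bits_needed(unit: str) -> int:
    """Bits for the largest time of day in `unit`, plus one sign bit."""
    max_value = SECONDS_PER_DAY * TICKS_PER_SECOND[unit] - 1
    return max_value.bit_length() + 1


def smallest_width(unit: str) -> int:
    """Smallest of 32 or 64 bits that holds a signed time of day in `unit`."""
    return 32 if bits_needed(unit) <= 32 else 64
```

For example, `smallest_width("millisecond")` evaluates to 32 and `smallest_width("microsecond")` to 64, which matches the split proposed in the comment.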

> Time type is not specified clearly
> --
>
> Key: ARROW-617
> URL: https://issues.apache.org/jira/browse/ARROW-617
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Format
>        Reporter: Julien Le Dem
>
> 2 options:
> - Use 64 bits for microseconds and nanoseconds, 32 bits for other units
> - Use 64 bits for everything
> The latter is simpler to implement, the former saves space.





[jira] [Updated] (ARROW-114) Bring in java-unsafe-tools as utility library for Arrow

2017-03-13 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated ARROW-114:

Priority: Minor  (was: Major)

> Bring in java-unsafe-tools as utility library for Arrow
> ---
>
> Key: ARROW-114
> URL: https://issues.apache.org/jira/browse/ARROW-114
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Jacques Nadeau
>Priority: Minor
>
> Originally here:
> https://github.com/alexkasko/unsafe-tools 
> SGA signed off and received by Secretary.





[jira] [Commented] (ARROW-617) Time type is not specified clearly

2017-03-10 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905959#comment-15905959
 ] 

Julien Le Dem commented on ARROW-617:
-

Currently the Java TimeVector supports millis only, on 32 bits.
The C++ Time vector is 64 bits.

> Time type is not specified clearly
> --
>
> Key: ARROW-617
> URL: https://issues.apache.org/jira/browse/ARROW-617
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Format
>        Reporter: Julien Le Dem
>
> 2 options:
> - Use 64 bits for microseconds and nanoseconds, 32 bits for other units
> - Use 64 bits for everything
> The latter is simpler to implement, the former saves space.





[jira] [Created] (ARROW-617) Time type is not specified clearly

2017-03-10 Thread Julien Le Dem (JIRA)
Julien Le Dem created ARROW-617:
---

 Summary: Time type is not specified clearly
 Key: ARROW-617
 URL: https://issues.apache.org/jira/browse/ARROW-617
 Project: Apache Arrow
  Issue Type: Bug
  Components: Format
Reporter: Julien Le Dem


2 options:
- Use 64 bits for microseconds and nanoseconds, 32 bits for other units
- Use 64 bits for everything
The latter is simpler to implement, the former saves space.





Re: Date/Time fields values in Java

2017-03-10 Thread Julien Le Dem
The first option makes compatibility with existing code easier.
I opened a JIRA to discuss:
https://issues.apache.org/jira/browse/ARROW-617

On Fri, Mar 10, 2017 at 4:42 PM, Wes McKinney <wesmck...@gmail.com> wrote:

> There's two routes, I guess:
>
> - Use 64 bits for microseconds and nanoseconds, 32 bits for other units
> - Use 64 bits for everything
>
> The latter is simpler to implement, the former saves space. I am not
> sure which is the better solution. Another situation where this will
> occur is with decimals, where the storage type may be a function of
> the precision and scale. Thoughts?
>
> On Fri, Mar 10, 2017 at 6:18 PM, Julien Le Dem <jul...@dremio.com> wrote:
> > It sounds like we need to specify a different bit width depending on the
> > unit?
> > millisecond time fits in 32 bits but neither do micros nor nanos.
> > the java TimeVector uses 32 bit for now (and supports millis only):
> > https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b
> 9ed3385a8d/java/vector/src/main/codegen/data/ValueVectorTypes.tdd#L60
> >
> >
> > On Fri, Mar 10, 2017 at 2:12 PM, Wes McKinney <wesmck...@gmail.com>
> wrote:
> >
> >> Sorry to be a little slow to respond on this.
> >>
> >> Since we support nanosecond time unit, we need to use 64 bits. So it
> >> sounds like the bug is on the Java side
> >>
> >> On Fri, Mar 10, 2017 at 4:47 PM, Bryan Cutler <cutl...@gmail.com>
> wrote:
> >> > Thanks for the info Julien.  I'll open a JIRA for fixing the type
> layout
> >> > for TIME, and I'll give the documentation a shot.
> >> >
> >> > Regards,
> >> > Bryan
> >> >
> >> > On Thu, Mar 9, 2017 at 9:01 PM, Julien Le Dem <jul...@dremio.com>
> wrote:
> >> >
> >> >> Hi Bryan,
> >> >> In the JSON representation we should use the integer representation
> of
> >> the
> >> >> Timestamp. We should not depend on joda for this.
> >> >>
> >> >> DATE is on 8 bytes => 64bits:
> >> >> https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b
> >> >> 9ed3385a8d/format/Message.fbs#L79
> >> >> https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b
> >> >> 9ed3385a8d/java/vector/src/main/codegen/data/
> ValueVectorTypes.tdd#L73
> >> >>
> >> >> Time in on 4 bytes => 32 bits and has a unit:
> >> >> https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b
> >> >> 9ed3385a8d/format/Message.fbs#L85
> >> >> https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b
> >> >> 9ed3385a8d/java/vector/src/main/codegen/data/
> ValueVectorTypes.tdd#L60
> >> >> It should the time in {unit} since midnight stored in a 32 bit
> integer.
> >> >> It should not have a default unit IMO
> >> >>
> >> >> So as you pointed out it looks like a bug both on the C++ and java
> side
> >> for
> >> >> Time
> >> >> https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b
> >> >> 9ed3385a8d/java/vector/src/main/java/org/apache/arrow/
> >> >> vector/schema/TypeLayout.java#L163
> >> >> tests TODO here:
> >> >> https://issues.apache.org/jira/browse/ARROW-510
> >> >>
> >> >> We need to add the Date, Time, Timestamp description in the doc:
> >> >> https://github.com/apache/arrow/blob/master/format/Metadata.md
> >> >> You are welcome to take a stab at it and send a Pull request if you
> feel
> >> >> like it.
> >> >> Otherwise I'll update it.
> >> >>
> >> >> On Thu, Mar 9, 2017 at 3:37 PM, Bryan Cutler <cutl...@gmail.com>
> wrote:
> >> >>
> >> >> > I guess it would make sense to just store the time of day value in
> >> >> > milliseconds to go along with the DATE type that contains days
> since
> >> >> epoch,
> >> >> > which would fit into a 4 byte value.  Only I see conflicting code
> in
> >> >> > TypeLayout.java that defines the schema as 64 bit width
> >> >> >
> >> >> > public TypeLayout visit(Time type) {
> >> >> > return newFixedWidthTypeLayout(dataVector(64));
> >> >> > }
> >> >> >
> >> >> > And in C++ there is this comment
> >> >> &

Re: Date/Time fields values in Java

2017-03-10 Thread Julien Le Dem
It sounds like we need to specify a different bit width depending on the
unit?
Millisecond time fits in 32 bits, but neither micros nor nanos do.
The Java TimeVector uses 32 bits for now (and supports millis only):
https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b9ed3385a8d/java/vector/src/main/codegen/data/ValueVectorTypes.tdd#L60


On Fri, Mar 10, 2017 at 2:12 PM, Wes McKinney <wesmck...@gmail.com> wrote:

> Sorry to be a little slow to respond on this.
>
> Since we support nanosecond time unit, we need to use 64 bits. So it
> sounds like the bug is on the Java side
>
> On Fri, Mar 10, 2017 at 4:47 PM, Bryan Cutler <cutl...@gmail.com> wrote:
> > Thanks for the info Julien.  I'll open a JIRA for fixing the type layout
> > for TIME, and I'll give the documentation a shot.
> >
> > Regards,
> > Bryan
> >
> > On Thu, Mar 9, 2017 at 9:01 PM, Julien Le Dem <jul...@dremio.com> wrote:
> >
> >> Hi Bryan,
> >> In the JSON representation we should use the integer representation of
> the
> >> Timestamp. We should not depend on joda for this.
> >>
> >> DATE is on 8 bytes => 64bits:
> >> https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b
> >> 9ed3385a8d/format/Message.fbs#L79
> >> https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b
> >> 9ed3385a8d/java/vector/src/main/codegen/data/ValueVectorTypes.tdd#L73
> >>
> >> Time in on 4 bytes => 32 bits and has a unit:
> >> https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b
> >> 9ed3385a8d/format/Message.fbs#L85
> >> https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b
> >> 9ed3385a8d/java/vector/src/main/codegen/data/ValueVectorTypes.tdd#L60
> >> It should the time in {unit} since midnight stored in a 32 bit integer.
> >> It should not have a default unit IMO
> >>
> >> So as you pointed out it looks like a bug both on the C++ and java side
> for
> >> Time
> >> https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b
> >> 9ed3385a8d/java/vector/src/main/java/org/apache/arrow/
> >> vector/schema/TypeLayout.java#L163
> >> tests TODO here:
> >> https://issues.apache.org/jira/browse/ARROW-510
> >>
> >> We need to add the Date, Time, Timestamp description in the doc:
> >> https://github.com/apache/arrow/blob/master/format/Metadata.md
> >> You are welcome to take a stab at it and send a Pull request if you feel
> >> like it.
> >> Otherwise I'll update it.
> >>
> >> On Thu, Mar 9, 2017 at 3:37 PM, Bryan Cutler <cutl...@gmail.com> wrote:
> >>
> >> > I guess it would make sense to just store the time of day value in
> >> > milliseconds to go along with the DATE type that contains days since
> >> epoch,
> >> > which would fit into a 4 byte value.  Only I see conflicting code in
> >> > TypeLayout.java that defines the schema as 64 bit width
> >> >
> >> > public TypeLayout visit(Time type) {
> >> > return newFixedWidthTypeLayout(dataVector(64));
> >> > }
> >> >
> >> > And in C++ there is this comment
> >> >   // Exact time encoded with int64, default unit millisecond
> >> >   TIME,
> >> >
> >> > Does the TIME type still need to go through some discussion to get
> pinned
> >> > down?
> >> >
> >> > Thanks,
> >> > Bryan
> >> >
> >> > On Thu, Mar 9, 2017 at 10:53 AM, Bryan Cutler <cutl...@gmail.com>
> wrote:
> >> >
> >> > > Hello All,
> >> > >
> >> > > I've started work on ARROW-582 to add Date/Time support for Java
> JSON
> >> > > files and would just like to clear up a few things.  I believe the
> Java
> >> > > Time type is supposed to represent milliseconds since epoch, it is
> >> stored
> >> > > as a FixedValueVector with a width of 4 bytes (equivalent to Java
> >> 'int')
> >> > > and it retrieved by constructing a org.joda.time.DateTime with that
> >> > value.
> >> > > Shouldn't this be an 8 byte width, equivalent to Java 'long'?
> >> > >
> >> > > <#elseif minor.class == "Time">
> >> > > @Override
> >> > > public DateTime getObject(int index) {
> >> > >
> >> > > org.joda.time.DateTime time = new
> org.joda.time.DateTime(get(
> >> > index),
> >> > > org.joda.time.DateTimeZone.UTC);
> >> > > time = time.withZoneRetainFields(org.
> joda.time.DateTimeZone.
> >> > > getDefault());
> >> > > return time;
> >> > > }
> >> > >
> >> > > Thanks,
> >> > > Bryan
> >> > >
> >> >
> >>
> >>
> >>
> >> --
> >> Julien
> >>
>



-- 
Julien


Re: Date/Time fields values in Java

2017-03-09 Thread Julien Le Dem
Hi Bryan,
In the JSON representation we should use the integer representation of the
Timestamp. We should not depend on joda for this.
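
As a rough illustration of the integer representation, the sketch below writes timestamps to JSON as plain integers and reads them back without any date library on the reading side. It assumes milliseconds since the Unix epoch, and the keys "name" and "DATA" as well as the function names are placeholders for this example, not a statement of the actual integration-test file format.

```python
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(dt: datetime) -> int:
    """Convert a timezone-aware datetime to integer milliseconds since the epoch."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


def column_to_json(name: str, values: list) -> str:
    """Serialize a timestamp column as integers ("name"/"DATA" are placeholder keys)."""
    return json.dumps({"name": name, "DATA": [to_epoch_millis(v) for v in values]})


def column_from_json(text: str) -> list:
    """Read the integer values back; no date parsing is involved."""
    return json.loads(text)["DATA"]
```

Because only integers cross the file boundary, any implementation can compare values exactly without agreeing on a date-string format.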

DATE is on 8 bytes => 64bits:
https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b9ed3385a8d/format/Message.fbs#L79
https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b9ed3385a8d/java/vector/src/main/codegen/data/ValueVectorTypes.tdd#L73

Time is on 4 bytes => 32 bits and has a unit:
https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b9ed3385a8d/format/Message.fbs#L85
https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b9ed3385a8d/java/vector/src/main/codegen/data/ValueVectorTypes.tdd#L60
It should be the time in {unit} since midnight stored in a 32 bit integer.
It should not have a default unit IMO.

So as you pointed out it looks like a bug both on the C++ and java side for
Time
https://github.com/apache/arrow/blob/6b3ae2aecc8cd31425035a021fa04b9ed3385a8d/java/vector/src/main/java/org/apache/arrow/vector/schema/TypeLayout.java#L163
tests TODO here:
https://issues.apache.org/jira/browse/ARROW-510

We need to add the Date, Time, Timestamp description in the doc:
https://github.com/apache/arrow/blob/master/format/Metadata.md
You are welcome to take a stab at it and send a Pull request if you feel
like it.
Otherwise I'll update it.
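
For illustration, here is a minimal sketch of the encoding described above:
a time of day stored as a count of {unit} since midnight in a 32 bit
integer, with no joda dependency. The class and method names
(TimeOfDaySketch, encode, decode) are hypothetical and not part of the Arrow
Java API, and java.util.concurrent.TimeUnit only stands in for whatever unit
enum the format ends up using.

```java
import java.time.LocalTime;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not Arrow API: Time as {unit} since midnight in an int.
final class TimeOfDaySketch {

  // Encodes a time of day as a count of `unit` since midnight.
  // Throws ArithmeticException if the count does not fit in 32 bits.
  static int encode(LocalTime time, TimeUnit unit) {
    long count = unit.convert(time.toNanoOfDay(), TimeUnit.NANOSECONDS);
    return Math.toIntExact(count);
  }

  // Decodes a count of `unit` since midnight back to a time of day.
  static LocalTime decode(int value, TimeUnit unit) {
    return LocalTime.ofNanoOfDay(unit.toNanos(value));
  }

  public static void main(String[] args) {
    LocalTime t = LocalTime.of(13, 30, 15);
    int millis = encode(t, TimeUnit.MILLISECONDS);
    System.out.println(millis);                                // 48615000
    System.out.println(decode(millis, TimeUnit.MILLISECONDS)); // 13:30:15
  }
}
```

Note that a full day is 86,400,000 milliseconds, which fits in an int, while
86,400,000,000 microseconds does not, so under this sketch a 32 bit Time can
only cover a whole day in seconds or milliseconds.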

On Thu, Mar 9, 2017 at 3:37 PM, Bryan Cutler  wrote:

> I guess it would make sense to just store the time of day value in
> milliseconds to go along with the DATE type that contains days since epoch,
> which would fit into a 4 byte value.  Only I see conflicting code in
> TypeLayout.java that defines the schema as 64 bit width
>
> public TypeLayout visit(Time type) {
> return newFixedWidthTypeLayout(dataVector(64));
> }
>
> And in C++ there is this comment
>   // Exact time encoded with int64, default unit millisecond
>   TIME,
>
> Does the TIME type still need to go through some discussion to get pinned
> down?
>
> Thanks,
> Bryan
>
> On Thu, Mar 9, 2017 at 10:53 AM, Bryan Cutler  wrote:
>
> > Hello All,
> >
> > I've started work on ARROW-582 to add Date/Time support for Java JSON
> > files and would just like to clear up a few things.  I believe the Java
> > Time type is supposed to represent milliseconds since epoch, it is stored
> > as a FixedValueVector with a width of 4 bytes (equivalent to Java 'int')
> > and it is retrieved by constructing an org.joda.time.DateTime with that
> value.
> > Shouldn't this be an 8 byte width, equivalent to Java 'long'?
> >
> > <#elseif minor.class == "Time">
> > @Override
> > public DateTime getObject(int index) {
> >
> > org.joda.time.DateTime time = new org.joda.time.DateTime(get(
> index),
> > org.joda.time.DateTimeZone.UTC);
> > time = time.withZoneRetainFields(org.joda.time.DateTimeZone.
> > getDefault());
> > return time;
> > }
> >
> > Thanks,
> > Bryan
> >
>



-- 
Julien


Re: Arrow sync in 10min

2017-03-09 Thread Julien Le Dem
Emilio,
Let me know if you need help.

On Thu, Mar 2, 2017 at 12:13 PM, Emilio Lahr-Vivaz <elahrvi...@ccri.com>
wrote:

> Oops, sorry I meant to attend but forgot. Re: dictionary encoding support,
> I haven't had time to work on it recently, but I should have some next
> week. I've partially coded some of the changes but it's not in a working
> state to push at the moment.
>
> Thanks,
>
> Emilio
>
>
> On 03/02/2017 02:28 PM, Julien Le Dem wrote:
>
>> Notes:
>>
>> Attendees:
>> - Julien (Dremio)
>> - Wes (TwoSigma)
>> - Uwe (BlueYonder) excused: Python packaging, Arrow ODBC dependent on
>> arrow-0.3
>>
>>   Date and time support in C++ IPC:
>>- need java implementation for integration tests.
>>- arrow Json representation for date/time should be integer.
>>- Miki Tebeka is working on this.
>>
>> Dictionary encoding support:
>>- TODO: follow up on https://github.com/apache/arrow/pull/334
>>
>> New Fixed width binary type:
>>   - type would contain byteWidth
>>   - physical implementation similar to varbinary with no offset vector
>>   - Wes to propose a PR
>>
>> New TensorType message:
>>   - Defined by:
>> - buffer: offset, length
>> - buffer layout
>>   - dimensions: list of longs
>>   - order (row major or column major)
>>   - type: fixed size type from the type enum.
>>   - Wes to propose a PR
>>
>> Spark-Arrow integration:
>>   - contribution now depending on arrow-0.2
>>https://issues.apache.org/jira/browse/SPARK-13534
>>https://github.com/apache/spark/pull/15821
>>   - ready for review
>>
>> On Thu, Mar 2, 2017 at 9:52 AM, Julien Le Dem <jul...@dremio.com> wrote:
>>
>> at 10 am PT on google hangout:
>>> https://hangouts.google.com/hangouts/_/dremio.com/arrow
>>>
>>> --
>>> Julien
>>>
>>>
>>
>>
>


-- 
Julien


Re: Arrow sync in 10min

2017-03-02 Thread Julien Le Dem
Notes:

Attendees:
- Julien (Dremio)
- Wes (TwoSigma)
- Uwe (BlueYonder) excused: Python packaging, Arrow ODBC dependent on
arrow-0.3

 Date and time support in C++ IPC:
  - need java implementation for integration tests.
  - arrow Json representation for date/time should be integer.
  - Miki Tebeka is working on this.

Dictionary encoding support:
  - TODO: follow up on https://github.com/apache/arrow/pull/334

New Fixed width binary type:
 - type would contain byteWidth
 - physical implementation similar to varbinary with no offset vector
 - Wes to propose a PR
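
A rough sketch of the layout in the notes above, assuming values are packed
back to back in a single data buffer so that value i starts at
i * byteWidth and no offset vector is needed. FixedWidthBinarySketch and its
methods are hypothetical names for illustration, not the proposed API, and
validity bits are left out.

```java
import java.util.Arrays;

// Hypothetical illustration only: fixed-width values packed in one buffer.
final class FixedWidthBinarySketch {
  private final int byteWidth;
  private final byte[] data;

  FixedWidthBinarySketch(int byteWidth, int valueCount) {
    this.byteWidth = byteWidth;
    this.data = new byte[byteWidth * valueCount];
  }

  // Copies `value` into slot `index`; the value must be exactly byteWidth long.
  void set(int index, byte[] value) {
    if (value.length != byteWidth) {
      throw new IllegalArgumentException("expected " + byteWidth + " bytes");
    }
    System.arraycopy(value, 0, data, index * byteWidth, byteWidth);
  }

  // Returns a copy of slot `index`; the start position is computed, not looked up.
  byte[] get(int index) {
    int start = index * byteWidth;
    return Arrays.copyOfRange(data, start, start + byteWidth);
  }

  public static void main(String[] args) {
    FixedWidthBinarySketch v = new FixedWidthBinarySketch(4, 2);
    v.set(0, new byte[] {1, 2, 3, 4});
    v.set(1, new byte[] {5, 6, 7, 8});
    System.out.println(Arrays.toString(v.get(1))); // [5, 6, 7, 8]
  }
}
```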

New TensorType message:
 - Defined by:
   - buffer: offset, length
   - buffer layout
 - dimensions: list of longs
 - order (row major or column major)
 - type: fixed size type from the type enum.
 - Wes to propose a PR
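
To make the order field concrete, here is a sketch of how element strides
could be derived from the dimensions for the two orders. The names
(TensorStridesSketch, Order, strides, offset) are made up for this note and
are not part of any proposed message; strides are counted in elements, not
bytes.

```java
import java.util.Arrays;

// Hypothetical illustration only: strides for row major vs column major.
final class TensorStridesSketch {
  enum Order { ROW_MAJOR, COLUMN_MAJOR }

  // Returns, for each dimension, how many elements to skip to advance by one.
  static long[] strides(long[] dimensions, Order order) {
    int n = dimensions.length;
    long[] strides = new long[n];
    long step = 1;
    if (order == Order.ROW_MAJOR) {
      // Last dimension is contiguous.
      for (int i = n - 1; i >= 0; i--) {
        strides[i] = step;
        step *= dimensions[i];
      }
    } else {
      // First dimension is contiguous.
      for (int i = 0; i < n; i++) {
        strides[i] = step;
        step *= dimensions[i];
      }
    }
    return strides;
  }

  // Flat element offset of a multi-dimensional index.
  static long offset(long[] index, long[] strides) {
    long result = 0;
    for (int i = 0; i < index.length; i++) {
      result += index[i] * strides[i];
    }
    return result;
  }

  public static void main(String[] args) {
    long[] dims = {2, 3, 4};
    long[] rowMajor = strides(dims, Order.ROW_MAJOR);
    long[] colMajor = strides(dims, Order.COLUMN_MAJOR);
    System.out.println(Arrays.toString(rowMajor)); // [12, 4, 1]
    System.out.println(Arrays.toString(colMajor)); // [1, 2, 6]
    System.out.println(offset(new long[] {1, 0, 2}, rowMajor)); // 14
    System.out.println(offset(new long[] {1, 0, 2}, colMajor)); // 13
  }
}
```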

Spark-Arrow integration:
 - contribution now depending on arrow-0.2
  https://issues.apache.org/jira/browse/SPARK-13534
  https://github.com/apache/spark/pull/15821
 - ready for review

On Thu, Mar 2, 2017 at 9:52 AM, Julien Le Dem <jul...@dremio.com> wrote:

> at 10 am PT on google hangout:
> https://hangouts.google.com/hangouts/_/dremio.com/arrow
>
> --
> Julien
>



-- 
Julien


Arrow sync in 10min

2017-03-02 Thread Julien Le Dem
at 10 am PT on google hangout:
https://hangouts.google.com/hangouts/_/dremio.com/arrow

-- 
Julien


Day of Sync-up

2017-02-24 Thread Julien Le Dem
Currently the Parquet sync-up is scheduled on Thursday 10 am PT every other
week.
Marcel mentioned that another day (same time) would be more convenient.
What works for the usual attendees?

-- 
Julien


[jira] [Commented] (ARROW-581) Transportation

2017-02-24 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883567#comment-15883567
 ] 

Julien Le Dem commented on ARROW-581:
-

[~mfalhi] Please add a description.

> Transportation
> --
>
> Key: ARROW-581
> URL: https://issues.apache.org/jira/browse/ARROW-581
> Project: Apache Arrow
>  Issue Type: Test
>Reporter: MFALHI MOHAMMED
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [ANNOUNCE] Apache Arrow 0.2.0 released

2017-02-21 Thread Julien Le Dem
And java artifacts have been pushed to maven central:
http://repo1.maven.org/maven2/org/apache/arrow/arrow-vector/0.2.0/

On Sun, Feb 19, 2017 at 8:46 AM, Wes McKinney  wrote:

> The Apache Arrow community is pleased to announce the 0.2.0
> release. It includes 192 resolved issues ([1]) since the first
> ASF release on October 7, 2016.
>
> The released source artifacts are located at [2]. Maven, conda,
> and other artifacts will be published in the near future.
>
> What is Apache Arrow?
> -
>
> Apache Arrow is a columnar in-memory analytics layer designed to
> accelerate big
> data. It houses a set of canonical in-memory representations of flat and
> hierarchical data along with multiple language-bindings for structure
> manipulation. It also provides low-overhead streaming and batch messaging,
> zero-copy interprocess communication (IPC), and common algorithm
> implementations.
>
> Release Highlights
> --
>
> This release is a major milestone for the project, as we now have
> integration tests validating binary compatibility between the
> Java and C++ (and Python) implementations. These tests are now
> being run continuously in Travis CI.
>
> Other highlights include:
>
> - A new streaming binary format (with Java and C++/Python implementations)
> - Prototype for dictionary-encoded data in memory
> - Significantly expanded Python functionality, particularly pandas and
> Apache
>   Parquet interoperability
> - A JSON file "format" for specifying integration tests
> - Expanded zero-copy or low-overhead threadsafe IO for C++
> - Build and packaging improvements
>
> Please report any feedback to the mailing lists ([3])
>
> Regards,
> The Apache Arrow community
>
> [1]: https://issues.apache.org/jira/issues/?jql=project%20%
> 3D%20ARROW%20AND%20fixVersion%20%3D%200.2.0%20ORDER%20BY%20priority%20DESC
> [2]: https://dist.apache.org/repos/dist/release/arrow/
> [3]: https://lists.apache.org/list.html?dev@arrow.apache.org
>



-- 
Julien


Re: dist access for release

2017-02-21 Thread Julien Le Dem
Julian,
Thank you for pointing at the reference. I see in [2] that:
"The PMC can also vote to let non-PMC-members update the dist/release area.
To get this set up, please open a JIRA ticket at the INFRA JIRA
<https://issues.apache.org/jira/browse/INFRA> referencing the PMC vote."
So the question is, do we want to do such a vote or do we stick with "a PMC
member will do it"?

On Fri, Feb 17, 2017 at 7:39 PM, Wes McKinney <wesmck...@gmail.com> wrote:

> I'm available tomorrow to help with publishing the release artifacts.
>
> On Fri, Feb 17, 2017 at 9:10 PM Julian Hyde <jh...@apache.org> wrote:
>
> > No. My understanding is that only PMC members are authorized by ASF to
> > make a release. Committers can be release managers.
> >
> > "All release artifacts within the directory MUST be signed by a
> committer,
> > preferably a PMC member.” [1]
> >
> > "If the Release Manager is not a member of the PMC, they will need to ask
> > a PMC member to do the actual release publication.” [2]
> >
> > Julian
> >
> > [1] http://www.apache.org/legal/release-policy.html#release-distribution
> >
> > [2] http://www.apache.org/legal/release-policy.html#upload-ci
> >
> >
> > > On Feb 17, 2017, at 5:36 PM, Julien Le Dem <jul...@dremio.com> wrote:
> > >
> > > Here is the answer from infra regarding release repo access:
> > > https://issues.apache.org/jira/browse/INFRA-13530
> > > "So the 'default' is that PMC members have access to the dist
> > dev/release
> > > area of a project.
> > > The PMC can request that that setting be lifted to include project
> > > committers.
> > > (We wont grant PMC + named individuals; its either PMC or Committers.)
> > >
> > > Let me know what you decide."
> > >
> > > I think it is fine to give access to committers in general and not just
> > PMC
> > > so that committers can manage the release.
> > > It doesn't change that we need a release vote to publish anything
> there.
> > >
> > > agreed?
> > >
> > > --
> > > Julien
> >
>



-- 
Julien


dist access for release

2017-02-17 Thread Julien Le Dem
Here is the answer from infra regarding release repo access:
https://issues.apache.org/jira/browse/INFRA-13530
"So the 'default' is that PMC members have access to the dist  dev/release
area of a project.
The PMC can request that that setting be lifted to include project
committers.
(We wont grant PMC + named individuals; its either PMC or Committers.)

Let me know what you decide."

I think it is fine to give access to committers in general and not just PMC
so that committers can manage the release.
It doesn't change that we need a release vote to publish anything there.

agreed?

-- 
Julien


Re: Arrow sync starting now

2017-02-16 Thread Julien Le Dem
notes:

Attendees
- Khaled (PhD student Waterloo): distributed graph processing. Use arrow
for distributed graph database in C++.
Interested in Kudu, Arrow.
- Wes (Two Sigma): arrow website update, Tensor type
- Uwe (Blue Yonder): push release, update arrow website
- Julien (Dremio): arrow roadmap

Arrow website
- Arrow website is updated automatically from svn.
- Uwe: look at the same setup as Calcite. site source in git. tool to push
to svn

Arrow road map
- Need to write down a road map
 - arrow UDF interface (spark/python java)
 - arrow storage interface (kudu…)
 - Uwe: arrow ODBC connector (on top of arrow). See:
https://github.com/blue-yonder/turbodbc

- Tensor type: https://issues.apache.org/jira/browse/ARROW-550
  - n-dimensional metadata with direction (fortran-order: column major,
c-order: rows contiguous. dimensions change in reverse order)
Needed for Ion Stoica’s Ray project: https://github.com/ray-project/ray

- Wes: possibly move python code under cpp folder to facilitate
development.

- Arrow 0.3 goals:
 - complete dictionary IPC
 - tensor types: initially no support for large buffers in java
 - Uwe: dictionary in arrow <- parquet.
 - Julien: java vectors bug fixes.


On Thu, Feb 16, 2017 at 10:04 AM, Julien Le Dem <jul...@dremio.com> wrote:

> https://hangouts.google.com/hangouts/_/dremio.com/arrow
>
> --
> Julien
>



-- 
Julien


Arrow sync starting now

2017-02-16 Thread Julien Le Dem
https://hangouts.google.com/hangouts/_/dremio.com/arrow

-- 
Julien


Re: [VOTE] Release Apache Arrow 0.2.0 - rc2

2017-02-15 Thread Julien Le Dem
+1 (binding)
verified signature.
ran java and c++ tests.

I had to make a new key for the last release.
It's not been signed yet but is published here:
https://people.apache.org/keys/committer/julien


On Wed, Feb 15, 2017 at 1:08 PM, P. Taylor Goetz  wrote:

> +1 (binding)
>
> * signature and checksums valid
> * built java and ran tests
> * built cpp and ran tests
> * checked NOTICE and LICENSE files
> * checked for license headers in source files
> * confirmed no binaries in source archive
>
> One very minor nit: Uwe’s signing key isn’t in the Apache web of trust. It
> is signed by Julien, but his key isn’t in the WoT either. Perhaps we can
> set aside some time in the syncup to exchange keys. I usually have a
> conflict on Thursdays, but I’ll see if I can shuffle things and attend
> tomorrow.
>
> -Taylor
>
> > On Feb 15, 2017, at 11:19 AM, Uwe L. Korn  wrote:
> >
> > Hello all,
> >
> > I'd like to propose the second release candidate (rc2) of Apache Arrow
> > version 0.2.0.
> > It covers a total of 192 resolved JIRAs [1] Thanks to everyone who
> > contributed to this release.
> >
> > The source release rc2 is hosted at [2].
> > This release candidate is based on commit
> > fa8d27f314b7c21c611d1c5caaa9b32ae0cb2b06 located at [3].
> >
> > The vote will be open for the next ~72 hours ending at 17:30 CET,
> > February 18, 2017.
> >
> > [ ] +1
> > [ ] +0
> > [ ] -1
> >
> > I have run the java build + tests
> > I have run cpp build + tests
> > I have built the Python manylinux1 package + run the tests
> >
> > Here's my vote: +1 (non-binding)
> >
> > Thanks,
> > Uwe
> >
> > How to validate a release signature:
> > https://httpd.apache.org/dev/verification.html
> >
> > [1]
> > https://issues.apache.org/jira/issues/?jql=project%20%
> 3D%20ARROW%20AND%20fixVersion%20%3D%200.2.0%20ORDER%20BY%20priority%20DESC
> > [2] https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.2.0-rc2/
> > [3]
> > https://github.com/apache/arrow/tree/fa8d27f314b7c21c611d1c5caaa9b3
> 2ae0cb2b06
>
>


-- 
Julien


[jira] [Commented] (ARROW-186) [Java] Make sure alignment and memory padding conform to spec

2017-02-15 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868160#comment-15868160
 ] 

Julien Le Dem commented on ARROW-186:
-

yes agreed, this is not a blocker.

> [Java] Make sure alignment and memory padding conform to spec
> -
>
> Key: ARROW-186
> URL: https://issues.apache.org/jira/browse/ARROW-186
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java - Memory
>Reporter: Micah Kornfield
>
> Per spec 64 byte alignment and padding for buffers.
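
As a sketch of what the padding requirement means for a buffer length,
assuming it amounts to rounding the requested size up to the next multiple
of 64 (PaddingSketch and paddedLength are hypothetical names, not Arrow
API):

```java
// Hypothetical illustration only: round a buffer length up to 64 bytes.
final class PaddingSketch {
  static final long ALIGNMENT = 64; // must be a power of two for the mask below

  // Returns the smallest multiple of ALIGNMENT that is >= length.
  static long paddedLength(long length) {
    if (length < 0) {
      throw new IllegalArgumentException("length must be non-negative");
    }
    return Math.addExact(length, ALIGNMENT - 1) & ~(ALIGNMENT - 1);
  }

  public static void main(String[] args) {
    System.out.println(paddedLength(0));  // 0
    System.out.println(paddedLength(1));  // 64
    System.out.println(paddedLength(64)); // 64
    System.out.println(paddedLength(65)); // 128
  }
}
```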



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (ARROW-561) Update java & python dependencies to improve downstream packaging experience

2017-02-14 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866958#comment-15866958
 ] 

Julien Le Dem commented on ARROW-561:
-

Thanks [~holdenk]!

> Update java & python dependencies to improve downstream packaging experience
> 
>
> Key: ARROW-561
> URL: https://issues.apache.org/jira/browse/ARROW-561
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java - Vectors, Python
>Affects Versions: 0.1.0, 0.2.0
>Reporter: holdenk
>Assignee: holdenk
>Priority: Minor
>   Original Estimate: 5m
>  Remaining Estimate: 5m
>
> The current build for arrow uses an interesting workaround for hamcrest 
> conflict between JUNIT and mockito which results in mockito being in the 
> compile scope. This is not suitable for some downstream users.
> Python setup file also leaves out test dependency (not overly important but 
> useful for developers) & we can clarify parquet-cpp as an "extra" dependency 
> for people requiring parquet support (already mentioned in the README file 
> but good to have clarity in setup.py as well).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (ARROW-561) Update java & python dependencies to improve downstream packaging experience

2017-02-14 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem reassigned ARROW-561:
---

Assignee: holdenk

> Update java & python dependencies to improve downstream packaging experience
> 
>
> Key: ARROW-561
> URL: https://issues.apache.org/jira/browse/ARROW-561
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java - Vectors, Python
>Affects Versions: 0.1.0, 0.2.0
>Reporter: holdenk
>Assignee: holdenk
>Priority: Minor
>   Original Estimate: 5m
>  Remaining Estimate: 5m
>
> The current build for arrow uses an interesting workaround for hamcrest 
> conflict between JUNIT and mockito which results in mockito being in the 
> compile scope. This is not suitable for some downstream users.
> Python setup file also leaves out test dependency (not overly important but 
> useful for developers) & we can clarify parquet-cpp as an "extra" dependency 
> for people requiring parquet support (already mentioned in the README file 
> but good to have clarity in setup.py as well).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [VOTE] Release Apache Arrow 0.2.0 - rc1

2017-02-14 Thread Julien Le Dem
@Holden: And marked my jira as duplicate since you already opened one.
Thanks! https://issues.apache.org/jira/browse/ARROW-561

On Tue, Feb 14, 2017 at 3:47 PM, Julien Le Dem <jul...@dremio.com> wrote:

> @Uwe, I think you can rebase master on top of your release branch. In the
> future I'd recommend not pushing to master while we do  a release vote.
> @Holden, agreed. This seems like an oversight I don't see why we should
> have mockito in compile scope.
> I don't think it is a release blocker though. (let us know if you think
> otherwise)
> I opened a bug: https://issues.apache.org/jira/browse/ARROW-562
>
>
> On Tue, Feb 14, 2017 at 2:25 PM, Holden Karau <hol...@pigscanfly.ca>
> wrote:
>
>> -0 (non-binding): mockito in compile scope may make this difficult for
>> downstream users to package in their projects.
>>
>> On Tue, Feb 14, 2017 at 2:10 PM, Uwe L. Korn <uw...@xhochy.com> wrote:
>>
>> > I think the problem is that there are two commits in the release branch
>> > (which I have not pushed to the asf git yet) that were auto generated by
>> > maven that update the version numbers. I thought that those should be
>> > pushed once the vote passes. But we sadly did already push other
>> commits on
>> > master so we're in a diverging state here.
>> >
>> > > Am 14.02.2017 um 22:24 schrieb Wes McKinney <wesmck...@gmail.com>:
>> > >
>> > > Julien -- it looks like we may need to update some of the Java POM
>> > > metadata. Can you confirm?
>> > >
>> > >> On Tue, Feb 14, 2017 at 2:40 PM, Julien Le Dem <jul...@dremio.com>
>> > wrote:
>> > >> +1 (binding)
>> > >>
>> > >>> On Tue, Feb 14, 2017 at 10:10 AM, Bryan Cutler <cutl...@gmail.com>
>> > wrote:
>> > >>>
>> > >>> I have
>> > >>> * Built C++ and run unit tests
>> > >>> * Built Python and run unit tests
>> > >>> * Built Java and run unit tests
>> > >>> * Run integration tests
>> > >>> * Tested with Spark integration from SPARK-13534
>> > >>>
>> > >>> My vote: +1 (non-binding)
>> > >>>
>> > >>> Thanks!
>> > >>>
>> > >>>> On Tue, Feb 14, 2017 at 5:31 AM, Wes McKinney <wesmck...@gmail.com
>> >
>> > wrote:
>> > >>>>
>> > >>>> hi Uwe,
>> > >>>>
>> > >>>> I committed the patch to git and committed the new KEYS file to
>> > >>>> https://dist.apache.org/repos/dist/release/arrow
>> > >>>>
>> > >>>> - Wes
>> > >>>>
>> > >>>>> On Tue, Feb 14, 2017 at 2:30 AM, Uwe L. Korn <uw...@xhochy.com>
>> > wrote:
>> > >>>>> Uploaded my fingerprint to https://people.apache.org/
>> > >>> keys/committer/uwe,
>> > >>>>> awaiting the daily sync. Also I made a PR to add my key to the
>> KEYS
>> > >>>>> file: https://issues.apache.org/jira/browse/ARROW-558 I'll need a
>> > PMC
>> > >>> to
>> > >>>>> merge and update SVN.
>> > >>>>>
>> > >>>>> --
>> > >>>>>  Uwe L. Korn
>> > >>>>>  uw...@xhochy.com
>> > >>>>>
>> > >>>>>> On Mon, Feb 13, 2017, at 11:08 PM, Julien Le Dem wrote:
>> > >>>>>> Did you publish your public key?
>> > >>>>>> also consider adding it here: http://www.apache.org/dist/arr
>> ow/KEYS
>> > >>>>>> and here: https://people.apache.org/keys/committer/uwe
>> > >>>>>>
>> > >>>>>> $ gpg --verify apache-arrow-0.2.0.tar.gz.asc
>> > >>>>>> gpg: assuming signed data in `apache-arrow-0.2.0.tar.gz'
>> > >>>>>> gpg: Signature made Mon Feb 13 08:40:38 2017 PST using RSA key ID
>> > >>>>>> 8CAAD602
>> > >>>>>> gpg: Can't check signature: public key not found
>> > >>>>>>
>> > >>>>>> On Mon, Feb 13, 2017 at 9:25 AM, Uwe L. Korn <uw...@xhochy.com>
>> > >>> wrote:
>> > >>>>>>
>> > >>>>>>> Hello all,
>> > >>>>>>>

[jira] [Created] (ARROW-562) Mockito should be in test scope

2017-02-14 Thread Julien Le Dem (JIRA)
Julien Le Dem created ARROW-562:
---

 Summary: Mockito should be in test scope
 Key: ARROW-562
 URL: https://issues.apache.org/jira/browse/ARROW-562
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java - Vectors
Affects Versions: 0.2.0
Reporter: Julien Le Dem


https://github.com/apache/arrow/blob/e3c167bd101734f92c3a2be2eb7f56f1fba91e67/java/pom.xml#L446



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [VOTE] Release Apache Arrow 0.2.0 - rc1

2017-02-13 Thread Julien Le Dem
otherwise this looks good to me.
I ran the test on both the java and the cpp build.
Will wait until I can verify the signature to vote.

On Mon, Feb 13, 2017 at 2:08 PM, Julien Le Dem <jul...@dremio.com> wrote:

> Did you publish your public key?
> also consider adding it here: http://www.apache.org/dist/arrow/KEYS
> and here: https://people.apache.org/keys/committer/uwe
>
> $ gpg --verify apache-arrow-0.2.0.tar.gz.asc
> gpg: assuming signed data in `apache-arrow-0.2.0.tar.gz'
> gpg: Signature made Mon Feb 13 08:40:38 2017 PST using RSA key ID 8CAAD602
> gpg: Can't check signature: public key not found
>
> On Mon, Feb 13, 2017 at 9:25 AM, Uwe L. Korn <uw...@xhochy.com> wrote:
>
>> Hello all,
>>
>> I'd like to propose the first release candidate (rc1) of Apache Arrow
>> version 0.2.0.
>> It covers a total of 188 resolved JIRAs [1] Thanks to everyone who
>> contributed to this release.
>>
>> The source release rc1 is hosted at [2].
>> This release candidate is based on commit
>> 66f650cd359e13f3d5c3d4ef78d89f389d6bcecc located at [3].
>>
>> The vote will be open for the next ~72 hours ending at 19:00 CET,
>> February 16, 2017.
>>
>> [ ] +1
>> [ ] +0
>> [ ] -1
>>
>> I have run the java build + tests
>> I have run cpp build + tests
>> I have built the Python manylinux1 package + run the tests
>>
>> Here's my vote: +1 (non-binding)
>>
>> Thanks,
>> Uwe
>>
>> How to validate a release signature:
>> https://httpd.apache.org/dev/verification.html
>>
>> [1]
>> https://issues.apache.org/jira/issues/?jql=project%20%3D%
>> 20ARROW%20AND%20fixVersion%20%3D%200.2.0%20ORDER%20BY%20priority%20DESC
>> [2] https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.2.0-rc1/
>> [3]
>> https://github.com/apache/arrow/commit/66f650cd359e13f3d5c3d
>> 4ef78d89f389d6bcecc
>>
>>
>
>
> --
> Julien
>



-- 
Julien


Re: Arrow to/from Parquet in Java

2017-02-13 Thread Julien Le Dem
Hi Nikola,
The Parquet to Arrow reader should live in the Parquet repo here:
https://github.com/apache/parquet-mr/tree/master/parquet-arrow
For now it just has schema conversion code between Parquet and Arrow.
I've been working on a Java Parquet to Arrow reader.
What is your use case?
Did you have specific types of schemas in mind (flat/nested)?


On Mon, Feb 13, 2017 at 11:11 AM, Nikola Zezelj 
wrote:

> Thanks Wes,
>
> Could I potentially do the same from Java (either through JNI or JNA)?
> Alternatively, could I use hadoop's ParquetWriter to accomplish this task?
> Performance is definitely a concern so I would appreciate any input as to
> how these two approaches would compare (in case they are feasible).
>
> Thanks again,
> Nikola
>
> -Original Message-
> From: Wes McKinney [mailto:wesmck...@gmail.com]
> Sent: Sunday, February 12, 2017 2:25 PM
> To: dev@arrow.apache.org
> Subject: Re: Arrow to/from Parquet in Java
>
> hi Nikola,
>
> I believe Julien started working on this, but I'm not sure what stage of
> development it's in.
>
> We've been building the Arrow/Parquet bridge in parquet-cpp, and it's
> working very well (e.g.
> http://wesmckinney.com/blog/python-parquet-multithreading/) -- the nested
> data implementation is not yet completed, though.
>
> - Wes
>
> On Sat, Feb 11, 2017 at 7:34 PM, Nikola Zezelj 
> wrote:
> > Hi,
> >
> > I am trying to convert between Arrow and Parquet formats in Java.  How
> do I go about doing it?
> > Any help would be greatly appreciated. Thanks!
> >
> > --
> > Nikola Žeželj
> >
> >
> > This message is for the intended recipient(s) only and subject to
> > terms and conditions available at
> > www.seaportglobal.com/pages/disclaimer
> >
> > Additional important disclosures:
> > www.seaportglobal.com/pages/disclosures
>
> This message is for the intended recipient(s) only and subject to terms
> and conditions available at www.seaportglobal.com/pages/disclaimer
>
> Additional important disclosures: www.seaportglobal.com/pages/disclosures
>



-- 
Julien


Re: [VOTE] Release Apache Arrow 0.2.0 - rc1

2017-02-13 Thread Julien Le Dem
Did you publish your public key?
also consider adding it here: http://www.apache.org/dist/arrow/KEYS
and here: https://people.apache.org/keys/committer/uwe

$ gpg --verify apache-arrow-0.2.0.tar.gz.asc
gpg: assuming signed data in `apache-arrow-0.2.0.tar.gz'
gpg: Signature made Mon Feb 13 08:40:38 2017 PST using RSA key ID 8CAAD602
gpg: Can't check signature: public key not found

On Mon, Feb 13, 2017 at 9:25 AM, Uwe L. Korn  wrote:

> Hello all,
>
> I'd like to propose the first release candidate (rc1) of Apache Arrow
> version 0.2.0.
> It covers a total of 188 resolved JIRAs [1] Thanks to everyone who
> contributed to this release.
>
> The source release rc1 is hosted at [2].
> This release candidate is based on commit
> 66f650cd359e13f3d5c3d4ef78d89f389d6bcecc located at [3].
>
> The vote will be open for the next ~72 hours ending at 19:00 CET,
> February 16, 2017.
>
> [ ] +1
> [ ] +0
> [ ] -1
>
> I have run the java build + tests
> I have run cpp build + tests
> I have built the Python manylinux1 package + run the tests
>
> Here's my vote: +1 (non-binding)
>
> Thanks,
> Uwe
>
> How to validate a release signature:
> https://httpd.apache.org/dev/verification.html
>
> [1]
> https://issues.apache.org/jira/issues/?jql=project%20%
> 3D%20ARROW%20AND%20fixVersion%20%3D%200.2.0%20ORDER%20BY%20priority%20DESC
> [2] https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.2.0-rc1/
> [3]
> https://github.com/apache/arrow/commit/66f650cd359e13f3d5c3d4ef78d89f
> 389d6bcecc
>
>


-- 
Julien


Re: [Java] VectorLoader for multiple ArrowRecordBatches

2017-02-08 Thread Julien Le Dem
Yes, each RecordBatch overwrites the previous one.
The idea is:
 - you load a batch
 - you process it (probably writing the output to other vectors)
 - you load the next one in the same vectors.
So you iterate on the data, one RecordBatch at a time, limiting the amount
of memory you use.
The order of batches should stay the same.
The validator should deal with multiple record batches.
You can probably just read from your JSON the same number of rows as the
record batch, one batch at a time, and compare?
On Wed, Feb 8, 2017 at 3:47 PM, Bryan Cutler  wrote:

> Hi All,
>
> I'm currently working on SPARK-13534 and trying to validate converted data
> for testing purposes.  The data can be broken up into multiple
> ArrowRecordBatches that each have a number of rows (same columns) and I
> need to concat these, and compare with a JSON file by calling
> Validator.compareVectorSchemaRoot.  On repeated calls to
> VectorLoader.load,
> each record batch seems to overwrite the previous, but maybe I'm missing
> something.  Is this possible to do on the Java side of Arrow?  It could
> happen that the order of batches gets mixed up, so maybe this is not a good
> way to validate anyway.
>
> Thanks,
> Bryan
>



-- 
Julien


[jira] [Updated] (ARROW-352) Interval(DAY_TIME) has no unit

2017-02-08 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated ARROW-352:

Summary: Interval(DAY_TIME) has no unit  (was: Interval(Date_Time) has no 
unit)

> Interval(DAY_TIME) has no unit
> --
>
> Key: ARROW-352
> URL: https://issues.apache.org/jira/browse/ARROW-352
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Format
>        Reporter: Julien Le Dem
>
> Interval(DATE_TIME) assumes milliseconds.
> we should have a time unit like timestamp.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (ARROW-351) Time type has no unit

2017-02-07 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857105#comment-15857105
 ] 

Julien Le Dem commented on ARROW-351:
-

Not a blocker IMO.
Here's a PR for it.

> Time type has no unit
> -
>
> Key: ARROW-351
> URL: https://issues.apache.org/jira/browse/ARROW-351
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Format
>Affects Versions: 0.1.0
>        Reporter: Julien Le Dem
>
> The Time type should have a time unit field.
> Right now we assume millis.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: Next Arrow sync

2017-02-02 Thread Julien Le Dem
Ajay, Kirils: sorry, I closed the window before copying your emails. Could
you send them again to me?
Thank you.

On Thu, Feb 2, 2017 at 10:47 AM, Julien Le Dem <jul...@dremio.com> wrote:

> Notes:
>
> Attendance:
> - Ajay: (USA ET). here to listen and learn. Has been using storage formats
> at work.
> - Kirils: (Europe) memory alignment in Arrow. corresponding PR for Netty.
> - Uwe: (Europe) ready to make a 0.2 release in the next 2 weeks
> - Wes: (USA ET) 2sigma in NY. Working on C++/Python components. ready for
> 0.2 as well. Worked with Nong on the streaming formats with integration
> tests. with Uwe on Arrow-Parquet integration. Multi-threaded parquet reads
> etc. thread safe work. Spark-13534: convert from Spark datasets to arrow
> (file based) => spark summit Boston. Great speedups. Need to ship a release
> to get it merged.
> - Julien: (USA PT) Dremio in CA. discussed streaming with Nong, release
> 0.2
>
> - Memory alignment (ARROW-186, PR#98):
>- Sometimes allocates too much memory.
>- Netty PR: https://github.com/netty/netty/pull/6293
>- need to find out when the next netty release comes out.
>- optional for 0.2 arrow release
> - 0.2 release (ARROW-353):
>- see blocker on that jira
>- Spark-13534 depends on an Arrow release
>- some code cleanup JIRAs
>- integration test for binary data
>- other units for timestamps in java.
>- (optionally) c++: api for slicing arrays with 0 copy: adding an
> offset member in the array
>- jemalloc for memory
>    - Julien to create a JIRA for some java api improvements.
>- goal: close or move over JIRAS by end of next week. Friday 2/10 and
> make the release
>- Uwe: release manager for 0.2 (will be the first release in pip python
> package manager).
> - 0.3
>- integration tests for timestamps
>
>
>
>
>
> On Thu, Feb 2, 2017 at 10:00 AM, Julien Le Dem <jul...@dremio.com> wrote:
>
>> The arrow sync is starting now:
>> https://plus.google.com/hangouts/_/dremio.com/arrow
>>
>> On Thu, Feb 2, 2017 at 8:38 AM, Julien Le Dem <jul...@dremio.com> wrote:
>>
>>> (I just sent this to the Parquet list but this applies to Arrow as well)
>>> Everybody interested is welcome.
>>> If there is more than one of you in the same location I'd recommend
>>> sharing the connection.
>>> The sync is every other week, lasts one hour and goes as follows:
>>>  - go around the "table" for everyone to quickly introduce themselves
>>> and state the agenda items they'd want discussed (if any). It could be
>>> letting others know of what they're planning to work on, helping reach a
>>> consensus on a JIRA, reminding people to review something that's important
>>> to them...
>>>  - once the agenda is built from this first round we go over each item
>>> in order.
>>>  - at the end notes are sent to the list. They usually have a list of
>>> action items (follow up on jira, review PR #x, ...) and
>>> resolved/unresolved discussion points.
>>>
>>> Generally, discussions happen on the mailing list, JIRA or github PRs
>>> and the sync helps get those to a conclusion faster.
>>>
>>> On Thu, Feb 2, 2017 at 8:36 AM, Julien Le Dem <jul...@dremio.com> wrote:
>>>
>>>> Reminder that the next Arrow sync is today at 10am PT (in 1 hour 25
>>>> min) on google hangout:
>>>> https://plus.google.com/hangouts/_/dremio.com/arrow
>>>>
>>>> On Thu, Jan 26, 2017 at 4:00 PM, Julien Le Dem <jul...@dremio.com>
>>>> wrote:
>>>>
>>>>> The next Arrow sync will be Thursday February 2nd 10am PT on google
>>>>> hangout
>>>>> https://plus.google.com/hangouts/_/dremio.com/arrow
>>>>> notes will be posted to the list
>>>>>
>>>>> --
>>>>> Julien
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Julien
>>>>
>>>
>>>
>>>
>>> --
>>> Julien
>>>
>>
>>
>>
>> --
>> Julien
>>
>
>
>
> --
> Julien
>



-- 
Julien


[jira] [Created] (ARROW-524) [java] provide apis to access nested vectors and buffers

2017-02-02 Thread Julien Le Dem (JIRA)
Julien Le Dem created ARROW-524:
---

 Summary: [java] provide apis to access nested vectors and buffers
 Key: ARROW-524
 URL: https://issues.apache.org/jira/browse/ARROW-524
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java - Vectors
Reporter: Julien Le Dem
Assignee: Julien Le Dem


To facilitate vectorized writes to vectors, we'd want to provide access to
the underlying nested vectors and buffers.





Re: Next Arrow sync

2017-02-02 Thread Julien Le Dem
Notes:

Attendance:
- Ajay: (USA ET). here to listen and learn. Has been using storage formats
at work.
- Kirils: (Europe) memory alignment in Arrow. corresponding PR for Netty.
- Uwe: (Europe) ready to make a 0.2 release in the next 2 weeks
- Wes: (USA ET) 2sigma in NY. Working on C++/Python components. ready for
0.2 as well. Worked with Nong on the streaming formats with integration
tests. with Uwe on Arrow-Parquet integration. Multi-threaded parquet reads
etc. thread safe work. Spark-13534: convert from Spark datasets to arrow
(file based) => spark summit Boston. Great speedups. Need to ship a release
to get it merged.
- Julien: (USA PT) Dremio in CA. discussed streaming with Nong, release 0.2

- Memory alignment (ARROW-186, PR#98):
   - Sometimes allocates too much memory.
   - Netty PR: https://github.com/netty/netty/pull/6293
   - need to find out when the next netty release comes out.
   - optional for 0.2 arrow release
- 0.2 release (ARROW-353):
   - see blocker on that jira
   - Spark-13534 depends on an Arrow release
   - some code cleanup JIRAs
   - integration test for binary data
   - other units for timestamps in java.
   - (optionally) c++: api for slicing arrays with 0 copy: adding an offset
member in the array
   - jemalloc for memory
   - Julien to create a jira for some java api improvements.
   - goal: close or move over JIRAs by end of next week. Friday 2/10 and
make the release
   - Uwe: release manager for 0.2 (will be the first release in pip python
package manager).
- 0.3
   - integration tests for timestamps
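
For readers unfamiliar with the "offset member" idea in the 0.2 list above,
here is a minimal sketch of zero-copy slicing. It is an illustration only,
not the Arrow C++ API: the class name ArrayView and its fields are
hypothetical, and a plain Python list stands in for the shared buffer.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ArrayView:
    """Hypothetical array that views a shared buffer through an offset."""
    data: List[int]  # shared backing storage; slicing never copies it
    offset: int      # index of this view's first element within data
    length: int      # number of elements visible through this view

    def __getitem__(self, i: int) -> int:
        if not 0 <= i < self.length:
            raise IndexError(i)
        return self.data[self.offset + i]

    def slice(self, start: int, length: int) -> "ArrayView":
        # Only the offset and length change; the buffer is shared as-is.
        if start < 0 or length < 0 or start + length > self.length:
            raise ValueError("slice out of bounds")
        return ArrayView(self.data, self.offset + start, length)


base = ArrayView(list(range(10)), 0, 10)
middle = base.slice(3, 4)    # views elements 3..6 of the buffer
inner = middle.slice(1, 2)   # offsets compose: views elements 4..5
```

Because a slice only adjusts the offset and length, taking a slice of a
slice stays cheap and every view keeps pointing at the same storage.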





On Thu, Feb 2, 2017 at 10:00 AM, Julien Le Dem <jul...@dremio.com> wrote:

> The arrow sync is starting now:
> https://plus.google.com/hangouts/_/dremio.com/arrow
>
> On Thu, Feb 2, 2017 at 8:38 AM, Julien Le Dem <jul...@dremio.com> wrote:
>
>> (I just sent this to the Parquet list but this applies to Arrow as well)
>> Everybody interested is welcome.
>> If there is more than one of you in the same location I'd recommend
>> sharing the connection.
>> The sync is every other week, lasts one hour and goes as follows:
>>  - go around the "table" for everyone to quickly introduce themselves and
>> state the agenda items they'd want discussed (if any). It could be letting
>> others know of what they're planning to work on, helping reach a
>> consensus on a JIRA, reminding people to review something that's important
>> to them...
>>  - once the agenda is built from this first round we go over each item in
>> order.
>>  - at the end notes are sent to the list. They usually have a list of
>> action items (follow up on jira, review PR #x, ...) and
>> resolved/unresolved discussion points.
>>
>> Generally, discussions happen on the mailing list, JIRA or github PRs and
>> the sync helps get those to a conclusion faster.
>>
>> On Thu, Feb 2, 2017 at 8:36 AM, Julien Le Dem <jul...@dremio.com> wrote:
>>
>>> Reminder that the next Arrow sync is today at 10am PT (in 1 hour 25 min)
>>> on google hangout:
>>> https://plus.google.com/hangouts/_/dremio.com/arrow
>>>
>>> On Thu, Jan 26, 2017 at 4:00 PM, Julien Le Dem <jul...@dremio.com>
>>> wrote:
>>>
>>>> The next Arrow sync will be Thursday February 2nd 10am PT on google
>>>> hangout
>>>> https://plus.google.com/hangouts/_/dremio.com/arrow
>>>> notes will be posted to the list
>>>>
>>>> --
>>>> Julien
>>>>
>>>
>>>
>>>
>>> --
>>> Julien
>>>
>>
>>
>>
>> --
>> Julien
>>
>
>
>
> --
> Julien
>



-- 
Julien


Re: Next Arrow sync

2017-02-02 Thread Julien Le Dem
The arrow sync is starting now:
https://plus.google.com/hangouts/_/dremio.com/arrow

On Thu, Feb 2, 2017 at 8:38 AM, Julien Le Dem <jul...@dremio.com> wrote:

> (I just sent this to the Parquet list but this applies to Arrow as well)
> Everybody interested is welcome.
> If there is more than one of you in the same location I'd recommend
> sharing the connection.
> The sync is every other week, lasts one hour and goes as follows:
>  - go around the "table" for everyone to quickly introduce themselves and
> state the agenda items they'd want discussed (if any). It could be letting
> others know of what they're planning to work on, helping reach a
> consensus on a JIRA, reminding people to review something that's important
> to them...
>  - once the agenda is built from this first round we go over each item in
> order.
>  - at the end notes are sent to the list. They usually have a list of
> action items (follow up on jira, review PR #x, ...) and
> resolved/unresolved discussion points.
>
> Generally, discussions happen on the mailing list, JIRA or github PRs and
> the sync helps get those to a conclusion faster.
>
> On Thu, Feb 2, 2017 at 8:36 AM, Julien Le Dem <jul...@dremio.com> wrote:
>
>> Reminder that the next Arrow sync is today at 10am PT (in 1 hour 25 min)
>> on google hangout:
>> https://plus.google.com/hangouts/_/dremio.com/arrow
>>
>> On Thu, Jan 26, 2017 at 4:00 PM, Julien Le Dem <jul...@dremio.com> wrote:
>>
>>> The next Arrow sync will be Thursday February 2nd 10am PT on google
>>> hangout
>>> https://plus.google.com/hangouts/_/dremio.com/arrow
>>> notes will be posted to the list
>>>
>>> --
>>> Julien
>>>
>>
>>
>>
>> --
>> Julien
>>
>
>
>
> --
> Julien
>



-- 
Julien


[jira] [Commented] (ARROW-273) Lists use unsigned offset vectors instead of signed (as defined in the spec)

2017-01-26 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840772#comment-15840772
 ] 

Julien Le Dem commented on ARROW-273:
-

[~wesmckinn] yes

> Lists use unsigned offset vectors instead of signed (as defined in the spec)
> 
>
> Key: ARROW-273
> URL: https://issues.apache.org/jira/browse/ARROW-273
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java - Vectors
>        Reporter: Julien Le Dem
>
> The List vector defines its offset vector as UInt4Vector (unsigned int32).
> According to the arrow spec it should be a signed int32.
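
To illustrate the issue above, here is a small sketch (not the Java vector
code) of how a list offsets buffer of signed 32-bit integers delimits the
child values, and how the same four bytes read differently as signed versus
unsigned. The helper name list_slices and the sample data are made up for
the example.

```python
import struct


def list_slices(offsets_buf: bytes, values: list) -> list:
    """Split values into lists using little-endian signed int32 offsets."""
    n = len(offsets_buf) // 4
    offsets = struct.unpack("<%di" % n, offsets_buf)  # "i" = signed int32
    # List i covers values[offsets[i]:offsets[i + 1]].
    return [values[offsets[i]:offsets[i + 1]] for i in range(n - 1)]


# Three lists over five values: [10, 20], [], [30, 40, 50]
buf = struct.pack("<4i", 0, 2, 2, 5)
lists = list_slices(buf, [10, 20, 30, 40, 50])

# The same bytes give different numbers depending on signedness.
raw = b"\xff\xff\xff\xff"
as_signed = struct.unpack("<i", raw)[0]    # signed int32
as_unsigned = struct.unpack("<I", raw)[0]  # unsigned int32
```

For offsets below 2^31 both readings agree, so the mismatch mainly matters
for staying consistent with the spec and with other implementations.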



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Next Arrow sync

2017-01-26 Thread Julien Le Dem
The next Arrow sync will be Thursday February 2nd 10am PT on google hangout
https://plus.google.com/hangouts/_/dremio.com/arrow
notes will be posted to the list

-- 
Julien


Re: Arrow 0.2 release?

2017-01-25 Thread Julien Le Dem
agreed to both.

On Tue, Jan 24, 2017 at 6:11 AM, Wes McKinney wrote:

> hi folks,
>
> With all the work that's happened over the last 3 months, it would be
> good to make a 0.2 release, and follow up with more frequent releases
> over the coming months. Is there any work anyone would like to see go
> in before 0.2? Otherwise, I believe we can go ahead and make an RC.
>
> As soon as we add integration tests for the new streaming format, we
> should probably make a 0.3 release. We are not integration testing all
> the data types yet, so we should also try to add tests for those, but
> it's not strictly necessary.
>
> Thanks
> Wes
>



-- 
Julien

