Re: Arrow read write support on Java

2018-12-15 Thread Masayuki Takahashi
I have created the JIRA issue:

https://issues.apache.org/jira/browse/PARQUET-1479




-- 
高橋 真之


Re: Arrow read write support on Java

2018-12-14 Thread Masayuki Takahashi
Hi Wes,

Thanks for explaining the details!
I am going to check the documentation of projects that have already been donated.

thanks.



-- 
高橋 真之


Re: Arrow read write support on Java

2018-12-13 Thread Wes McKinney
hi,

This software was developed outside of Apache Parquet:
https://github.com/masayuki038/parquet-to-arrow. It would be different
if this had been developed as pull requests into apache/parquet-mr,
for example.

We have a procedure for accepting foreign IP into Apache projects:
http://incubator.apache.org/ip-clearance/

- Wes


Re: Arrow read write support on Java

2018-12-13 Thread Masayuki Takahashi
Hi Ryan,

Which part would you like to discuss? May I create a JIRA issue for it?

thanks.
On Thu, Dec 13, 2018 at 3:28 AM Ryan Blue wrote:
>
> We've had a lot of discussion about this in the Iceberg community as well,
> since Parquet to Arrow is going to be the easiest path to vectorized reads
> for Spark. It would be great to have people working on it!
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix



-- 
高橋 真之


Re: Arrow read write support on Java

2018-12-13 Thread Masayuki Takahashi
Hi Wes,

I did not understand what you meant by "IP lineage / transfer issues".
Could you explain in more detail?

I will try to conform to the rules of the Parquet community as much as possible.

thanks.



-- 
高橋 真之


Re: Arrow read write support on Java

2018-12-12 Thread Wes McKinney
hi Masayuki -- this is great to hear. Since this software was not
developed in the Apache Parquet community, we may need to be careful about
IP lineage / transfer issues if you do open a pull request.

- Wes


Re: Arrow read write support on Java

2018-12-12 Thread Masayuki Takahashi
Hi,

I am developing a simple converter from Parquet to Arrow.

https://github.com/masayuki038/parquet-to-arrow

If no one has started on this yet, may I create a JIRA issue and a pull
request for the Parquet-to-Arrow converter?

I would also like to develop a converter from Arrow to Parquet and some
other features (as in the Dremio implementation).

thanks.





-- 
高橋 真之


Re: Arrow read write support on Java

2018-12-12 Thread Wes McKinney
hi Yurui,

This has been discussed over the last 3 years, but I haven't seen anyone
step up to work on it yet. Having vectorized Arrow read and
write in a reusable Java library would be very useful (it has proven
popular in C++). We welcome your contributions.

- Wes


Arrow read write support on Java

2018-12-11 Thread Yurui Zhou
Hello

I just learned that Arrow now provides a native reader/writer implementation
in C++ that lets users read Parquet files directly into Arrow buffers and
write Parquet files from Arrow buffers.

Is there any plan to provide the same support on the Java side?

I found an implementation in the Dremio codebase that provides the Arrow
support mentioned above:
https://github.com/dremio/dremio-oss/tree/master/sabot/kernel/src/main/java/com/dremio/exec/store/parquet

Does the Parquet or Arrow community have any plan to integrate this into the
Parquet codebase, or to implement a new version from scratch?

Thanks
Yurui
