[DataFusion] Question about async/await?

2021-09-11 Thread Renjie Liu
Hi all,
I see that the methods of the executor trait are declared as async. I have
two questions:
1. Which async runtime is used in benchmarking?
2. Tokio is the most popular async runtime, and its documentation
<https://docs.rs/tokio/1.11.0/tokio/> suggests running long-running tasks in
a separate thread pool rather than directly on the Tokio runtime:

> If your code is CPU-bound and you wish to limit the number of threads used
> to run it, you should run it on another thread pool such as rayon
> <https://docs.rs/rayon>.
>
So my second question is: did you test a thread pool execution mode?
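
A minimal sketch of what a "thread pool execution mode" could look like, using only the Rust standard library rather than Tokio or rayon. CPU-bound closures are sent to a fixed number of worker threads and each result comes back over its own channel. The names `CpuPool` and `spawn` are hypothetical and are not part of DataFusion.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread;

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical fixed-size pool for CPU-bound work (not a DataFusion API).
pub struct CpuPool {
    sender: Option<Sender<Job>>,
    workers: Vec<thread::JoinHandle<()>>,
}

impl CpuPool {
    pub fn new(threads: usize) -> Self {
        let (sender, receiver) = channel::<Job>();
        let receiver = Arc::new(Mutex::new(receiver));
        let workers = (0..threads)
            .map(|_| {
                let receiver = Arc::clone(&receiver);
                thread::spawn(move || loop {
                    // The lock is held only while taking the next job.
                    let job = receiver.lock().unwrap().recv();
                    match job {
                        Ok(job) => job(),
                        Err(_) => break, // channel closed: the pool is shutting down
                    }
                })
            })
            .collect();
        CpuPool { sender: Some(sender), workers }
    }

    /// Run `f` on the pool; the returned receiver yields its result.
    pub fn spawn<T, F>(&self, f: F) -> Receiver<T>
    where
        T: Send + 'static,
        F: FnOnce() -> T + Send + 'static,
    {
        let (tx, rx) = channel();
        let job: Job = Box::new(move || {
            // Ignore the send error if the caller dropped the receiver.
            let _ = tx.send(f());
        });
        self.sender.as_ref().unwrap().send(job).unwrap();
        rx
    }
}

impl Drop for CpuPool {
    fn drop(&mut self) {
        // Closing the channel lets every worker leave its loop.
        drop(self.sender.take());
        for worker in self.workers.drain(..) {
            let _ = worker.join();
        }
    }
}
```

An async task would call `pool.spawn(...)` and wait for the result without occupying a runtime worker thread with the computation itself. With Tokio, the blocking `recv()` would typically be replaced by an async-aware channel.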

I would highly appreciate it if you could answer these questions.
-- 
Renjie Liu
Software Engineer, MVAD


[jira] [Created] (ARROW-7348) [Rust] Add api to return references of buffer of null bitmap.

2019-12-08 Thread Renjie Liu (Jira)
Renjie Liu created ARROW-7348:
-

 Summary: [Rust] Add api to return references of buffer of null bitmap.
 Key: ARROW-7348
 URL: https://issues.apache.org/jira/browse/ARROW-7348
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Renjie Liu
Assignee: Renjie Liu






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7312) [Rust] ArrowError should implement std::error::Error

2019-12-04 Thread Renjie Liu (Jira)
Renjie Liu created ARROW-7312:
-

 Summary: [Rust] ArrowError should implement std::error::Error
 Key: ARROW-7312
 URL: https://issues.apache.org/jira/browse/ARROW-7312
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Renjie Liu
Assignee: Renjie Liu


ArrowError should implement the std::error::Error trait so that other crates
can handle errors from this crate more easily.
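
A minimal sketch of what this could look like. The enum below is a hypothetical stand-in for the crate's error type, and its variants and the `parse_len` helper are illustrative only. Once `Display` and `Debug` are available, an empty `impl Error` is enough, and downstream code can use `?` with `Box<dyn Error>`.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical stand-in for the crate's error type; variants are illustrative.
#[derive(Debug)]
pub enum ArrowError {
    ParseError(String),
    ComputeError(String),
}

impl fmt::Display for ArrowError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ArrowError::ParseError(msg) => write!(f, "Parse error: {}", msg),
            ArrowError::ComputeError(msg) => write!(f, "Compute error: {}", msg),
        }
    }
}

// `Debug` and `Display` are the only requirements; every method has a default.
impl Error for ArrowError {}

/// Hypothetical downstream function: `?` converts `ArrowError` into
/// `Box<dyn Error>` because the type now implements `Error`.
pub fn parse_len(input: &str) -> Result<usize, Box<dyn Error>> {
    let n: usize = input
        .trim()
        .parse()
        .map_err(|_| ArrowError::ParseError(format!("not a length: {:?}", input)))?;
    Ok(n)
}
```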





[jira] [Created] (ARROW-7113) [Rust] Buffer should accept memory owned by others

2019-11-12 Thread Renjie Liu (Jira)
Renjie Liu created ARROW-7113:
-

 Summary: [Rust] Buffer should accept memory owned by others
 Key: ARROW-7113
 URL: https://issues.apache.org/jira/browse/ARROW-7113
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Renjie Liu
Assignee: Renjie Liu


Currently the Rust Buffer always assumes that it owns the memory passed to it,
and frees that memory when the Buffer is dropped. This is inconvenient in
cross-language environments such as JNI.
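
A minimal sketch of one possible design, assuming the buffer carries a release callback that decides how (or whether) the memory is freed. This is not the actual Arrow `Buffer`; the type, `from_vec`, and `from_foreign` are hypothetical names used for illustration.

```rust
use std::slice;

/// Hypothetical buffer that does not assume it owns its memory.
pub struct Buffer {
    ptr: *const u8,
    len: usize,
    /// Runs exactly once on drop and decides how the memory is released.
    release: Option<Box<dyn FnOnce()>>,
}

impl Buffer {
    /// Memory owned by Rust: the `Vec` is kept alive inside the release
    /// closure and freed when the buffer is dropped.
    pub fn from_vec(data: Vec<u8>) -> Self {
        let (ptr, len) = (data.as_ptr(), data.len());
        Buffer { ptr, len, release: Some(Box::new(move || drop(data))) }
    }

    /// Memory owned by someone else (for example, a JVM).
    ///
    /// # Safety
    /// `ptr` must be valid for reads of `len` bytes until `release` runs.
    pub unsafe fn from_foreign(
        ptr: *const u8,
        len: usize,
        release: Box<dyn FnOnce()>,
    ) -> Self {
        Buffer { ptr, len, release: Some(release) }
    }

    pub fn as_slice(&self) -> &[u8] {
        // Sound as long as the constructor contracts above hold.
        unsafe { slice::from_raw_parts(self.ptr, self.len) }
    }
}

impl Drop for Buffer {
    fn drop(&mut self) {
        // Hand the memory back to its owner instead of freeing it directly.
        if let Some(release) = self.release.take() {
            release();
        }
    }
}
```

In a JNI setting, the release callback could notify the JVM side that the region is no longer referenced, rather than calling the Rust allocator.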





[jira] [Created] (ARROW-6948) [Rust] [Parquet] Fix bool array support in arrow reader.

2019-10-20 Thread Renjie Liu (Jira)
Renjie Liu created ARROW-6948:
-

 Summary: [Rust] [Parquet] Fix bool array support in arrow reader.
 Key: ARROW-6948
 URL: https://issues.apache.org/jira/browse/ARROW-6948
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Renjie Liu
Assignee: Renjie Liu








Re: [DISCUSS] Understanding Arrow's CI problems and needs

2019-10-13 Thread Renjie Liu
Do we have a ticket to track this?



From: Andy Grove 
Sent: Saturday, October 12, 2019 11:46:18 PM
To: dev 
Subject: Re: [DISCUSS] Understanding Arrow's CI problems and needs

I've started a new section to discuss proposals and current initiatives. I
know some of us have been working on some things but without much
coordination so far. It would be good to track these efforts so everyone
can comment on them.

On Fri, Oct 11, 2019 at 11:11 AM Wes McKinney  wrote:

> It seems some time has passed here. Would some others like to read the
> document and comment? This is important stuff.
>
> On Wed, Oct 2, 2019 at 2:20 PM Krisztián Szűcs
>  wrote:
> >
> > The current document summarizes the current situation well, but in
> > order to properly compare and eventually select a solution we need a
> > detailed list of explicit features with some sort of classification,
> > like should/must have. For example, our future CI system must support
> > "PRs from forks". After filling in this table for the alternatives, we
> > can have a much clearer picture.
> >
> > On Wed, Oct 2, 2019 at 4:06 PM Wes McKinney  wrote:
> >
> > > I reviewed the document, thanks for putting it together! I think it
> > > captures most of the requirements and the challenges that we are
> > > currently facing. I think that anyone who is actively contributing to
> > > the project or merging pull requests should read this document since
> > > this affects all of us.
> > >
> > > On Tue, Oct 1, 2019 at 1:55 PM Wes McKinney 
> wrote:
> > > >
> > > > Thanks Neal for starting this discussion. I will review and comment.
> > > >
> > > > I will say that as a maintainer the current situation is very nearly
> > > > intolerable. As by far and away the most prolific merger-of-PRs [1],
> > > > I've been negatively affected by the long queueing times and delayed
> > > > feedback cycles. The project would not be able to accommodate 2x or
> > > > 5x the volume of PRs that we have now, and so it is urgent that we
> > > > develop a scalable cross-platform CI solution that is under this
> > > > community's control and does not require a high maintenance burden,
> > > > so if we need to increase the amount of resources dedicated to CI we
> > > > can unilaterally do so.
> > > >
> > > > [1]: https://gist.github.com/wesm/78bfda4cef3b23a5193cf4fb8a6540fb
> > > >
> > > > On Tue, Oct 1, 2019 at 1:38 PM Neal Richardson
> > > >  wrote:
> > > > >
> > > > > Hi all,
> > > > > Over the last few months, I've seen a lot of frustration and
> > > > > discussion around the shortcomings of our current CI. I'm also
> seeing
> > > > > debate over a few possible solutions; unfortunately, the debates
> tend
> > > > > not to resolve in a clear, decisive way, and we end up having the
> same
> > > > > debates repeatedly.
> > > > >
> > > > > In my experience, this pattern often happens when there's not a
> shared
> > > > > understanding of the problems we're trying to solve--it's hard to
> > > > > agree on a solution if we don't agree on the problem. To help us
> reach
> > > > > consensus on the problems, I've started a document:
> > > > >
> > >
> https://docs.google.com/document/d/1fToW48TO-B9T8VRi0_Z30fDJkjOrBisc-Fr8Epl50s4/edit#
> > > > >
> > > > > Please have a look and add/edit freely. I've tried to capture the
> > > > > arguments I've seen go by the mailing list, as well as some from my
> > > > > own experience, but if I've mischaracterized anything, please
> rectify.
> > > > >
> > > > > I know several people have been exploring some potential solutions,
> > > > > and I hope this document can help us begin to discuss their
> relative
> > > > > merits more objectively and practically.
> > > > >
> > > > > Neal
> > >
>


Re: [DISCUSS] Proposal about integration test of arrow parquet reader

2019-10-13 Thread Renjie Liu
For the Rust Parquet reader alone, a few static files covering some types
would be enough for now.
However, I agree with Wes that we should not rely on static binary files
for functional tests, because they are hard to maintain as Arrow evolves.
For example, the C++ Parquet reader currently doesn't support nested types,
so Parquet files without nested types may be easier to test against. But
once it supports nested types, we would need to change the files again.
I think we should set up a program that generates binary test files on
demand in every test run and that can be used by Arrow libraries written in
different languages. I'll write a proposal about this in a Google Doc.
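
One way such on-demand generation could stay reproducible is to derive every value from a fixed seed, as suggested elsewhere in this thread. The sketch below is an assumption about how that might look, not an existing Arrow tool: `SplitMix64` is a small SplitMix64-style generator written inline, and `generate_column` is a hypothetical helper that produces a nullable column.

```rust
/// Small deterministic generator in the SplitMix64 style:
/// the same seed always yields the same sequence.
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        SplitMix64 { state: seed }
    }

    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical helper: generate a nullable i32 column in which roughly
/// one value in eight is null.
pub fn generate_column(seed: u64, rows: usize) -> Vec<Option<i32>> {
    let mut rng = SplitMix64::new(seed);
    (0..rows)
        .map(|_| {
            let bits = rng.next_u64();
            if bits % 8 == 0 {
                None // low bits choose nullness
            } else {
                Some((bits >> 32) as i32) // high bits supply the value
            }
        })
        .collect()
}
```

Because the seed is part of the test definition, a failure in CI can be reproduced locally by generating the same column again.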

On Sun, Oct 13, 2019 at 4:25 AM Wes McKinney  wrote:

> I think the ideal scenario is to have a mix of "endogenous" unit
> testing and functional testing against real files to test for
> regressions or cross-compatibility. To criticize the work we've done
> in the C++ project, we have not done enough systematic integration
> testing IMHO, but we do test against some "bad files" that have
> accumulated.
>
> In any case, I think it's bad practice for a file format reader to
> rely exclusively on functional testing against static binary files.
>
> This could be a good opportunity to devise a language-agnostic Parquet
> integration testing strategy. Given that we're looking to add nested
> data support in C++ hopefully by the end of 2020, it would be good
> timing.
>
> On Sat, Oct 12, 2019 at 11:12 AM Andy Grove  wrote:
> >
> > I also think that there are valid use cases for checking in binary files,
> > but we have to be careful not to abuse this. For example, we might want
> to
> > check in a Parquet file created by a particular version of Apache Spark
> to
> > ensure that Arrow implementations can read it successfully (hypothetical
> > example).
> >
> > It would also be good to have a small set of Parquet files using every
> > possible data type that all implementations can use in their tests. I
> > suppose we might want one set per Arrow format version as well.
> >
> > The problem we have now, in my opinion, is that we're proposing adding
> > files on a pretty ad-hoc basis, driven by the needs of individual
> > contributors in one language implementation, and this is perhaps
> happening
> > because we don't already have a good set of standard test files.
> >
> > Renjie - perhaps you could comment on this. If we had these standard
> files
> > covering all data types, would that have worked for you in this instance?
> >
> > Thanks,
> >
> > Andy.
> >
> > On Sat, Oct 12, 2019 at 12:03 AM Micah Kornfield 
> > wrote:
> >
> > > Hi Wes,
> > > >
> > > > I additionally would prefer generating the test corpus at test time
> > > > rather than checking in binary files.
> > >
> > >
> > > Can you elaborate on this? I think both generated on the fly and
> example
> > > files are useful.
> > >
> > > The checked in files catch regressions even when readers/writers can
> read
> > > their own data but they have either incorrect or undefined behavior in
> > > regards to the specification (for example I would imagine checking in a
> > > file as part of the fix for ARROW-6844
> > > <https://issues.apache.org/jira/browse/ARROW-6844>).
> > >
> > > Thanks,
> > > Micah
> > >
> > > On Thu, Oct 10, 2019 at 5:30 PM Renjie Liu 
> > > wrote:
> > >
> > > > Thanks wes. Sure I'll fix it.
> > > >
> > > > On Fri, Oct 11, 2019 at 6:10 AM Wes McKinney wrote:
> > > >
> > > > > I just merged the PR
> https://github.com/apache/arrow-testing/pull/11
> > > > >
> > > > > Various aspects of this make me uncomfortable so I hope they can be
> > > > > addressed in follow up work
> > > > >
> > > > > On Thu, Oct 10, 2019 at 5:41 AM Renjie Liu <
> liurenjie2...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > I've create ticket to track here:
> > > > > > https://issues.apache.org/jira/browse/ARROW-6845
> > > > > >
> > > > > > For this moment, can we check in those pregenerated data to
> unblock
> > > > rust
> > > > > > version's arrow reader?
> > > > > >

Re: [DISCUSS] Proposal about integration test of arrow parquet reader

2019-10-10 Thread Renjie Liu
Thanks, Wes. Sure, I'll fix it.

On Fri, Oct 11, 2019 at 6:10 AM Wes McKinney wrote:

> I just merged the PR https://github.com/apache/arrow-testing/pull/11
>
> Various aspects of this make me uncomfortable so I hope they can be
> addressed in follow up work
>
> On Thu, Oct 10, 2019 at 5:41 AM Renjie Liu 
> wrote:
> >
> > I've create ticket to track here:
> > https://issues.apache.org/jira/browse/ARROW-6845
> >
> > For this moment, can we check in those pregenerated data to unblock rust
> > version's arrow reader?
> >
> > On Thu, Oct 10, 2019 at 1:20 PM Renjie Liu 
> wrote:
> >
> > > It would be fine in that case.
> > >
> > > On Thu, Oct 10, 2019 at 12:58 PM Wes McKinney wrote:
> > >
> > >> On Wed, Oct 9, 2019 at 10:16 PM Renjie Liu 
> > >> wrote:
> > >> >
> > >> > 1. There already exists a low level parquet writer which can produce
> > >> > parquet file, so unit test should be fine. But writer from arrow to
> > >> parquet
> > >> > doesn't exist yet, and it may take some period of time to finish it.
> > >> > 2. In fact my data are randomly generated and it's definitely
> > >> reproducible.
> > >> > However, I don't think it would be good idea to randomly generate
> data
> > >> > everytime we run ci because it would be difficult to debug. For
> example
> > >> PR
> > >> > a introduced a bug, which is triggerred in other PR's build it
> would be
> > >> > confusing for contributors.
> > >>
> > >> Presumably any random data generation would use a fixed seed precisely
> > >> to be reproducible.
> > >>
> > >> > 3. I think it would be good idea to spend effort on integration test
> > >> with
> > >> > parquet because it's an important use case of arrow. Also similar
> > >> approach
> > >> > could be extended to other language and other file format(avro,
> orc).
> > >> >
> > >> >
> > >> > On Wed, Oct 9, 2019 at 11:08 PM Wes McKinney 
> > >> wrote:
> > >> >
> > >> > > There are a number of issues worth discussion.
> > >> > >
> > >> > > 1. What is the timeline/plan for Rust implementing a Parquet
> _writer_?
> > >> > > It's OK to be reliant on other libraries in the short term to
> produce
> > >> > > files to test against, but does not strike me as a sustainable
> > >> > > long-term plan. Fixing bugs can be a lot more difficult than it
> needs
> > >> > > to be if you can't write targeted "endogenous" unit tests
> > >> > >
> > >> > > 2. Reproducible data generation
> > >> > >
> > >> > > I think if you're going to test against a pre-generated corpus,
> you
> > >> > > should make sure that generating the corpus is reproducible for
> other
> > >> > > developers (i.e. with a Dockerfile), and can be extended by
> adding new
> > >> > > files or random data generation.
> > >> > >
> > >> > > I additionally would prefer generating the test corpus at test
> time
> > >> > > rather than checking in binary files. If this isn't viable right
> now
> > >> > > we can create an "arrow-rust-crutch" git repository for you to
> stash
> > >> > > binary files until some of these testing scalability issues are
> > >> > > addressed.
> > >> > >
> > >> > > If we're going to spend energy on Parquet integration testing with
> > >> > > Java, this would be a good opportunity to do the work in a way
> where
> > >> > > the C++ Parquet library can also participate (since we ought to be
> > >> > > doing integration tests with Java, and we can also read JSON
> files to
> > >> > > Arrow).
> > >> > >
> > >> > > On Tue, Oct 8, 2019 at 11:54 PM Renjie Liu <
> liurenjie2...@gmail.com>
> > >> > > wrote:
> > >> > > >
> > >> > > > On Wed, Oct 9, 2019 at 12:11 PM Andy Grove <
> andygrov...@gmail.com>
> > >> > > wrote:
> > >> > > >
> > >> > > > > I'm very interested in helping to find a solution to this
> because
> > >> w

Re: [DISCUSS] Proposal about integration test of arrow parquet reader

2019-10-10 Thread Renjie Liu
I've created a ticket to track this:
https://issues.apache.org/jira/browse/ARROW-6845

For now, can we check in the pregenerated data to unblock the Rust version
of the Arrow reader?

On Thu, Oct 10, 2019 at 1:20 PM Renjie Liu  wrote:

> It would be fine in that case.
>
> On Thu, Oct 10, 2019 at 12:58 PM Wes McKinney wrote:
>
>> On Wed, Oct 9, 2019 at 10:16 PM Renjie Liu 
>> wrote:
>> >
>> > 1. There already exists a low level parquet writer which can produce
>> > parquet file, so unit test should be fine. But writer from arrow to
>> parquet
>> > doesn't exist yet, and it may take some period of time to finish it.
>> > 2. In fact my data are randomly generated and it's definitely
>> reproducible.
>> > However, I don't think it would be good idea to randomly generate data
>> > everytime we run ci because it would be difficult to debug. For example
>> PR
>> > a introduced a bug, which is triggerred in other PR's build it would be
>> > confusing for contributors.
>>
>> Presumably any random data generation would use a fixed seed precisely
>> to be reproducible.
>>
>> > 3. I think it would be good idea to spend effort on integration test
>> with
>> > parquet because it's an important use case of arrow. Also similar
>> approach
>> > could be extended to other language and other file format(avro, orc).
>> >
>> >
>> > On Wed, Oct 9, 2019 at 11:08 PM Wes McKinney 
>> wrote:
>> >
>> > > There are a number of issues worth discussion.
>> > >
>> > > 1. What is the timeline/plan for Rust implementing a Parquet _writer_?
>> > > It's OK to be reliant on other libraries in the short term to produce
>> > > files to test against, but does not strike me as a sustainable
>> > > long-term plan. Fixing bugs can be a lot more difficult than it needs
>> > > to be if you can't write targeted "endogenous" unit tests
>> > >
>> > > 2. Reproducible data generation
>> > >
>> > > I think if you're going to test against a pre-generated corpus, you
>> > > should make sure that generating the corpus is reproducible for other
>> > > developers (i.e. with a Dockerfile), and can be extended by adding new
>> > > files or random data generation.
>> > >
>> > > I additionally would prefer generating the test corpus at test time
>> > > rather than checking in binary files. If this isn't viable right now
>> > > we can create an "arrow-rust-crutch" git repository for you to stash
>> > > binary files until some of these testing scalability issues are
>> > > addressed.
>> > >
>> > > If we're going to spend energy on Parquet integration testing with
>> > > Java, this would be a good opportunity to do the work in a way where
>> > > the C++ Parquet library can also participate (since we ought to be
>> > > doing integration tests with Java, and we can also read JSON files to
>> > > Arrow).
>> > >
>> > > On Tue, Oct 8, 2019 at 11:54 PM Renjie Liu 
>> > > wrote:
>> > > >
>> > > > On Wed, Oct 9, 2019 at 12:11 PM Andy Grove 
>> > > wrote:
>> > > >
>> > > > > I'm very interested in helping to find a solution to this because
>> we
>> > > really
>> > > > > do need integration tests for Rust to make sure we're compatible
>> with
>> > > other
>> > > > > implementations... there is also the ongoing CI dockerization work
>> > > that I
>> > > > > feel is related.
>> > > > >
>> > > > > I haven't looked at the current integration tests yet and would
>> > > appreciate
>> > > > > some pointers on how all of this works (do we have docs?) or
>> where to
>> > > start
>> > > > > looking.
>> > > > >
>> > > > I have a test in my latest PR:
>> https://github.com/apache/arrow/pull/5523
>> > > > And here is the generated data:
>> > > > https://github.com/apache/arrow-testing/pull/11
>> > > > As with program to generate these data, it's just a simple java
>> program.
>> > > > I'm not sure whether we need to integrate it into arrow.
>> > > >
>> > > > >
>> > > > > I imagine the integration test could follow the approach that
>> Renjie is
>> > > > > outlinin

[jira] [Created] (ARROW-6845) Setup process to generate random data for integration tests

2019-10-10 Thread Renjie Liu (Jira)
Renjie Liu created ARROW-6845:
-

 Summary: Setup process to generate random data for integration tests
 Key: ARROW-6845
 URL: https://issues.apache.org/jira/browse/ARROW-6845
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Renjie Liu
Assignee: Renjie Liu








Re: [DISCUSS] Proposal about integration test of arrow parquet reader

2019-10-09 Thread Renjie Liu
It would be fine in that case.

On Thu, Oct 10, 2019 at 12:58 PM Wes McKinney wrote:

> On Wed, Oct 9, 2019 at 10:16 PM Renjie Liu 
> wrote:
> >
> > 1. There already exists a low level parquet writer which can produce
> > parquet file, so unit test should be fine. But writer from arrow to
> parquet
> > doesn't exist yet, and it may take some period of time to finish it.
> > 2. In fact my data are randomly generated and it's definitely
> reproducible.
> > However, I don't think it would be good idea to randomly generate data
> > everytime we run ci because it would be difficult to debug. For example
> PR
> > a introduced a bug, which is triggerred in other PR's build it would be
> > confusing for contributors.
>
> Presumably any random data generation would use a fixed seed precisely
> to be reproducible.
>
> > 3. I think it would be good idea to spend effort on integration test with
> > parquet because it's an important use case of arrow. Also similar
> approach
> > could be extended to other language and other file format(avro, orc).
> >
> >
> > On Wed, Oct 9, 2019 at 11:08 PM Wes McKinney 
> wrote:
> >
> > > There are a number of issues worth discussion.
> > >
> > > 1. What is the timeline/plan for Rust implementing a Parquet _writer_?
> > > It's OK to be reliant on other libraries in the short term to produce
> > > files to test against, but does not strike me as a sustainable
> > > long-term plan. Fixing bugs can be a lot more difficult than it needs
> > > to be if you can't write targeted "endogenous" unit tests
> > >
> > > 2. Reproducible data generation
> > >
> > > I think if you're going to test against a pre-generated corpus, you
> > > should make sure that generating the corpus is reproducible for other
> > > developers (i.e. with a Dockerfile), and can be extended by adding new
> > > files or random data generation.
> > >
> > > I additionally would prefer generating the test corpus at test time
> > > rather than checking in binary files. If this isn't viable right now
> > > we can create an "arrow-rust-crutch" git repository for you to stash
> > > binary files until some of these testing scalability issues are
> > > addressed.
> > >
> > > If we're going to spend energy on Parquet integration testing with
> > > Java, this would be a good opportunity to do the work in a way where
> > > the C++ Parquet library can also participate (since we ought to be
> > > doing integration tests with Java, and we can also read JSON files to
> > > Arrow).
> > >
> > > On Tue, Oct 8, 2019 at 11:54 PM Renjie Liu 
> > > wrote:
> > > >
> > > > On Wed, Oct 9, 2019 at 12:11 PM Andy Grove 
> > > wrote:
> > > >
> > > > > I'm very interested in helping to find a solution to this because
> we
> > > really
> > > > > do need integration tests for Rust to make sure we're compatible
> with
> > > other
> > > > > implementations... there is also the ongoing CI dockerization work
> > > that I
> > > > > feel is related.
> > > > >
> > > > > I haven't looked at the current integration tests yet and would
> > > appreciate
> > > > > some pointers on how all of this works (do we have docs?) or where
> to
> > > start
> > > > > looking.
> > > > >
> > > > I have a test in my latest PR:
> https://github.com/apache/arrow/pull/5523
> > > > And here is the generated data:
> > > > https://github.com/apache/arrow-testing/pull/11
> > > > As with program to generate these data, it's just a simple java
> program.
> > > > I'm not sure whether we need to integrate it into arrow.
> > > >
> > > > >
> > > > > I imagine the integration test could follow the approach that
> Renjie is
> > > > > outlining where we call Java to generate some files and then call
> Rust
> > > to
> > > > > parse them?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Andy.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Oct 8, 2019 at 9:48 PM Renjie Liu  >
> > > wrote:
> > > > >
> > > > > > Hi:
> > > > > >
> > > > >

Re: [DISCUSS] Proposal about integration test of arrow parquet reader

2019-10-09 Thread Renjie Liu
1. A low-level Parquet writer already exists and can produce Parquet files,
so unit tests should be fine. But a writer from Arrow to Parquet doesn't
exist yet, and it may take some time to finish.
2. In fact, my data is randomly generated and the generation is reproducible.
However, I don't think it would be a good idea to randomly generate data
every time we run CI, because failures would be difficult to debug. For
example, if PR A introduced a bug that is only triggered in another PR's
build, it would be confusing for contributors.
3. I think it would be a good idea to spend effort on integration tests with
Parquet, because it's an important use case of Arrow. A similar approach
could also be extended to other languages and other file formats (Avro, ORC).


On Wed, Oct 9, 2019 at 11:08 PM Wes McKinney  wrote:

> There are a number of issues worth discussion.
>
> 1. What is the timeline/plan for Rust implementing a Parquet _writer_?
> It's OK to be reliant on other libraries in the short term to produce
> files to test against, but does not strike me as a sustainable
> long-term plan. Fixing bugs can be a lot more difficult than it needs
> to be if you can't write targeted "endogenous" unit tests
>
> 2. Reproducible data generation
>
> I think if you're going to test against a pre-generated corpus, you
> should make sure that generating the corpus is reproducible for other
> developers (i.e. with a Dockerfile), and can be extended by adding new
> files or random data generation.
>
> I additionally would prefer generating the test corpus at test time
> rather than checking in binary files. If this isn't viable right now
> we can create an "arrow-rust-crutch" git repository for you to stash
> binary files until some of these testing scalability issues are
> addressed.
>
> If we're going to spend energy on Parquet integration testing with
> Java, this would be a good opportunity to do the work in a way where
> the C++ Parquet library can also participate (since we ought to be
> doing integration tests with Java, and we can also read JSON files to
> Arrow).
>
> On Tue, Oct 8, 2019 at 11:54 PM Renjie Liu 
> wrote:
> >
> > On Wed, Oct 9, 2019 at 12:11 PM Andy Grove 
> wrote:
> >
> > > I'm very interested in helping to find a solution to this because we
> really
> > > do need integration tests for Rust to make sure we're compatible with
> other
> > > implementations... there is also the ongoing CI dockerization work
> that I
> > > feel is related.
> > >
> > > I haven't looked at the current integration tests yet and would
> appreciate
> > > some pointers on how all of this works (do we have docs?) or where to
> start
> > > looking.
> > >
> > I have a test in my latest PR: https://github.com/apache/arrow/pull/5523
> > And here is the generated data:
> > https://github.com/apache/arrow-testing/pull/11
> > As with program to generate these data, it's just a simple java program.
> > I'm not sure whether we need to integrate it into arrow.
> >
> > >
> > > I imagine the integration test could follow the approach that Renjie is
> > > outlining where we call Java to generate some files and then call Rust
> to
> > > parse them?
> > >
> > > Thanks,
> > >
> > > Andy.
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Tue, Oct 8, 2019 at 9:48 PM Renjie Liu 
> wrote:
> > >
> > > > Hi:
> > > >
> > > > I'm developing rust version of reader which reads parquet into arrow
> > > array.
> > > > To verify the correct of this reader, I use the following approach:
> > > >
> > > >
> > > >1. Define schema with protobuf.
> > > >2. Generate json data of this schema using other language with
> more
> > > >sophisticated implementation (e.g. java)
> > > >3. Generate parquet data of this schema using other language with
> more
> > > >sophisticated implementation (e.g. java)
> > > >4. Write tests to read json file, and parquet file into memory
> (arrow
> > > >array), then compare json data with arrow data.
> > > >
> > > >  I think with this method we can guarantee the correctness of arrow
> > > reader
> > > > because json format is ubiquitous and their implementation are more
> > > stable.
> > > >
> > > > Any comment is appreciated.
> > > >
> > >
> >
> >
> > --
> > Renjie Liu
> > Software Engineer, MVAD
>




Re: [DISCUSS] Proposal about integration test of arrow parquet reader

2019-10-08 Thread Renjie Liu
On Wed, Oct 9, 2019 at 12:11 PM Andy Grove  wrote:

> I'm very interested in helping to find a solution to this because we really
> do need integration tests for Rust to make sure we're compatible with other
> implementations... there is also the ongoing CI dockerization work that I
> feel is related.
>
> I haven't looked at the current integration tests yet and would appreciate
> some pointers on how all of this works (do we have docs?) or where to start
> looking.
>
I have a test in my latest PR: https://github.com/apache/arrow/pull/5523
And here is the generated data:
https://github.com/apache/arrow-testing/pull/11
As for the program that generates this data, it's just a simple Java program.
I'm not sure whether we need to integrate it into Arrow.

>
> I imagine the integration test could follow the approach that Renjie is
> outlining where we call Java to generate some files and then call Rust to
> parse them?
>
> Thanks,
>
> Andy.
>
>
>
>
>
>
>
> On Tue, Oct 8, 2019 at 9:48 PM Renjie Liu  wrote:
>
> > Hi:
> >
> > I'm developing rust version of reader which reads parquet into arrow
> array.
> > To verify the correct of this reader, I use the following approach:
> >
> >
> >1. Define schema with protobuf.
> >2. Generate json data of this schema using other language with more
> >sophisticated implementation (e.g. java)
> >3. Generate parquet data of this schema using other language with more
> >sophisticated implementation (e.g. java)
> >4. Write tests to read json file, and parquet file into memory (arrow
> >array), then compare json data with arrow data.
> >
> >  I think with this method we can guarantee the correctness of arrow
> reader
> > because json format is ubiquitous and their implementation are more
> stable.
> >
> > Any comment is appreciated.
> >
>




[DISCUSS] Proposal about integration test of arrow parquet reader

2019-10-08 Thread Renjie Liu
Hi:

I'm developing a Rust reader that reads Parquet into Arrow arrays.
To verify the correctness of this reader, I use the following approach:


   1. Define the schema with protobuf.
   2. Generate JSON data for this schema using another language with a more
   mature implementation (e.g. Java).
   3. Generate Parquet data for this schema using another language with a more
   mature implementation (e.g. Java).
   4. Write tests that read the JSON file and the Parquet file into memory
   (Arrow arrays), then compare the JSON data with the Arrow data.

I think this method lets us verify the correctness of the Arrow reader,
because the JSON format is ubiquitous and its implementations are more stable.
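
The comparison in step 4 could be sketched as follows. This is only an illustration under stated assumptions: both the JSON reference data and the reader's output are assumed to have been decoded into plain `Vec<Option<T>>` columns already, and `compare_columns` is a hypothetical helper, not part of the Arrow or Parquet crates.

```rust
use std::fmt::Debug;

/// Compare a reference column (e.g. decoded from JSON) with the column
/// produced by the reader under test, reporting the first difference.
pub fn compare_columns<T: PartialEq + Debug>(
    name: &str,
    expected: &[Option<T>],
    actual: &[Option<T>],
) -> Result<(), String> {
    if expected.len() != actual.len() {
        return Err(format!(
            "column {}: expected {} rows, got {}",
            name,
            expected.len(),
            actual.len()
        ));
    }
    for (row, (e, a)) in expected.iter().zip(actual).enumerate() {
        if e != a {
            return Err(format!(
                "column {}, row {}: expected {:?}, got {:?}",
                name, row, e, a
            ));
        }
    }
    Ok(())
}
```

Reporting the column name and row index of the first mismatch keeps a failing test easy to trace back to the generated input.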

Any comments are appreciated.


Re: Parquet to Arrow in Java

2019-08-09 Thread Renjie Liu
Hi:

I'm working on the Rust part and expect to finish it soon. I'm also
interested in the Java version because we are trying to embed Arrow in
Spark to implement vectorized processing. Maybe we can work together.

On Mon, Aug 5, 2019 at 1:50 PM Micah Kornfield wrote:

> Hi Anoop,
> I think a contribution would be welcome.  There was a recent discussion
> thread on what would be expected from new "readers" for Arrow data in Java
> [1].  I think it's worth reading through, but my recollections of the
> highlights are:
> 1.  A short design sketch in the JIRA that will track the work.
> 2.  Off-heap data structures as much as possible.
> 3.  An interface that allows predicate push-down, column projection, and
> specifying the batch sizes of reads.  I think there is probably some
> interplay here between RowGroup size and size of batches.  It might be worth
> thinking about this up front and mentioning it in the design.
> 4.  Performant (since we are going from columnar to columnar, it should be
> faster than Parquet-MR and on par with or better than Spark's implementation,
> which I believe also goes from columnar to columnar).
>
> Answers to specific questions below.
>
> Thanks,
> Micah
>
> To help me get started, are there any pointers on how the C++ or Rust
> > implementations currently read Parquet into Arrow?
>
> I'm not sure about the Rust code, but the C++ code is located at [2]. It
> has been undergoing some recent refactoring (and I think Wes might have 1
> or 2 changes still to make).  It doesn't yet support nested data types fully
> (e.g. structs).
>
> Are they reading Parquet row-by-row and building Arrow batches or are there
> > better ways of implementing this?
>
> I believe the implementations should be reading a row-group at a time
> column by column.  Spark potentially has an implementation that already
> does this.
>
>
> [1]
>
> https://lists.apache.org/thread.html/b096528600e66c17af9498c151352f12944ead2fd218a0257fdd4f70@%3Cdev.arrow.apache.org%3E
> [2] https://github.com/apache/arrow/tree/master/cpp/src/parquet/arrow
>
> On Sun, Aug 4, 2019 at 2:52 PM Anoop Johnson 
> wrote:
>
> > Thanks for the response Micah. I could implement this and contribute to
> > Arrow Java. To help me get started, are there any pointers on how the C++
> > or Rust implementations currently read Parquet into Arrow? Are they
> reading
> > Parquet row-by-row and building Arrow batches or are there better ways of
> > implementing this?
> >
> > On Tue, Jul 30, 2019 at 1:56 PM Micah Kornfield 
> > wrote:
> >
> >> Hi Anoop,
> >> There isn't currently anything in the Arrow Java library that does this.
> >> It is something that I think we want to add at some point.   Dremio [1]
> >> has
> >> some Parquet related code, but I haven't looked at it to understand how
> >> easy it is to use as a standalone library and whether it supports
> >> predicate
> >> push-down/column selection.
> >>
> >> Thanks,
> >> Micah
> >>
> >> [1]
> >>
> >>
> https://github.com/dremio/dremio-oss/tree/master/sabot/kernel/src/main/java/com/dremio/exec/store/parquet
> >>
> >> On Sun, Jul 28, 2019 at 2:08 PM Anoop Johnson <
> anoop.k.john...@gmail.com>
> >> wrote:
> >>
> >> > Arrow Newbie here.  What is the recommended way to convert Parquet
> data
> >> > into Arrow, preferably doing predicate/column pushdown?
> >> >
> >> > One can implement this as custom code using the Parquet API, and
> >> re-encode
> >> > it in Arrow using the Arrow APIs, but is this supported by Arrow out
> of
> >> the
> >> > box?
> >> >
> >> > Thanks,
> >> > Anoop
> >> >
> >>
> >
>
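Micah's point 3 in the quoted message mentions an interplay between RowGroup
size and batch size. The sketch below is one possible way to picture it, not
code from any Arrow implementation: every name in it (BatchPlan, plan_batches)
is hypothetical, and it assumes the simplest policy, where a batch never
crosses a row-group boundary.

```rust
// Hypothetical names throughout; this is not an existing Arrow or Parquet API.

/// One planned read: a contiguous run of rows inside a single row group.
#[derive(Debug, PartialEq)]
struct BatchPlan {
    row_group: usize,
    start_row: usize,
    num_rows: usize,
}

/// Splits every row group into batches of at most `batch_size` rows.
/// Batches never cross a row-group boundary, so the last batch of each
/// group may be shorter than `batch_size`.
fn plan_batches(row_group_sizes: &[usize], batch_size: usize) -> Vec<BatchPlan> {
    assert!(batch_size > 0, "batch_size must be positive");
    let mut plans = Vec::new();
    for (row_group, &rows) in row_group_sizes.iter().enumerate() {
        let mut start_row = 0;
        while start_row < rows {
            let num_rows = batch_size.min(rows - start_row);
            plans.push(BatchPlan { row_group, start_row, num_rows });
            start_row += num_rows;
        }
    }
    plans
}

fn main() {
    // Two row groups of 5 and 2 rows, read in batches of at most 3 rows.
    for plan in plan_batches(&[5, 2], 3) {
        println!("{:?}", plan);
    }
}
```

Under this policy a reader only ever holds one row group's column chunks at a
time, at the cost of some short batches. A design that fills every batch
completely would have to stitch rows from adjacent row groups together.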


[jira] [Created] (ARROW-6069) [Rust] [Parquet] Implement Converter to convert record reader to arrow primitive array.

2019-07-30 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-6069:
-

 Summary: [Rust] [Parquet] Implement Converter to convert record 
reader to arrow primitive array.
 Key: ARROW-6069
 URL: https://issues.apache.org/jira/browse/ARROW-6069
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Renjie Liu
Assignee: Renjie Liu






--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (ARROW-5901) [Rust] Implement PartialEq to compare array and json values

2019-07-10 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-5901:
-

 Summary: [Rust] Implement PartialEq to compare array and json 
values
 Key: ARROW-5901
 URL: https://issues.apache.org/jira/browse/ARROW-5901
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Renjie Liu
Assignee: Renjie Liu


Useful in tests
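
A minimal sketch of what such a comparison could look like. Both types here
are hypothetical stand-ins (JsonValue is not serde_json's type and Int64Column
is not an Arrow array); the point is only that PartialEq can be implemented
between two different types so that tests can compare them directly.

```rust
// Hypothetical stand-ins: neither type is the real arrow or serde_json API.
#[derive(Debug, Clone, PartialEq)]
enum JsonValue {
    Null,
    Number(i64),
}

/// A nullable integer column, standing in for an Arrow primitive array.
#[derive(Debug)]
struct Int64Column {
    values: Vec<Option<i64>>,
}

/// The column equals a JSON list when lengths match, nulls line up with
/// JSON nulls, and every non-null slot holds the same number.
impl PartialEq<Vec<JsonValue>> for Int64Column {
    fn eq(&self, other: &Vec<JsonValue>) -> bool {
        self.values.len() == other.len()
            && self.values.iter().zip(other).all(|(slot, json)| match (slot, json) {
                (None, JsonValue::Null) => true,
                (Some(v), JsonValue::Number(n)) => v == n,
                _ => false,
            })
    }
}

fn main() {
    let column = Int64Column { values: vec![Some(1), None, Some(3)] };
    let expected = vec![JsonValue::Number(1), JsonValue::Null, JsonValue::Number(3)];
    assert!(column == expected);
}
```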





[jira] [Created] (ARROW-5823) [Rust] Fix build break.

2019-07-02 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-5823:
-

 Summary: [Rust] Fix build break.
 Key: ARROW-5823
 URL: https://issues.apache.org/jira/browse/ARROW-5823
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Renjie Liu
Assignee: Renjie Liu


The Rust build breaks because of some changes in the array builder. However, this
error is not detected by the CI scripts because --all-targets is missing from
the cargo build command.





[jira] [Created] (ARROW-5792) [Rust] [Parquet] A visitor trait for parquet types.

2019-06-29 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-5792:
-

 Summary: [Rust] [Parquet] A visitor trait for parquet types.
 Key: ARROW-5792
 URL: https://issues.apache.org/jira/browse/ARROW-5792
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Renjie Liu
Assignee: Renjie Liu


Useful in dealing with parquet types.
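
To illustrate the idea, here is a rough sketch of a visitor over a schema
tree. The ParquetType enum is a deliberately simplified, hypothetical model
(the real parquet crate's Type carries much more metadata), and TypeVisitor
and LeafCollector are names invented for this example, not the trait proposed
in the ticket.

```rust
// Hypothetical, simplified model of a Parquet schema node.
#[derive(Debug, Clone)]
enum ParquetType {
    Primitive { name: String },
    Group { name: String, fields: Vec<ParquetType> },
}

/// A visitor returns a value of type `R` for each kind of node.
trait TypeVisitor<R> {
    fn visit_primitive(&mut self, name: &str) -> R;
    fn visit_group(&mut self, name: &str, fields: &[ParquetType]) -> R;

    /// Routes a node to the matching `visit_*` method.
    fn dispatch(&mut self, ty: &ParquetType) -> R {
        match ty {
            ParquetType::Primitive { name } => self.visit_primitive(name),
            ParquetType::Group { name, fields } => self.visit_group(name, fields),
        }
    }
}

/// Example visitor: collects the dotted path of every leaf column.
struct LeafCollector {
    prefix: Vec<String>,
}

impl TypeVisitor<Vec<String>> for LeafCollector {
    fn visit_primitive(&mut self, name: &str) -> Vec<String> {
        let mut path = self.prefix.clone();
        path.push(name.to_string());
        vec![path.join(".")]
    }

    fn visit_group(&mut self, name: &str, fields: &[ParquetType]) -> Vec<String> {
        self.prefix.push(name.to_string());
        let mut leaves = Vec::new();
        for field in fields {
            leaves.extend(self.dispatch(field));
        }
        self.prefix.pop();
        leaves
    }
}

fn leaf_paths(root: &ParquetType) -> Vec<String> {
    let mut collector = LeafCollector { prefix: Vec::new() };
    collector.dispatch(root)
}

fn main() {
    let schema = ParquetType::Group {
        name: "root".to_string(),
        fields: vec![ParquetType::Primitive { name: "id".to_string() }],
    };
    println!("{:?}", leaf_paths(&schema));
}
```

The benefit of this shape is that each new traversal (schema conversion,
validation, printing) becomes a new visitor rather than another match
statement scattered through the code.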





[jira] [Created] (ARROW-5755) [Rust] [Parquet] Add derived clone for Type

2019-06-27 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-5755:
-

 Summary: [Rust] [Parquet] Add derived clone for Type
 Key: ARROW-5755
 URL: https://issues.apache.org/jira/browse/ARROW-5755
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Renjie Liu
Assignee: Renjie Liu


Add clone for Type





[jira] [Created] (ARROW-5463) [Rust] Implement AsRef for Buffer

2019-05-31 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-5463:
-

 Summary: [Rust] Implement AsRef for Buffer
 Key: ARROW-5463
 URL: https://issues.apache.org/jira/browse/ARROW-5463
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Reporter: Renjie Liu
Assignee: Renjie Liu


Implement AsRef ArrowNativeType for Buffer





[jira] [Created] (ARROW-5316) [Rust] Interfaces for gandiva bindings.

2019-05-14 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-5316:
-

 Summary: [Rust] Interfaces for gandiva bindings.
 Key: ARROW-5316
 URL: https://issues.apache.org/jira/browse/ARROW-5316
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Renjie Liu
Assignee: Renjie Liu


Create interfaces to demonstrate high level design and ideas.





[jira] [Created] (ARROW-5315) [Rust] Gandiva binding.

2019-05-14 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-5315:
-

 Summary: [Rust] Gandiva binding.
 Key: ARROW-5315
 URL: https://issues.apache.org/jira/browse/ARROW-5315
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Renjie Liu


Add gandiva binding for rust.





Re: Rust bindings for Gandiva

2019-05-11 Thread Renjie Liu
I agree that this should be a separate project, so that this can be used by
other databases written in rust, not only datafusion. Let's start with an
implementation by binding with gandiva, and build pure rust implementation
later.

On Sat, May 11, 2019 at 10:28 PM Andy Grove  wrote:

> Hi Renjie,
>
> I have not started on this but I would be interested in helping you with
> it.
>
> At a high level I think there are two main parts to this work:
>
> 1. Translating DataFusion expressions to Gandiva protobuf
> 2. Implementing the code to make the native C call to Gandiva
>
> I could help with #1 pretty easily.
>
> I am concerned about the packaging implications of this. I feel quite
> strongly that there should be a "pure Rust" version of DataFusion/Arrow and
> that the Gandiva integration should be opt-in somehow, so maybe this is a
> separate project within the repository, or a feature that can be controlled
> by a feature flag in the Cargo.toml somehow.
>
> Thanks,
>
> Andy.
>
>
>
>
> On Sat, May 11, 2019 at 3:10 AM Renjie Liu 
> wrote:
>
>>
>> Hi:
>> @Andy Grove  Are you developing this? I'm
>> interested in this and want to join development.
>>
>> On Tue, Jan 8, 2019 at 3:18 PM Praveen Kumar  wrote:
>>
>>> Agree with Wes, the protobuf based interface should be the language
>>> neutral
>>> way to build expressions with Gandiva.
>>>
>>> On Mon, Jan 7, 2019 at 8:30 PM Andy Grove  wrote:
>>>
> >>> > This makes sense to me now that I understand a little more about
>>> Gandiva.
>>> > This also fits well with my proposal to donate DataFusion in the other
>>> > thread. DataFusion can manage the overall logical query plan in Rust
>>> and
>>> > potentially delegate some subset of expression evaluation to Gandiva
>>> via
>>> > protobuf.
>>> >
>>> > Thanks,
>>> >
>>> > Andy.
>>> >
>>> > On Mon, Jan 7, 2019 at 7:51 AM Wes McKinney 
>>> wrote:
>>> >
>>> > > Gandiva supports a Protobuf-based interface -- this is how Java
>>> > > interacts with it via JNI. Rust could do the same -- that would
>>> > > probably be easier than wrapping the C++ class structure. It would
>>> > > also help drive new feature requirements in the serialized
>>> > > projection/filter expression trees
>>> > >
>>> > > - Wes
>>> > >
>>> > > On Mon, Jan 7, 2019 at 3:22 AM Krisztián Szűcs
>>> > >  wrote:
>>> > > >
> >>> > > > I'm not sure that a binding is a good idea. Both Arrow and Parquet
> >>> > > > already have their own Rust implementations, and interfacing with
> >>> > > > cpp isn't as easy and straightforward as it is with C. Otherwise
> >>> > > > we could simply maintain bindings for all of the cpp libraries,
> >>> > > > rather than having a hybrid solution.
>>> > > >
>>> > > > While We could spare the reimplementation of gandiva, it'd make
>>> > > > packaging more complicated and rust development way less
>>> > > > welcoming to new contributors.
>>> > > >
>>> > > > On Fri, Jan 4, 2019 at 3:39 PM Andy Grove 
>>> > wrote:
>>> > > >
>>> > > > > Now that the Rust implementation of Arrow is maturing, I'm
>>> interested
>>> > > in
>>> > > > > having bindings for Gandiva for query execution, rather than
>>> > > duplicating
>>> > > > > this in Rust.
>>> > > > >
>>> > > > > I will likely start looking at this soon but wanted to see if
>>> anyone
>>> > > else
>>> > > > > here is particularly interested in this area of functionality?
>>> > > > >
>>> > > > > Thanks,
>>> > > > >
>>> > > > > Andy.
>>> > > > >
>>> > >
>>> >
>>>
>>
>>
>> --
>> Renjie Liu
>> Software Engineer, MVAD
>>
>

-- 
Renjie Liu
Software Engineer, MVAD
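
Andy's suggestion in the quoted message, that the Gandiva integration be
opt-in through a Cargo feature, could look roughly like the sketch below. The
feature name "gandiva" and the function add_columns are assumptions made for
illustration only, and the Gandiva branch is a placeholder rather than a real
binding.

```rust
// Sketch only. It assumes Cargo.toml declares a hypothetical opt-in feature:
//
//   [features]
//   gandiva = []
//
// Without that feature enabled, only the pure Rust function is compiled.

#[cfg(feature = "gandiva")]
fn add_columns(_a: &[i64], _b: &[i64]) -> Vec<i64> {
    // A real build would hand the expression to Gandiva through FFI here.
    unimplemented!("Gandiva binding not shown in this sketch")
}

#[cfg(not(feature = "gandiva"))]
fn add_columns(a: &[i64], b: &[i64]) -> Vec<i64> {
    // Pure Rust fallback: element-wise addition of two columns.
    a.iter().zip(b).map(|(x, y)| x + y).collect()
}

fn main() {
    println!("{:?}", add_columns(&[1, 2], &[10, 20]));
}
```

Callers see a single add_columns function either way, so the default build
stays pure Rust and the native dependency is only pulled in on request.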


Re: [ANNOUNCE] New Arrow committer: Neville Dipale

2019-05-11 Thread Renjie Liu
Congrats!

Chao Sun wrote on Sun, May 12, 2019 at 12:38 AM:

> Congrats Neville!
>
> On Sat, May 11, 2019 at 9:36 AM Micah Kornfield 
> wrote:
>
> > Congrats!!
> >
> > On Saturday, May 11, 2019, paddy horan  wrote:
> >
> > > Congrats Neville!  Thank you for your contributions!
> > >
> > > 
> > > From: Andy Grove 
> > > Sent: Saturday, May 11, 2019 11:23 AM
> > > To: dev@arrow.apache.org
> > > Subject: [ANNOUNCE] New Arrow committer: Neville Dipale
> > >
> > > On behalf of the Arrow PMC, I'm happy to announce that Neville has
> > >
> > > accepted an invitation to become a committer on Apache Arrow.
> > >
> > > Welcome, and thank you for your contributions!
> > >
> >
>


Re: Rust bindings for Gandiva

2019-05-11 Thread Renjie Liu
Hi:
@Andy Grove  Are you developing this? I'm interested
in this and want to join development.

On Tue, Jan 8, 2019 at 3:18 PM Praveen Kumar  wrote:

> Agree with Wes, the protobuf based interface should be the language neutral
> way to build expressions with Gandiva.
>
> On Mon, Jan 7, 2019 at 8:30 PM Andy Grove  wrote:
>
> > This makes sense to me now that I understand a little more about
> Gandiva.
> > This also fits well with my proposal to donate DataFusion in the other
> > thread. DataFusion can manage the overall logical query plan in Rust and
> > potentially delegate some subset of expression evaluation to Gandiva via
> > protobuf.
> >
> > Thanks,
> >
> > Andy.
> >
> > On Mon, Jan 7, 2019 at 7:51 AM Wes McKinney  wrote:
> >
> > > Gandiva supports a Protobuf-based interface -- this is how Java
> > > interacts with it via JNI. Rust could do the same -- that would
> > > probably be easier than wrapping the C++ class structure. It would
> > > also help drive new feature requirements in the serialized
> > > projection/filter expression trees
> > >
> > > - Wes
> > >
> > > On Mon, Jan 7, 2019 at 3:22 AM Krisztián Szűcs
> > >  wrote:
> > > >
> > > > I'm not sure that a binding is a good idea. Both Arrow and Parquet
> > > > already have their own Rust implementations, and interfacing with
> > > > cpp isn't as easy and straightforward as it is with C. Otherwise
> > > > we could simply maintain bindings for all of the cpp libraries,
> > > > rather than having a hybrid solution.
> > > >
> > > > While We could spare the reimplementation of gandiva, it'd make
> > > > packaging more complicated and rust development way less
> > > > welcoming to new contributors.
> > > >
> > > > On Fri, Jan 4, 2019 at 3:39 PM Andy Grove 
> > wrote:
> > > >
> > > > > Now that the Rust implementation of Arrow is maturing, I'm
> interested
> > > in
> > > > > having bindings for Gandiva for query execution, rather than
> > > duplicating
> > > > > this in Rust.
> > > > >
> > > > > I will likely start looking at this soon but wanted to see if
> anyone
> > > else
> > > > > here is particularly interested in this area of functionality?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Andy.
> > > > >
> > >
> >
>


-- 
Renjie Liu
Software Engineer, MVAD


[jira] [Created] (ARROW-5298) [Rust] Add debug implementation for Buffer

2019-05-09 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-5298:
-

 Summary: [Rust] Add debug implementation for Buffer
 Key: ARROW-5298
 URL: https://issues.apache.org/jira/browse/ARROW-5298
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Renjie Liu
Assignee: Renjie Liu


Default debug implementation is not good enough for debugging.
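
As a rough illustration of a hand-written Debug implementation, the sketch
below prints a length and a hex dump. The Buffer struct is a hypothetical
stand-in backed by a Vec<u8>; the real Arrow Buffer wraps shared, aligned
memory, and the output format chosen here is only one possibility.

```rust
use std::fmt;

// Hypothetical stand-in for arrow's `Buffer`.
struct Buffer {
    data: Vec<u8>,
}

impl fmt::Debug for Buffer {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Print the length first, then each byte as two hex digits.
        write!(f, "Buffer {{ len: {}, data: [", self.data.len())?;
        for (i, byte) in self.data.iter().enumerate() {
            if i > 0 {
                write!(f, " ")?;
            }
            write!(f, "{:02x}", byte)?;
        }
        write!(f, "] }}")
    }
}

fn main() {
    let buffer = Buffer { data: vec![0, 1, 255] };
    // Prints: Buffer { len: 3, data: [00 01 ff] }
    println!("{:?}", buffer);
}
```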





[jira] [Created] (ARROW-5281) [Rust] [Parquet] Move DataPageBuilder to test_common

2019-05-07 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-5281:
-

 Summary: [Rust] [Parquet] Move DataPageBuilder to test_common
 Key: ARROW-5281
 URL: https://issues.apache.org/jira/browse/ARROW-5281
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Renjie Liu
Assignee: Renjie Liu


DataPageBuilder is a helpful tool for mocking test page data; it is worth 
moving it to test_common so that other parts can reuse it.





[jira] [Created] (ARROW-5162) [Rust] [Parquet] Rename mod reader to arrow.

2019-04-11 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-5162:
-

 Summary: [Rust] [Parquet] Rename mod reader to arrow.
 Key: ARROW-5162
 URL: https://issues.apache.org/jira/browse/ARROW-5162
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Renjie Liu
Assignee: Renjie Liu


Rename mod to arrow.





[jira] [Created] (ARROW-5127) [Rust] [Parquet] Add page iterator

2019-04-05 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-5127:
-

 Summary: [Rust] [Parquet] Add page iterator
 Key: ARROW-5127
 URL: https://issues.apache.org/jira/browse/ARROW-5127
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Renjie Liu
Assignee: Renjie Liu


Adds a page iterator for column reader.





[jira] [Created] (ARROW-5126) [Rust] [Parquet] Convert parquet column desc to arrow data type

2019-04-05 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-5126:
-

 Summary: [Rust] [Parquet] Convert parquet column desc to arrow 
data type
 Key: ARROW-5126
 URL: https://issues.apache.org/jira/browse/ARROW-5126
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Reporter: Renjie Liu
Assignee: Renjie Liu








Re: [ANNOUNCE] New Arrow committer: Paddy Horan

2019-02-28 Thread Renjie Liu
Congrats!

Micah Kornfield wrote on Fri, Mar 1, 2019 at 7:26 AM:

> Congrats!
>
> On Thu, Feb 28, 2019 at 3:14 PM Bryan Cutler  wrote:
>
> > Congratulations Paddy!
> >
> > On Thu, Feb 28, 2019 at 7:14 AM Wes McKinney 
> wrote:
> >
> > > Welcome Paddy and thank you!
> > >
> > >
> > > On Thu, Feb 28, 2019 at 4:29 AM Uwe L. Korn  wrote:
> > > >
> > > > On behalf of the Arrow PMC, I'm happy to announce that Paddy has
> > > > accepted an invitation to become a committer on Apache Arrow.
> > > >
> > > > Welcome, and thank you for your contributions!
> > >
> >
>


Re: [ANNOUNCE] New Arrow committer: Chao Sun

2019-02-28 Thread Renjie Liu
Congrats!

Micah Kornfield wrote on Fri, Mar 1, 2019 at 7:26 AM:

> Congrats!
>
> On Thu, Feb 28, 2019 at 3:02 PM Bryan Cutler  wrote:
>
> > Congratulations Chao!
> >
> > On Thu, Feb 28, 2019 at 9:27 AM Neville Dipale 
> > wrote:
> >
> > > Congratulations Chao and Paddy! I'm loving the increase in velocity on
> > the
> > > Rust side
> > >
> > > On Thu, 28 Feb 2019, 17:17 Wes McKinney,  wrote:
> > >
> > > > thank you Chao, and welcome!
> > > >
> > > > On Thu, Feb 28, 2019 at 6:18 AM paddy horan 
> > > > wrote:
> > > > >
> > > > > Congrats Chao!
> > > > >
> > > > > 
> > > > > From: Uwe L. Korn 
> > > > > Sent: Thursday, February 28, 2019 5:29 AM
> > > > > To: dev@arrow.apache.org
> > > > > Subject: [ANNOUNCE] New Arrow committer: Chao Sun
> > > > >
> > > > > > On behalf of the Arrow PMC, I'm happy to announce that Chao has
> > > > > accepted an invitation to become a committer on Apache Arrow.
> > > > >
> > > > > Welcome, and thank you for your contributions!
> > > >
> > >
> >
>


Re: [jira] [Created] (ARROW-4678) [Rust] Minimize unstable feature usage

2019-02-26 Thread Renjie Liu
+1 for this proposal.
By the way, maybe it's a better idea to split these changes into small
patches rather than one big one, so that we can review them one by one.

On Tue, Feb 26, 2019 at 8:58 AM Steven Fackler (JIRA) 
wrote:

> Steven Fackler created ARROW-4678:
> -
>
>  Summary: [Rust] Minimize unstable feature usage
>  Key: ARROW-4678
>  URL: https://issues.apache.org/jira/browse/ARROW-4678
>  Project: Apache Arrow
>   Issue Type: Improvement
>   Components: Rust
> Affects Versions: 0.12.0
> Reporter: Steven Fackler
>
>
> The Rust implementation currently uses quite a few nightly features. This
> is unfortunately a hard blocker on using these crates for many users.
>
> Here's the list of currently used nightly features:
>  * type_ascription: Unused, can be trivially removed.
>  * rustc_private: Unused, can be trivially removed.
>  * box_syntax: Indefinitely far from stabilization, trivially replaceable
> with Box::new.
>  * box_patterns: Indefinitely far from stabilization, replaceable with
> some minor restructuring of a couple of matches.
>  * serde's alloc feature: Unused, can be trivially removed.
>  * try_from: Scheduled for stabilization in Rust 1.35.
>  * specialization: Actively being worked on - maybe ~1 year timeframe?
>  * packed_simd: Actively being worked on - maybe ~1 year timeframe?
>
> The first set of features are easy enough to get rid of - I'll make a PR
> to do that (https://github.com/sfackler/arrow/tree/more-stable). I'm a
> bit less sure of what to do with specialization and packed_simd, though.
>
>
>
>


-- 
Renjie Liu
Software Engineer, MVAD


[jira] [Created] (ARROW-4634) [Rust] [Parquet] Reorganize test_common mod to allow more test util codes.

2019-02-19 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-4634:
-

 Summary: [Rust] [Parquet] Reorganize test_common mod to allow more 
test util codes.
 Key: ARROW-4634
 URL: https://issues.apache.org/jira/browse/ARROW-4634
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Renjie Liu
Assignee: Renjie Liu


Currently the test_common mod is just one file, and as we add more test 
utils to it, things may get messy, so I propose making test_common a 
directory with multiple submodules.





Re: [Rust] Rust 0.13.0 release

2019-02-14 Thread Renjie Liu
Then I'm expecting to finish it in 0.14

Wes McKinney wrote on Wed, Feb 13, 2019 at 11:08 PM:

> > BTW, what's the time line of 0.13.0?
>
> See
> https://lists.apache.org/thread.html/7890bd7aebd2d2018fa68a78630280581a544346ce80e4002cd9e548@%3Cdev.arrow.apache.org%3E
>
> Since 0.12 was ~January 20 I think it would be good to release again
> by the end of March
>
> On Wed, Feb 13, 2019 at 7:29 AM Renjie Liu 
> wrote:
> >
> > Hi, Andy:
> >  Thanks for starting this thread. I'm working on the arrow reader for
> > parquet and expect to make progress soon. BTW, what's the timeline
> > for 0.13.0?
> >
> > Chao Sun wrote on Wed, Feb 13, 2019 at 10:34 AM:
> >
> > > I’m also interested in the Parquet/Arrow integration and may help
> there.
> > > This is however a relatively large feature and I’m not sure if it can be
> done
> > > in 0.13.
> > >
> > > Another area I’d like to work in is high level Parquet writer support.
> This
> > > issue has been discussed several times in the past. People should not
> need
> > > to specify definition & repetition levels in order to write data in
> Parquet
> > > format.
> > >
> > > Chao
> > >
> > >
> > >
> > > On Wed, Feb 13, 2019 at 10:24 AM paddy horan 
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > The focus for me for 0.13.0 is SIMD.  I would like to port all the
> "ops"
> > > > in "array_ops" to the new "compute" module and leverage SIMD for them
> > > all.
> > > > I have most of this done in various forks.
> > > >
> > > > Past 0.13.0 I would really like to work toward getting Rust running
> in
> > > the
> > > > integration tests.  The thing I am most excited about regarding
> Arrow is
> > > > the concept of defining computational libraries in say Rust and being
> > > able
> > > > to use them from any implementation, pyarrow probably for me.  This
> all
> > > > starts and ends with the integration tests.
> > > >
> > > > Also, Gandiva is fascinating I would love to have robust support for
> this
> > > > in Rust (via bindings)...
> > > >
> > > > Regards,
> > > > P
> > > >
> > > >
> > > > 
> > > > From: Neville Dipale 
> > > > Sent: Tuesday, February 12, 2019 11:33 AM
> > > > To: dev@arrow.apache.org
> > > > Subject: Re: [Rust] Rust 0.13.0 release
> > > >
> > > > Thanks for bringing this up Andy.
> > > >
> > > > I'm unemployed/on recovery leave, so I've had some surplus time to
> work
> > > on
> > > > Rust.
> > > >
> > > > There's a lot of features that I've wanted to work on, some which
> I've
> > > > spent some time attempting, but struggled with. A few block
> additional
> > > work
> > > > that I could contribute.
> > > >
> > > > In 0.13.0 and the release thereafter: I'd like to see:
> > > >
> > > > Date/time support. I've spent a lot of time trying to implement this,
> > > but I
> > > > get the feeling that my Rust isn't good enough yet to pull this
> together.
> > > >
> > > > More IO support.
> > > > I'm working on JSON reader, and want to work on JSON and CSV
> (continuing
> > > > where you left off) writers after this.
> > > > With date/time support, I can also work on date/time parsing so we
> can
> > > have
> > > > these in CSV and JSON.
> > > > Parquet support isn't on my radar at the moment. JSON and CSV are
> more
> > > > commonly used, so I'm hoping that with concrete support for these,
> more
> > > > people using Rust can choose to integrate Arrow. That could bring us
> more
> > > > hands to help.
> > > >
> > > > Array slicing (https://issues.apache.org/jira/browse/ARROW-3954). I
> > > tried
> > > > working on it but failed. Related to this would be array chunking.
> > > > I need these in order to be able to operate on "Tables" like CPP,
> Python
> > > > and others. I've got ChunkedArray, Column and Table roughly
> implemented
> > > in
> > > > my fork, but without zero-copy slicing, I can't upstream them.
> > > >
> > > > I've made good progress on scalar and array operations. I h

Re: [Rust] Rust 0.13.0 release

2019-02-13 Thread Renjie Liu
Hi, Andy:
 Thanks for starting this thread. I'm working on the Arrow reader for
Parquet and expect to make progress soon. BTW, what's the timeline
for 0.13.0?

Chao Sun wrote on Wed, Feb 13, 2019 at 10:34 AM:

> I’m also interested in the Parquet/Arrow integration and may help there.
> This is however a relatively large feature and I’m not sure if it can be done
> in 0.13.
>
> Another area I’d like to work in is high level Parquet writer support. This
> issue has been discussed several times in the past. People should not need
> to specify definition & repetition levels in order to write data in Parquet
> format.
>
> Chao
>
>
>
> On Wed, Feb 13, 2019 at 10:24 AM paddy horan 
> wrote:
>
> > Hi All,
> >
> > The focus for me for 0.13.0 is SIMD.  I would like to port all the "ops"
> > in "array_ops" to the new "compute" module and leverage SIMD for them
> all.
> > I have most of this done in various forks.
> >
> > Past 0.13.0 I would really like to work toward getting Rust running in
> the
> > integration tests.  The thing I am most excited about regarding Arrow is
> > the concept of defining computational libraries in say Rust and being
> able
> > to use them from any implementation, pyarrow probably for me.  This all
> > starts and ends with the integration tests.
> >
> > Also, Gandiva is fascinating I would love to have robust support for this
> > in Rust (via bindings)...
> >
> > Regards,
> > P
> >
> >
> > 
> > From: Neville Dipale 
> > Sent: Tuesday, February 12, 2019 11:33 AM
> > To: dev@arrow.apache.org
> > Subject: Re: [Rust] Rust 0.13.0 release
> >
> > Thanks for bringing this up Andy.
> >
> > I'm unemployed/on recovery leave, so I've had some surplus time to work
> on
> > Rust.
> >
> > There's a lot of features that I've wanted to work on, some which I've
> > spent some time attempting, but struggled with. A few block additional
> work
> > that I could contribute.
> >
> > In 0.13.0 and the release thereafter: I'd like to see:
> >
> > Date/time support. I've spent a lot of time trying to implement this,
> but I
> > get the feeling that my Rust isn't good enough yet to pull this together.
> >
> > More IO support.
> > I'm working on JSON reader, and want to work on JSON and CSV (continuing
> > where you left off) writers after this.
> > With date/time support, I can also work on date/time parsing so we can
> have
> > these in CSV and JSON.
> > Parquet support isn't on my radar at the moment. JSON and CSV are more
> > commonly used, so I'm hoping that with concrete support for these, more
> > people using Rust can choose to integrate Arrow. That could bring us more
> > hands to help.
> >
> > Array slicing (https://issues.apache.org/jira/browse/ARROW-3954). I
> tried
> > working on it but failed. Related to this would be array chunking.
> > I need these in order to be able to operate on "Tables" like CPP, Python
> > and others. I've got ChunkedArray, Column and Table roughly implemented
> in
> > my fork, but without zero-copy slicing, I can't upstream them.
> >
> > I've made good progress on scalar and array operations. I have trig
> > functions, some string operators and other functions that one can run on
> a
> > Spark-esque dataframe.
> > These will fit in well with DataFusion's SQL operations, but from a
> > decision-perspective, I think it would help if we join heads and think
> > about the direction we want to take on compute.
> >
> > SIMD is great, and when Paddy's hashed out how it works, more of us will
> be
> > able to contribute SIMD compatible compute operators.
> >
> > Thanks,
> > Neville
> >
> > On Tue, 12 Feb 2019 at 18:12, Andy Grove  wrote:
> >
> > > I was curious what our Rust committers and contributors are excited
> about
> > > for 0.13.0.
> > >
> > > The feature I would most like to see is that ability for DataFusion to
> > run
> > > SQL against Parquet files again, as that would give me an excuse for a
> > PoC
> > > in my day job using Arrow.
> > >
> > > I know there were some efforts underway to build arrow array readers
> for
> > > Parquet and it would make sense for me to help there.
> > >
> > > I would also like to start building out some benchmarks.
> > >
> > > I think the SIMD work is exciting too.
> > >
> > > I'd like to hear thoughts from everyone else though since we're all
> > coming
> > > at this from different perspectives.
> > >
> > > Thanks,
> > >
> > > Andy.
> > >
> >
>


[jira] [Created] (ARROW-4525) [Rust] [Parquet] Convert ArrowError to ParquetError

2019-02-10 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-4525:
-

 Summary: [Rust] [Parquet] Convert ArrowError to ParquetError
 Key: ARROW-4525
 URL: https://issues.apache.org/jira/browse/ARROW-4525
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Renjie Liu
Assignee: Renjie Liu


We need to enable conversion from ArrowError to ParquetError. This is useful 
when integrating arrow with parquet, e.g. when reading parquet data into arrow.
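
A minimal sketch of the conversion, assuming trimmed-down versions of both
error enums (the real ones have more variants, and the variant names used here
are illustrative). Implementing From lets the ? operator turn an ArrowError
into a ParquetError automatically inside functions that return the latter.

```rust
use std::fmt;

// Hypothetical, trimmed-down error types.
#[derive(Debug)]
enum ArrowError {
    ComputeError(String),
}

#[derive(Debug, PartialEq)]
enum ParquetError {
    ArrowError(String),
}

impl fmt::Display for ArrowError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ArrowError::ComputeError(msg) => write!(f, "Compute error: {}", msg),
        }
    }
}

/// The conversion itself: keep the Arrow error's message.
impl From<ArrowError> for ParquetError {
    fn from(e: ArrowError) -> Self {
        ParquetError::ArrowError(e.to_string())
    }
}

/// Stand-in for an Arrow-side operation that can fail.
fn build_array(len: i64) -> Result<usize, ArrowError> {
    if len < 0 {
        Err(ArrowError::ComputeError(format!("negative length: {}", len)))
    } else {
        Ok(len as usize)
    }
}

/// Stand-in for Parquet-side code; `?` applies the `From` impl above.
fn read_into_arrow(len: i64) -> Result<usize, ParquetError> {
    let rows = build_array(len)?;
    Ok(rows)
}

fn main() {
    println!("{:?}", read_into_arrow(4));
    println!("{:?}", read_into_arrow(-1));
}
```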





Re: [ANNOUNCE] New Arrow PMC member: Andy Grove

2019-02-05 Thread Renjie Liu
Congratulations!

Bryan Cutler wrote on Tue, Feb 5, 2019 at 12:08 PM:

> Congratulations Andy!
>
> On Mon, Feb 4, 2019, 3:29 PM Philipp Moritz 
> > Congratulations!
> >
> > On Mon, Feb 4, 2019 at 3:16 PM Krisztián Szűcs <
> szucs.kriszt...@gmail.com>
> > wrote:
> >
> > > Congrats Andy! :)
> > >
> > > On Mon, Feb 4, 2019 at 4:39 PM Wes McKinney 
> wrote:
> > >
> > > > The Project Management Committee (PMC) for Apache Arrow has invited
> > > > Andy Grove to become a PMC member and we are pleased to announce that
> > > > Andy has accepted.
> > > >
> > > > Congratulations and welcome!
> > > >
> > >
> >
>


Re: [Rust] code style: restrict line width to 90 characters?

2019-01-25 Thread Renjie Liu
+1 for this suggestion.

Chao Sun wrote on Sat, Jan 26, 2019 at 2:39 AM:

> Hi Neville, there's no limit today: you'll need to add
>
> max_width = 90
> comment_width = 90
>
> to rustfmt.toml to limit both source code and comment to 90 characters.
>
> Chao
>
> On Fri, Jan 25, 2019 at 10:34 AM Neville Dipale 
> wrote:
>
> > Hi Chao,
> >
> > What's the current limit? I just ran rustfmt, and seems like it's not
> > reformatting at 100 characters. I support changing whatever the current
> > width is to 90 characters.
> >
> > Regards
> > Neville
> >
> > On Fri, 25 Jan 2019 at 19:49, Chao Sun  wrote:
> >
> > > Hi Rust developers,
> > >
> > > Just want to know if anyone likes the idea to restrict the line width to
> > 90
> > > characters for Rust, similar to the C++ coding style. Personally I
> found
> > it
> > > helpful when you need to keep multiple windows in a monitor. This can
> > > easily be enforced via rustfmt. If there's no objection, I can open a
> > JIRA
> > > for this and apply the change to the existing codebase.
> > >
> > > Thanks,
> > > Chao
> > >
> >
>


[jira] [Created] (ARROW-4365) [Rust] [Parquet] Implement RecordReader

2019-01-24 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-4365:
-

 Summary: [Rust] [Parquet] Implement RecordReader
 Key: ARROW-4365
 URL: https://issues.apache.org/jira/browse/ARROW-4365
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Renjie Liu
Assignee: Renjie Liu


RecordReader reads logical records into memory; this is a prerequisite for 
ColumnReader.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [VOTE] Accept donation of Rust DataFusion library for Apache Arrow

2019-01-23 Thread Renjie Liu
+1 (non-binding)

I also tried to write a similar engine, and would be glad to merge it with DataFusion.

paddy horan wrote on Thu, Jan 24, 2019 at 5:29 AM:

> +1 (non-binding)
>
> Thanks Andy
>
>
> 
> From: Chao Sun 
> Sent: Wednesday, January 23, 2019 1:07 PM
> To: dev@arrow.apache.org
> Subject: Re: [VOTE] Accept donation of Rust DataFusion library for Apache
> Arrow
>
> +1 (non-binding)
>
> Glad to see this coming and I think it is a great complement to existing
> modules, e.g., Arrow and Parquet. It also aligns with the overall direction
> that the project is going.
>
> Chao
>
> On Wed, Jan 23, 2019 at 9:30 AM Andy Grove  wrote:
>
> > As far as I know, the majority of the PMC are not actively using Rust, so
> > as supporting evidence for interest in this donation from the Rust
> > community, here is a Reddit thread where I talked about offering
> DataFusion
> > for donation recently:
> >
> >
> >
> https://www.reddit.com/r/rust/comments/aibk39/datafusion_060_inmemory_query_engine_for_apache/
> >
> > There were 69 upvotes and many supportive comments, including a couple
> > where people specifically mentioned that they liked the fact that
> > DataFusion uses Arrow. I would hope that this donation leads to more
> people
> > contributing to Arrow.
> >
> > Thanks,
> >
> > Andy.
> >
> > On Wed, Jan 23, 2019 at 4:26 AM Neville Dipale 
> > wrote:
> >
> > > Hi Andy,
> > >
> > > +1 : Accept contribution of DataFusion Rust library
> > >
> > > Thanks
> > >
> > > On Wed, 23 Jan 2019 at 03:05, Wes McKinney 
> wrote:
> > >
> > > > Dear all,
> > > >
> > > > The developers of DataFusion, an analytical query engine written
> > > > in Rust, based on the Arrow columnar memory format, are proposing
> > > > to donate the code to Apache Arrow:
> > > >
> > > > https://github.com/andygrove/datafusion
> > > >
> > > > The community has had an opportunity to discuss this [1] and
> > > > there do not seem to be objections to this. Andy Grove has staged
> > > > the code donation in the form of a pull request:
> > > >
> > > > https://github.com/apache/arrow/pull/3399
> > > >
> > > > This vote is to determine if the Arrow PMC is in favor of accepting
> > > > this donation. If the vote passes, the PMC and the authors of the
> code
> > > > will work together to complete the ASF IP Clearance process
> > > > (http://incubator.apache.org/ip-clearance/) and import this Rust
> > > > codebase implementation into Apache Arrow.
> > > >
> > > > [ ] +1 : Accept contribution of DataFusion Rust library
> > > > [ ] 0 : No opinion
> > > > [ ] -1 : Reject contribution because...
> > > >
> > > > Here is my vote: +1
> > > >
> > > > The vote will be open for at least 72 hours.
> > > >
> > > > Thanks,
> > > > Wes
> > > >
> > > > [1]:
> > > >
> > >
> >
> https://lists.apache.org/thread.html/2f6c14e9f5a9ab41b0b591b2242741b23e5528fb28e79ac0e2c9349a@%3Cdev.arrow.apache.org%3E
> > > >
> > >
> >
>


[jira] [Created] (ARROW-4219) [Rust] [Parquet] Implement ArrowReader

2019-01-09 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-4219:
-

 Summary: [Rust] [Parquet] Implement ArrowReader
 Key: ARROW-4219
 URL: https://issues.apache.org/jira/browse/ARROW-4219
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Renjie Liu
Assignee: Renjie Liu


ArrowReader reads Parquet into Arrow. In this ticket, our goal is to implement
get_schema and to read row groups into a record batch iterator.





[jira] [Created] (ARROW-4218) [Rust][Parquet]Implement ColumnReader

2019-01-09 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-4218:
-

 Summary: [Rust][Parquet]Implement ColumnReader
 Key: ARROW-4218
 URL: https://issues.apache.org/jira/browse/ARROW-4218
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Renjie Liu
Assignee: Renjie Liu


ColumnReader reads the columns of a Parquet file into Arrow arrays. This is the
first step toward reading Parquet into Arrow.





Re: [Rust] move parquet into a separate sub-crate

2018-12-27 Thread Renjie Liu
Cool. It may also be worthwhile to put adapters into a separate crate.

On Fri, Dec 28, 2018 at 4:10 AM Chao Sun  wrote:

> Hi,
>
> It just occurs to me that it may be a better idea to move the parquet
> module into a separate sub-crate by using cargo workspaces
> <https://doc.rust-lang.org/book/ch14-03-cargo-workspaces.html>. The
> advantage is that we can make the project more modular (in future, we may
> want to add more sub-crates such as arrow/parquet_derive, orc, gandiva,
> etc), and allow us to run CI jobs separately on each crate.
>
> Some small caveats:
> 1. Cargo doesn't allow cyclic dependency. So if the parquet sub-crate
> depends on arrow, we can't reference parquet in arrow. This doesn't seem
> like an issue though since arrow itself should be physical on-disk format
> independent. I also didn't see any reference on parquet in cpp/src/arrow.
> 2. The path dependency used in workspace has to be changed to a version
> number when we do "cargo publish". This should be added to the release
> instructions and committer who performs the job should do the extra step.
>
> Thoughts?
>
> Chao
>


-- 
Renjie Liu
Software Engineer, MVAD


Re: [DISCUSS] Rust add adapter for parquet

2018-11-21 Thread Renjie Liu
That sounds great. But parquet-rs currently relies on nightly Rust, so that
would be the first problem to resolve.

On Wed, Nov 21, 2018 at 4:49 AM Andy Grove  wrote:

> This sounds like a great idea.
>
> With support for both CSV and Parquet in the Arrow crate, it would be nice
> to design a standard interface for Arrow data sources. Maybe this is as
> simple as implementing `Iterator`.
>
> Andy.
>
> On Tue, Nov 20, 2018 at 11:46 AM Chao Sun  wrote:
>
> > Yes, we'd be interested to move forward. I'm inclined to merge this into
> > Arrow because of the issues that you pointed out with parquet c++ merge,
> > and I do see a tight relationship between the two projects, and potential
> > sharing of common libraries. @Ivan Sadikov  what
> > do you think?
> >
> > Chao
> >
> > On Tue, Nov 20, 2018 at 10:23 AM Wes McKinney 
> wrote:
> >
> >> hi folks,
> >>
> >> Would you all be interested in moving forward the parquet-rs project?
> >> I have a little more bandwidth to help with the code donation in the
> >> next month or two.
> >>
> >> I know we voted on the Parquet mailing list about the donation
> >> already. One big question is whether you want to create an
> >> apache/parquet-rs repository or whether you want to co-develop
> >> parquet-rs together with Arrow in Rust, similar to what we are doing
> >> with C++. It's possible you might run into the same kinds of issues
> >> that led us to consider the monorepo arrangement.
> >>
> >> Thanks
> >> Wes
> >> On Sun, Aug 19, 2018 at 11:11 PM Renjie Liu 
> >> wrote:
> >> >
> >> > Hi, Chao:
> >> > I've opened a Jira issue for that and am planning to work on it.
> >> >
> >> > On Mon, Aug 20, 2018 at 11:03 AM Renjie Liu 
> >> wrote:
> >> >
> >> > > Yes, it's a mistake, sorry for that
> >> > >
> >> > >
> >> > > On Mon, Aug 20, 2018 at 10:57 AM Chao Sun 
> wrote:
> >> > >
> >> > >> (s/flink/arrow - it is a mistake?)
> >> > >>
> >> > >> Thanks Renjie for your interest. Yes, one of the next step in
> >> parquet-rs
> >> > >> is to integrate with Apache Arrow. Actually we just had a
> discussion
> >> > >> <https://github.com/sunchao/parquet-rs/issues/140> about this
> >> recently.
> >> > >> Feel free to share your comments on the github.
> >> > >>
> >> > >> Best,
> >> > >> Chao
> >> > >>
> >> > >> On Sun, Aug 19, 2018 at 7:39 PM, Renjie Liu <
> liurenjie2...@gmail.com
> >> >
> >> > >> wrote:
> >> > >>
> >> > >>> cc:Sunchao and Any
> >> > >>>
> >> > >>>
> >> > >>> -- Forwarded message -
> >> > >>> From: Uwe L. Korn 
> >> > >>> Date: Sun, Aug 19, 2018 at 5:08 PM
> >> > >>> Subject: Re: [DISCUSS] Rust add adapter for parquet
> >> > >>> To: 
> >> > >>>
> >> > >>>
> >> > >>> Hello,
> >> > >>>
> >> > >>> you might also want to raise this with the
> >> > >>> https://github.com/sunchao/parquet-rs project. The overlap
> between
> >> the
> >> > >>> developers of this project and the Arrow Rust implementation is
> >> quite large
> >> > >>> but still it may make sense to also start a discussion there.
> >> > >>>
> >> > >>> Uwe
> >> > >>>
> >> > >>> On Thu, Aug 16, 2018, at 9:14 AM, Renjie Liu wrote:
> >> > >>> > Hi, all:
> >> > >>> >
> >> > >>> > Now that the Rust component is approaching a stable state and the
> >> > >>> > Rust reader for Parquet is ready, I think it may be a good time to
> >> > >>> > start an adapter for Parquet, just like the adapter for ORC in C++.
> >> > >>> > What do you all think about it?
> >> > >>> > --
> >> > >>> > Liu, Renjie
> >> > >>> > Software Engineer, MVAD
> >> > >>> --
> >> > >>> Liu, Renjie
> >> > >>> Software Engineer, MVAD
> >> > >>>
> >> > >>
> >> > >> --
> >> > > Liu, Renjie
> >> > > Software Engineer, MVAD
> >> > >
> >> > --
> >> > Liu, Renjie
> >> > Software Engineer, MVAD
> >>
> >
>
-- 
Renjie Liu
Software Engineer, MVAD


[jira] [Created] (ARROW-3706) [Rust] Add record batch reader trait.

2018-11-06 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-3706:
-

 Summary: [Rust] Add record batch reader trait.
 Key: ARROW-3706
 URL: https://issues.apache.org/jira/browse/ARROW-3706
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust
Reporter: Renjie Liu
Assignee: Renjie Liu
 Fix For: 0.12.0


Add a RecordBatchReader trait.
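
For illustration, here is a minimal sketch of what such a trait could look
like. The ticket does not specify the method set, so `schema`, `next_batch`,
the placeholder `Schema` and `RecordBatch` structs, `InMemoryReader`, and
`count_rows` are all assumptions made for this example and not the Arrow API.

```rust
use std::collections::VecDeque;

/// Placeholder schema; the real Arrow schema carries typed fields.
#[derive(Debug, Clone, PartialEq)]
struct Schema {
    field_names: Vec<String>,
}

/// Placeholder batch holding equal-length i64 columns.
#[derive(Debug, Clone, PartialEq)]
struct RecordBatch {
    columns: Vec<Vec<i64>>,
}

impl RecordBatch {
    /// Number of rows, taken from the first column (0 if there are none).
    fn num_rows(&self) -> usize {
        self.columns.first().map_or(0, |c| c.len())
    }
}

/// Hypothetical shape of a record batch reader trait.
trait RecordBatchReader {
    /// Schema shared by every batch this reader yields.
    fn schema(&self) -> &Schema;
    /// Returns the next batch, or Ok(None) once the input is exhausted.
    fn next_batch(&mut self) -> Result<Option<RecordBatch>, String>;
}

/// Toy reader that serves batches already held in memory.
struct InMemoryReader {
    schema: Schema,
    batches: VecDeque<RecordBatch>,
}

impl RecordBatchReader for InMemoryReader {
    fn schema(&self) -> &Schema {
        &self.schema
    }

    fn next_batch(&mut self) -> Result<Option<RecordBatch>, String> {
        Ok(self.batches.pop_front())
    }
}

/// Drains any reader and counts the rows it produced.
fn count_rows(reader: &mut dyn RecordBatchReader) -> Result<usize, String> {
    let mut total = 0;
    while let Some(batch) = reader.next_batch()? {
        total += batch.num_rows();
    }
    Ok(total)
}

fn main() {
    let mut reader = InMemoryReader {
        schema: Schema { field_names: vec!["a".to_string()] },
        batches: VecDeque::from(vec![
            RecordBatch { columns: vec![vec![1, 2, 3]] },
            RecordBatch { columns: vec![vec![4, 5]] },
        ]),
    };
    println!("fields: {:?}", reader.schema().field_names);
    println!("rows: {}", count_rows(&mut reader).unwrap());
}
```

Returning `Result<Option<_>, _>` keeps "end of input" separate from "read
failed", which a plain `Iterator` item type would have to encode differently.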





Re: [DISCUSS] Rust add adapter for parquet

2018-08-19 Thread Renjie Liu
Hi, Chao:
I've opened a Jira issue for that and am planning to work on it.

On Mon, Aug 20, 2018 at 11:03 AM Renjie Liu  wrote:

> Yes, it's a mistake, sorry for that
>
>
> On Mon, Aug 20, 2018 at 10:57 AM Chao Sun  wrote:
>
>> (s/flink/arrow - it is a mistake?)
>>
>> Thanks Renjie for your interest. Yes, one of the next step in parquet-rs
>> is to integrate with Apache Arrow. Actually we just had a discussion
>> <https://github.com/sunchao/parquet-rs/issues/140> about this recently.
>> Feel free to share your comments on the github.
>>
>> Best,
>> Chao
>>
>> On Sun, Aug 19, 2018 at 7:39 PM, Renjie Liu 
>> wrote:
>>
>>> cc:Sunchao and Any
>>>
>>>
>>> -- Forwarded message -
>>> From: Uwe L. Korn 
>>> Date: Sun, Aug 19, 2018 at 5:08 PM
>>> Subject: Re: [DISCUSS] Rust add adapter for parquet
>>> To: 
>>>
>>>
>>> Hello,
>>>
>>> you might also want to raise this with the
>>> https://github.com/sunchao/parquet-rs project. The overlap between the
>>> developers of this project and the Arrow Rust implementation is quite large
>>> but still it may make sense to also start a discussion there.
>>>
>>> Uwe
>>>
>>> On Thu, Aug 16, 2018, at 9:14 AM, Renjie Liu wrote:
>>> > Hi, all:
>>> >
>>> > Now that the Rust component is approaching a stable state and the Rust
>>> > reader for Parquet is ready, I think it may be a good time to start an
>>> > adapter for Parquet, just like the adapter for ORC in C++. What do you
>>> > all think about it?
>>> > --
>>> > Liu, Renjie
>>> > Software Engineer, MVAD
>>> --
>>> Liu, Renjie
>>> Software Engineer, MVAD
>>>
>>
>> --
> Liu, Renjie
> Software Engineer, MVAD
>
-- 
Liu, Renjie
Software Engineer, MVAD


Re: [DISCUSS] Rust add adapter for parquet

2018-08-19 Thread Renjie Liu
Yes, it was a mistake. Sorry about that.

On Mon, Aug 20, 2018 at 10:57 AM Chao Sun  wrote:

> (s/flink/arrow - it is a mistake?)
>
> Thanks Renjie for your interest. Yes, one of the next step in parquet-rs
> is to integrate with Apache Arrow. Actually we just had a discussion
> <https://github.com/sunchao/parquet-rs/issues/140> about this recently.
> Feel free to share your comments on the github.
>
> Best,
> Chao
>
> On Sun, Aug 19, 2018 at 7:39 PM, Renjie Liu 
> wrote:
>
>> cc:Sunchao and Any
>>
>>
>> -- Forwarded message -
>> From: Uwe L. Korn 
>> Date: Sun, Aug 19, 2018 at 5:08 PM
>> Subject: Re: [DISCUSS] Rust add adapter for parquet
>> To: 
>>
>>
>> Hello,
>>
>> you might also want to raise this with the
>> https://github.com/sunchao/parquet-rs project. The overlap between the
>> developers of this project and the Arrow Rust implementation is quite large
>> but still it may make sense to also start a discussion there.
>>
>> Uwe
>>
>> On Thu, Aug 16, 2018, at 9:14 AM, Renjie Liu wrote:
>> > Hi, all:
>> >
>> > Now that the Rust component is approaching a stable state and the Rust
>> > reader for Parquet is ready, I think it may be a good time to start an
>> > adapter for Parquet, just like the adapter for ORC in C++. What do you
>> > all think about it?
>> > --
>> > Liu, Renjie
>> > Software Engineer, MVAD
>> --
>> Liu, Renjie
>> Software Engineer, MVAD
>>
>
> --
Liu, Renjie
Software Engineer, MVAD


[jira] [Created] (ARROW-3085) [Rust] Add an adapter for parquet.

2018-08-19 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-3085:
-

 Summary: [Rust] Add an adapter for parquet.
 Key: ARROW-3085
 URL: https://issues.apache.org/jira/browse/ARROW-3085
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Renjie Liu
Assignee: Renjie Liu
 Fix For: 0.11.0








[DISCUSS] Rust add adapter for parquet

2018-08-16 Thread Renjie Liu
Hi, all:

Now that the Rust component is approaching a stable state and the Rust reader
for Parquet is ready, I think it may be a good time to start an adapter for
Parquet, just like the adapter for ORC in C++. What do you all think about it?
-- 
Liu, Renjie
Software Engineer, MVAD


Re: [ANNOUNCE] New Arrow committers: Andy Grove and Krisztián Szűcs

2018-08-15 Thread Renjie Liu
Congrats Andy, Krisztian!

On Thu, Aug 16, 2018 at 7:47 AM Andy Grove wrote:

> Congrats to you too, Krisztian!
>
> I'm also honored to be part of this project and look forward to
> contributing more in the near future.
>
> Andy.
>
> On Wed, Aug 15, 2018 at 5:29 PM Krisztián Szűcs wrote:
>
> > I feel honored! Thank You!
> > Congrats Andy!
> >
> > - Krisztian
> >
> > On Aug 15, 2018 6:29 PM, "Wes McKinney"  wrote:
> >
> > On behalf of the Arrow PMC, I'm happy to announce that Andy Grove and
> > Krisztián Szűcs have been invited to be committers on the project.
> >
> > Welcome, and thanks for your contributions!
> >
> >
> > - Wes
> >
>
-- 
Liu, Renjie
Software Engineer, MVAD


Re: Rust tasks for 0.10.0?

2018-07-24 Thread Renjie Liu
+1 for pushing to crates.io.

On Wed, Jul 25, 2018 at 10:52 AM Andy Grove  wrote:

> Hi,
>
> I'm wondering what we should do with the Rust implementation for the 0.10.0
> release. I would like to have an official release pushed to crates.io for
> sure.
>
> Since the Rust implementation is so new there isn't much demand for new
> features yet so I think it is more important to focus on changes that will
> help to encourage adoption, like examples and better documentation.
>
> I'm open to suggestions.
>
> Thanks,
>
> Andy.
>
-- 
Liu, Renjie
Software Engineer, MVAD


[jira] [Created] (ARROW-2852) [Rust] Mark Array as Sync and Send

2018-07-15 Thread Renjie Liu (JIRA)
Renjie Liu created ARROW-2852:
-

 Summary: [Rust] Mark Array as Sync and Send
 Key: ARROW-2852
 URL: https://issues.apache.org/jira/browse/ARROW-2852
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Affects Versions: 0.9.0
Reporter: Renjie Liu
Assignee: Renjie Liu


Since arrays are immutable containers, it would be safe to mark them as Sync and
Send. This is useful for processing in multithreaded environments.
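
As a rough sketch of the idea: a type that holds a raw pointer is neither Send
nor Sync by default in Rust, so it has to be marked explicitly before it can be
shared between threads. `RawInt32Array` and `parallel_sum` below are
hypothetical names made up for this example and do not reflect the actual Arrow
array layout. The `unsafe impl` lines are sound here only because the data is
never mutated after construction.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for an immutable array backed by a raw pointer.
/// This is not the real Arrow type.
struct RawInt32Array {
    ptr: *mut i32,
    len: usize,
}

impl RawInt32Array {
    fn from_vec(values: Vec<i32>) -> Self {
        let boxed: Box<[i32]> = values.into_boxed_slice();
        let len = boxed.len();
        // Take ownership of the allocation as a raw pointer.
        let ptr = Box::into_raw(boxed) as *mut i32;
        RawInt32Array { ptr, len }
    }

    /// Read-only view; the type offers no way to mutate the data.
    fn values(&self) -> &[i32] {
        // Safety: ptr and len come from a boxed slice that lives until Drop.
        unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
    }
}

impl Drop for RawInt32Array {
    fn drop(&mut self) {
        // Safety: rebuilds the Box created in `from_vec` exactly once.
        unsafe {
            drop(Box::from_raw(std::ptr::slice_from_raw_parts_mut(
                self.ptr, self.len,
            )));
        }
    }
}

// Raw pointers are not Send or Sync automatically, so mark the type by hand.
unsafe impl Send for RawInt32Array {}
unsafe impl Sync for RawInt32Array {}

/// Sums the array on `parts` threads that all share it through an Arc.
fn parallel_sum(array: Arc<RawInt32Array>, parts: usize) -> i64 {
    assert!(parts > 0, "parts must be positive");
    // Ceiling division so every element falls into some chunk.
    let chunk = (array.len + parts - 1) / parts;
    let handles: Vec<thread::JoinHandle<i64>> = (0..parts)
        .map(|p| {
            let array = Arc::clone(&array);
            thread::spawn(move || {
                let values = array.values();
                let start = usize::min(p * chunk, values.len());
                let end = usize::min(start + chunk, values.len());
                values[start..end].iter().map(|v| *v as i64).sum::<i64>()
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    let array = Arc::new(RawInt32Array::from_vec((1..=100).collect()));
    println!("sum = {}", parallel_sum(array, 4));
}
```

Without the two `unsafe impl` lines, `thread::spawn` would reject the closure at
compile time, because `Arc<T>` is only Send when `T` is both Send and Sync.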





Re: rust using nightly channel

2018-04-10 Thread Renjie Liu
Yes, so maybe we need a conditional compilation option so that users can
choose between stable and nightly.
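
One way this could look, as a sketch only: gate the implementation on a cargo
feature with `#[cfg(feature = ...)]`. The feature name `nightly`, the `backend`
module, and `new_buffer` are assumptions for this example. Both branches use
stable code here so that the snippet compiles either way; a real nightly branch
would call unstable APIs instead.

```rust
/// Backend selected when the hypothetical `nightly` cargo feature is enabled.
#[cfg(feature = "nightly")]
mod backend {
    pub const NAME: &str = "nightly";

    pub fn alloc_zeroed(len: usize) -> Vec<u8> {
        // A real crate would use unstable allocator APIs in this branch.
        vec![0u8; len]
    }
}

/// Default backend: stable Rust only.
#[cfg(not(feature = "nightly"))]
mod backend {
    pub const NAME: &str = "stable";

    pub fn alloc_zeroed(len: usize) -> Vec<u8> {
        vec![0u8; len]
    }
}

/// Public entry point; callers never see which backend was compiled in.
pub fn new_buffer(len: usize) -> Vec<u8> {
    backend::alloc_zeroed(len)
}

fn main() {
    let buf = new_buffer(64);
    println!("{} backend, {} bytes", backend::NAME, buf.len());
}
```

With a `nightly = []` entry under `[features]` in Cargo.toml, users would opt in
with `cargo build --features nightly`, and everyone else stays on stable.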

On Tue, Apr 10, 2018 at 9:42 PM Andy Grove <andygrov...@gmail.com> wrote:

> My opinion is that we should continue to support Rust stable since there
> are users who can only use Arrow if it works with Rust stable.
>
> However, maybe it is possible to provide an API so that users can provide
> their own allocators and in that case they could choose to use nightly?
>
> It's a bit more work for us, but gives users more choice.
>
> Also, SIMD and alloc are both going to be stabilized very soon anyway so we
> might not have to wait too long.
>
> Thanks,
>
> Andy.
>
>
>
> On Tue, Apr 10, 2018 at 4:38 AM, Renjie Liu <liurenjie2...@gmail.com>
> wrote:
>
> > Hi:
> > Can we use experimental features from the nightly channel? Arrow requires
> > control over low-level primitives such as memory allocation and SIMD
> > execution, and many useful features for that, e.g. the Alloc API, can
> > only be used on the nightly channel.
> >
> >
> > --
> > Liu, Renjie
> > Software Engineer, MVAD
> >
>
-- 
Liu, Renjie
Software Engineer, MVAD


rust using nightly channel

2018-04-10 Thread Renjie Liu
Hi:
Can we use experimental features from the nightly channel? Arrow requires
control over low-level primitives such as memory allocation and SIMD
execution, and many useful features for that, e.g. the Alloc API, can only
be used on the nightly channel.


-- 
Liu, Renjie
Software Engineer, MVAD


Re: Rust Arrow status and plans for this week

2018-04-10 Thread Renjie Liu
Hello Uwe:
My JIRA id is liurenjie1024, and it seems that I have been given contributor
permission.

On Tue, Apr 10, 2018 at 3:00 PM Uwe L. Korn  wrote:

> Hello Andy,
>
> this is very exciting. Once we have basic documentation, we should have a
> look at streamlining the release process in the ASF infrastructure so
> making releases is straight-forward. We have a small collection of scripts
> to do this for the main release and the JS release that we should be able
> to adapt to the Rust part of the project. I could simply make the
> respective JIRAs for that or we have a small chat first about the ASF
> release process.
>
> > My next area of interest personally is the IPC mechanism and interop
> > testing with other languages, especially Java.
>
> This is a very important step for all our implementations. We have an
> integration test setup in
> https://github.com/apache/arrow/tree/master/integration where we test the
> compatibility of all Arrow implementations to each other to verify that
> they all have the same understanding of the data structures.
>
> Uwe
>
> On Mon, Apr 9, 2018, at 3:26 PM, Andy Grove wrote:
> > Over the weekend I added preliminary Parquet support to DataFusion (it
> only
> > supports int/float primitives and UTF8 so far). This was possible due to
> > the great work happening with the parquet-rs crate.
> >
> > Integrating this with the current Rust version of Arrow was simple and I
> > have now started running benchmarks (and we now have some benchmark code
> > checked into the Arrow project).
> >
> > Now that the basic functionality is stable enough to support this use
> case
> > I am going to focus on quality this week and start improving unit tests
> and
> > adding documentation.
> >
> > I think we might be at the point where it makes sense to start
> discussing a
> > first official release and maybe a roadmap for the Rust library?
> >
> > My next area of interest personally is the IPC mechanism and interop
> > testing with other languages, especially Java.
> >
> > Thanks,
> >
> > Andy.
>
-- 
Liu, Renjie
Software Engineer, MVAD


Re: Rust Arrow status and plans for this week

2018-04-09 Thread Renjie Liu
Cool!
I'm also trying to use arrow-rs in my project and would like to contribute
to it. Can anybody give me contributor permission?

On Tue, Apr 10, 2018 at 10:31 AM Jacques Nadeau  wrote:

> Super cool, congrats on the progress!
>
> The IPC/interop is top priority for me as well.
>
> On Mon, Apr 9, 2018 at 6:26 AM, Andy Grove  wrote:
>
> > Over the weekend I added preliminary Parquet support to DataFusion (it
> only
> > supports int/float primitives and UTF8 so far). This was possible due to
> > the great work happening with the parquet-rs crate.
> >
> > Integrating this with the current Rust version of Arrow was simple and I
> > have now started running benchmarks (and we now have some benchmark code
> > checked into the Arrow project).
> >
> > Now that the basic functionality is stable enough to support this use
> case
> > I am going to focus on quality this week and start improving unit tests
> and
> > adding documentation.
> >
> > I think we might be at the point where it makes sense to start
> discussing a
> > first official release and maybe a roadmap for the Rust library?
> >
> > My next area of interest personally is the IPC mechanism and interop
> > testing with other languages, especially Java.
> >
> > Thanks,
> >
> > Andy.
> >
>
-- 
Liu, Renjie
Software Engineer, MVAD