Re: Arrow and R benchmark

2018-11-15 Thread Romain François
Right now, most of the code examples are in the unit tests, but these do not 
measure performance or stress the code. Perhaps you can start from there?

Romain

> On 15 Nov 2018 at 22:16, Wes McKinney wrote:
> 
> Adding dev@arrow.apache.org
>> On Thu, Nov 15, 2018 at 4:13 PM Jonathan Chiang  wrote:
>> 
>> Hi,
>> 
>> I would like to contribute to developing benchmark suites for R and Arrow. 
>> What would be the best way to start?
>> 
>> Thanks,
>> Jonathan



Re: Failures bc clang-format

2018-09-27 Thread Romain François
Thanks. I had clang-format already (through brew). I’ll check in the morning 
about version 6. 

It is a bit unfortunate that I can build arrow with what I have but need other 
versions for linting. 

Sorry about the emojis. 

Romain

> On 27 Sep 2018 at 22:10, Wes McKinney wrote:
> 
> Well, you have to install LLVM 6 =)
> 
> if you're on macOS (just guessing, since I can't read your emojis on
> my dilapidated Linux laptop)
> 
> http://releases.llvm.org/6.0.0/clang%2bllvm-6.0.0-x86_64-apple-darwin.tar.xz
> 
> here's a page I found on installing with this method (unless you want
> to build from source)
> 
> https://nacho4d-nacho4d.blogspot.com/2013/11/clang-format.html
> 
> You may need to symlink a clang-format-6.0 alias into your /usr/local/bin
> 
> - Wes
>> On Thu, Sep 27, 2018 at 3:37 PM Romain Francois  wrote:
>> 
>> Getting this:
>> 
>> romain@purrplex ~/git/apache/arrow/r $ ./lint.sh --fix
>> Traceback (most recent call last):
>>  File 
>> "/Users/romain/git/apache/arrow/r/../cpp/build-support/run_clang_format.py", 
>> line 74, in 
>>"-i"] + formatted_filenames)
>>  File "/Users/romain/anaconda3/lib/python3.6/subprocess.py", line 286, in 
>> check_call
>>retcode = call(*popenargs, **kwargs)
>>  File "/Users/romain/anaconda3/lib/python3.6/subprocess.py", line 267, in 
>> call
>>with Popen(*popenargs, **kwargs) as p:
>>  File "/Users/romain/anaconda3/lib/python3.6/subprocess.py", line 709, in 
>> __init__
>>restore_signals, start_new_session)
>>  File "/Users/romain/anaconda3/lib/python3.6/subprocess.py", line 1344, in 
>> _execute_child
>>raise child_exception_type(errno_num, err_msg, err_filename)
>> FileNotFoundError: [Errno 2] No such file or directory: 'clang-format-6.0': 
>> 'clang-format-6.0'
>> 
>> 🤷‍♂️
>> 
>>> On 27 Sep 2018 at 18:49, Romain François wrote:
>>> 
>>> Thanks. I will do that in a few hours, and then I have a small PR (about 
>>> support for logical vectors) ready to go, associated with a JIRA issue I 
>>> opened this morning.
>>> 
>>> Romain
>>> 
>>>> On 27 Sep 2018 at 18:46, Wes McKinney wrote:
>>>> 
>>>> I checked out your branch and ran r/lint.sh and it printed the following
>>>> 
>>>> https://gist.github.com/wesm/42f1682565ac9737fecc60d12a15927e
>>>> 
>>>> You can run
>>>> 
>>>> ./lint.sh --fix
>>>> 
>>>> to fix the problems
>>>>> On Thu, Sep 27, 2018 at 9:46 AM Romain François  
>>>>> wrote:
>>>>> 
>>>>> I don’t think that’s just that. I sent a new build anyway that is ahead 
>>>>> of the upstream repo.
>>>>> 
>>>>> In any case, there’s probably something i should be doing.
>>>>> 
>>>>> Romain
>>>>> 
>>>>>> On 27 Sep 2018 at 13:19, Wes McKinney wrote:
>>>>>> 
>>>>>> Looks like you need to rebase your branch
>>>>>>> On Thu, Sep 27, 2018 at 7:18 AM Wes McKinney  
>>>>>>> wrote:
>>>>>>> 
>>>>>>> hi Romain,
>>>>>>> 
>>>>>>> I just put this in the README 
>>>>>>> https://github.com/apache/arrow/tree/master/r#development
>>>>>>> 
>>>>>>> - Wes
>>>>>>>> On Thu, Sep 27, 2018 at 7:17 AM Romain François  
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> Hello,
>>>>>>>> 
>>>>>>>> Is there documentation about what i should be doing to make 
>>>>>>>> clang-format happy? E.g to make this build pass: 
>>>>>>>> https://travis-ci.org/romainfrancois/arrow/builds/434027141
>>>>>>>> 
>>>>>>>> Romain
>>>>> 
>>> 
>> 



Re: Failures bc clang-format

2018-09-27 Thread Romain François
Thanks. I will do that in a few hours, and then I have a small PR (about support 
for logical vectors) ready to go, associated with a JIRA issue I opened this 
morning. 

Romain

> On 27 Sep 2018 at 18:46, Wes McKinney wrote:
> 
> I checked out your branch and ran r/lint.sh and it printed the following
> 
> https://gist.github.com/wesm/42f1682565ac9737fecc60d12a15927e
> 
> You can run
> 
> ./lint.sh --fix
> 
> to fix the problems
>> On Thu, Sep 27, 2018 at 9:46 AM Romain François  wrote:
>> 
>> I don’t think that’s just that. I sent a new build anyway that is ahead of 
>> the upstream repo.
>> 
>> In any case, there’s probably something i should be doing.
>> 
>> Romain
>> 
>>> On 27 Sep 2018 at 13:19, Wes McKinney wrote:
>>> 
>>> Looks like you need to rebase your branch
>>>> On Thu, Sep 27, 2018 at 7:18 AM Wes McKinney  wrote:
>>>> 
>>>> hi Romain,
>>>> 
>>>> I just put this in the README 
>>>> https://github.com/apache/arrow/tree/master/r#development
>>>> 
>>>> - Wes
>>>>> On Thu, Sep 27, 2018 at 7:17 AM Romain François  
>>>>> wrote:
>>>>> 
>>>>> Hello,
>>>>> 
>>>>> Is there documentation about what i should be doing to make clang-format 
>>>>> happy? E.g to make this build pass: 
>>>>> https://travis-ci.org/romainfrancois/arrow/builds/434027141
>>>>> 
>>>>> Romain
>> 



Re: Failures bc clang-format

2018-09-27 Thread Romain François
I don’t think it’s just that. I sent a new build anyway that is ahead of the 
upstream repo. 

In any case, there’s probably something I should be doing. 

Romain

> On 27 Sep 2018 at 13:19, Wes McKinney wrote:
> 
> Looks like you need to rebase your branch
>> On Thu, Sep 27, 2018 at 7:18 AM Wes McKinney  wrote:
>> 
>> hi Romain,
>> 
>> I just put this in the README 
>> https://github.com/apache/arrow/tree/master/r#development
>> 
>> - Wes
>>> On Thu, Sep 27, 2018 at 7:17 AM Romain François  wrote:
>>> 
>>> Hello,
>>> 
>>> Is there documentation about what i should be doing to make clang-format 
>>> happy? E.g to make this build pass: 
>>> https://travis-ci.org/romainfrancois/arrow/builds/434027141
>>> 
>>> Romain



Failures bc clang-format

2018-09-27 Thread Romain François
Hello, 

Is there documentation about what I should be doing to make clang-format happy? 
E.g. to make this build pass: 
https://travis-ci.org/romainfrancois/arrow/builds/434027141

Romain
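
The replies to this question trace the failure to a missing clang-format-6.0 
binary: the lint script invokes it by that exact name, so a differently named 
or differently versioned clang-format on the PATH is not picked up. Below is a 
minimal Python sketch of checking for the binary up front. The helper 
find_tool is hypothetical, written for illustration only, and is not part of 
Arrow's run_clang_format.py.

```python
import shutil


def find_tool(name, path=None):
    """Return the full path of an executable called `name`, or None.

    `path` is an optional os.pathsep-separated search path; when omitted,
    the PATH environment variable is used. Hypothetical helper, not taken
    from Arrow's build-support scripts.
    """
    return shutil.which(name, path=path)


if __name__ == "__main__":
    # The binary name comes from the traceback in this thread.
    tool = "clang-format-6.0"
    location = find_tool(tool)
    if location is None:
        print(tool + " not found on PATH; install LLVM 6 or symlink it into a PATH directory")
    else:
        print("using " + location)
```

Checking first turns the FileNotFoundError raised deep inside subprocess into 
a message that names the missing tool and the fix.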

Re: Getting some more eyes on the R bindings work

2018-09-21 Thread Romain François
Some of those lines are generated automatically by roxygen, and some of them 
are the license headers ;-) 

Still, the PR is substantial, esp. compared to the previous one. 

Let me know if I can help the process, e.g. by writing some notes about how R6 is 
used; the very low-level bindings are quite mechanical. 

Romain

> On 21 Sep 2018 at 17:18, Wes McKinney wrote:
> 
> Romain just submitted a ~4500 line R patch
> 
> https://github.com/apache/arrow/pull/2596
> 
> I am going to do my best to give feedback, particularly at the C++
> binding level, but it would be great to get some more eyes on the
> R-level API. As one detail, R6 classes are being used to create the
> wrapper interfaces, see
> 
> https://adv-r.hadley.nz/r6.html
> 
> It should be easier for other people to contribute smaller patches to
> iterate on things after this initial project-bootstrapping work lands.
> 
> Thanks,
> Wes



Re: Lighter build matrix on a language specific fork.

2018-09-06 Thread Romain François
Thanks. I will do that first thing in the morning. 

I just skimmed through them and it’s nothing dramatic, mostly oversights. I 
just can’t act on them right now, #parenting

> On 7 Sep 2018 at 00:40, Wes McKinney wrote:
> 
> OK, if you could address the comments that I left, after that I can merge
> the PR
> 
>> On Thu, Sep 6, 2018 at 8:39 AM Romain François  wrote:
>> 
>> As far as I’m concerned the initial pr is good to go, the intent is to
>> just have an r package that builds against the C++ library and that checks
>> on travis.
>> 
>> Actual code that does stuff will follow. (I have two branches on top of it
>> for later).
>> 
>> But this is bare minimal by design.
>> 
>> Romain
>> 
>>> On 7 Sep 2018 at 00:06, Wes McKinney wrote:
>>> 
>>> Yes, as soon as the initial R PR is in (and the CI scripts aren't
>>> changing) the build will be faster.
>>> 
>>> @Romain how much more work do you want to do on the initial PR? We can
>>> review and merge by end of week if that sounds good
>>> 
>>>> On Thu, Sep 6, 2018 at 6:10 AM Uwe L. Korn  wrote:
>>>> 
>>>> The problem could be that it checks against master and you will probably
>>>> have changes for R in the ci/ directory. Changes in that directory will
>>>> trigger a build for the full matrix. So to get the build simple and fast,
>>>> we should get the ci/ changes for R into master soon.
>>>> 
>>>> Uwe
>>>> 
>>>>>> On Thu, Sep 6, 2018, at 3:04 PM, Antoine Pitrou wrote:
>>>>>> 
>>>>>> On 06/09/2018 at 15:03, Romain François wrote:
>>>>>> I must do something wrong then because it builds them all, all the
>>>>>> time 🤷‍♂️.
>>>>> 
>>>>> Can you show an example?
>>>>> 
>>>>> 
>>>>>> 
>>>>>> Not a big deal, it’s only about 10 jobs, and i really only care about
>>>>>> the r job and the only doing the  analysis.
>>>>>> 
>>>>>> Commenting out may not be very practical as the plan is to submit
>>>>>> small pull requests, so it’s almost guaranteed i’ll forget to uncomment
>>>>>> about 50% of the time.
>>>>>> 
>>>>>>> On 6 Sep 2018 at 14:48, Antoine Pitrou wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> Our CI harness will already fast-exit in jobs that are not affected by
>>>>>>> the current changes (if you change only the R directory, C++ jobs will
>>>>>>> exit early).
>>>>>>> 
>>>>>>> If you want it to be even faster, your best bet is to temporarily
>>>>>>> comment out job entries in .travis.yml.
>>>>>>> 
>>>>>>> Regards
>>>>>>> 
>>>>>>> Antoine.
>>>>>>> 
>>>>>>> 
>>>>>>>> On 06/09/2018 at 14:26, Romain François wrote:
>>>>>>>> Hello,
>>>>>>>> 
>>>>>>>> Is there a way to have a lighter build matrix on travis, perhaps
>>>>>>>> based on the branch name, for example when working on the r bindings and
>>>>>>>> not touching anything else, having only the r job to be triggered would
>>>>>>>> make it faster for travis.
>>>>>>>> 
>>>>>>>> For example when working on r features i would typically start the
>>>>>>>> branch name with « r-»
>>>>>>>> 
>>>>>>>> Romain
>>>>>>>> 
>>>>>> 
>>>> 
>> 
>> 



Re: Lighter build matrix on a language specific fork.

2018-09-06 Thread Romain François
As far as I’m concerned the initial PR is good to go. The intent is just to 
have an R package that builds against the C++ library and that is checked on 
Travis. 

Actual code that does stuff will follow. (I have two branches on top of it for 
later). 

But this is the bare minimum by design. 

Romain

> On 7 Sep 2018 at 00:06, Wes McKinney wrote:
> 
> Yes, as soon as the initial R PR is in (and the CI scripts aren't changing)
> the build will be faster.
> 
> @Romain how much more work do you want to do on the initial PR? We can
> review and merge by end of week if that sounds good
> 
>> On Thu, Sep 6, 2018 at 6:10 AM Uwe L. Korn  wrote:
>> 
>> The problem could be that it checks against master and you will probably
>> have changes for R in the ci/ directory. Changes in that directory will
>> trigger a build for the full matrix. So to get the build simple and fast,
>> we should get the ci/ changes for R into master soon.
>> 
>> Uwe
>> 
>>> On Thu, Sep 6, 2018, at 3:04 PM, Antoine Pitrou wrote:
>>> 
>>>> On 06/09/2018 at 15:03, Romain François wrote:
>>>> I must do something wrong then because it builds them all, all the
>>>> time 🤷‍♂️.
>>> 
>>> Can you show an example?
>>> 
>>> 
>>>> 
>>>> Not a big deal, it’s only about 10 jobs, and i really only care about
>>>> the r job and the only doing the  analysis.
>>>> 
>>>> Commenting out may not be very practical as the plan is to submit
>>>> small pull requests, so it’s almost guaranteed i’ll forget to uncomment
>>>> about 50% of the time.
>>>> 
>>>>> On 6 Sep 2018 at 14:48, Antoine Pitrou wrote:
>>>>> 
>>>>> 
>>>>> Our CI harness will already fast-exit in jobs that are not affected by
>>>>> the current changes (if you change only the R directory, C++ jobs will
>>>>> exit early).
>>>>> 
>>>>> If you want it to be even faster, your best bet is to temporarily
>>>>> comment out job entries in .travis.yml.
>>>>> 
>>>>> Regards
>>>>> 
>>>>> Antoine.
>>>>> 
>>>>> 
>>>>>> On 06/09/2018 at 14:26, Romain François wrote:
>>>>>> Hello,
>>>>>> 
>>>>>> Is there a way to have a lighter build matrix on travis, perhaps
>>>>>> based on the branch name, for example when working on the r bindings and
>>>>>> not touching anything else, having only the r job to be triggered would
>>>>>> make it faster for travis.
>>>>>> 
>>>>>> For example when working on r features i would typically start the
>>>>>> branch name with « r-»
>>>>>> 
>>>>>> Romain
>>>>>> 
>>>> 
>> 



Re: Lighter build matrix on a language specific fork.

2018-09-06 Thread Romain François
I must be doing something wrong then, because it builds them all, all the time 🤷‍♂️.

Not a big deal, it’s only about 10 jobs, and i really only care about the r job 
and the only doing the  analysis. 

Commenting out may not be very practical as the plan is to submit small pull 
requests, so it’s almost guaranteed I’ll forget to uncomment about 50% of the 
time. 

> On 6 Sep 2018 at 14:48, Antoine Pitrou wrote:
> 
> 
> Our CI harness will already fast-exit in jobs that are not affected by
> the current changes (if you change only the R directory, C++ jobs will
> exit early).
> 
> If you want it to be even faster, your best bet is to temporarily
> comment out job entries in .travis.yml.
> 
> Regards
> 
> Antoine.
> 
> 
>> On 06/09/2018 at 14:26, Romain François wrote:
>> Hello, 
>> 
>> Is there a way to have a lighter build matrix on travis, perhaps based on 
>> the branch name, for example when working on the r bindings and not touching 
>> anything else, having only the r job to be triggered would make it faster 
>> for travis. 
>> 
>> For example when working on r features i would typically start the branch 
>> name with « r-»
>> 
>> Romain
>> 



Lighter build matrix on a language specific fork.

2018-09-06 Thread Romain François
Hello, 

Is there a way to have a lighter build matrix on Travis, perhaps based on the 
branch name? For example, when working on the R bindings and not touching 
anything else, triggering only the R job would make the build faster. 

When working on R features I would typically start the branch name 
with « r-».

Romain
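
The replies describe two rules: a job exits early when the change does not 
touch the directories it depends on, and any change under ci/ triggers the 
full matrix. A rough Python sketch of that decision follows, under stated 
assumptions: the job names, the JOB_PATHS mapping and the inclusion of 
.travis.yml as a global path are hypothetical and are not taken from Arrow's 
CI scripts.

```python
# Hypothetical mapping from CI job name to the path prefixes it depends on.
JOB_PATHS = {
    "cpp": ["cpp/"],
    "python": ["cpp/", "python/"],
    "r": ["cpp/", "r/"],
}

# Prefixes whose changes affect every job. The thread mentions ci/;
# .travis.yml is an assumption added for illustration.
GLOBAL_PATHS = ["ci/", ".travis.yml"]


def touches(changed_files, prefixes):
    """True if any changed file starts with any of the given prefixes."""
    return any(f.startswith(p) for f in changed_files for p in prefixes)


def affected_jobs(changed_files):
    """Return the sorted names of the jobs that should run in full."""
    changed = list(changed_files)
    if touches(changed, GLOBAL_PATHS):
        return sorted(JOB_PATHS)
    return sorted(job for job, prefixes in JOB_PATHS.items()
                  if touches(changed, prefixes))
```

With this mapping, a change confined to r/ selects only the r job, while a 
change under ci/ selects every job, which matches the behaviour reported in 
the thread for a branch that also edits the CI scripts.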


Re: R/arrow update

2018-03-21 Thread Romain François
That sounds good. I’ll make a pull request of what I have once I have something 
useful in the readme. 

Things like the build are not dealt with at the moment, so it might be that this 
only works on macOS or even (I don’t think so) only on my machine. 

As long as it’s clearly established that this is WIP and that it might entirely 
change, for sure let’s merge patches to master.

JIRA is new to me, I usually work with GitHub issues, so I’ll probably need 
some guidance. 

Romain

> On 20 Mar 2018 at 23:30, Wes McKinney wrote:
> 
> hi Romain,
> 
> Cool! I would suggest that we proceed in one of two ways:
> 
> * Start merging R patches to master (what I would prefer)
> * Merge patches into an r-devel branch while the R bindings initiative
> is in early stages
> 
> I don't really see any benefits to hiding early-stage code in a
> branch; the README for R should clearly indicate that the API is
> experimental. I think it would be better for the code to start going
> into the Arrow project (rather than staying in your personal branch)
> for a few reasons:
> 
> * More opportunities for the community to participate
> * More visible progress / transparency into what is going on
> * You will earn karma in the Apache project and be on your way to
> becoming a committer
> * Opportunities for code review from other C++ developers on use of
> the Arrow APIs, and opportunities for improvement
> * Incremental IP / licensing oversight (this gets harder when the
> patches get bigger)
> * Help with roadmapping / enumerating work to be done
> 
> On that last note, I would recommend beginning to liberally create
> JIRAs as you think of things that need to be done to build first class
> R support for Arrow. JIRA is the simplest way to develop the roadmap
> organically, it doesn't need to be anything formal.
> 
> Thanks!
> Wes
> 
>> On Tue, Mar 20, 2018 at 12:04 PM, Romain Francois  wrote:
>> Hello,
>> 
>> Today is Tuesday, so that's the day I work on porting arrow to R. This week, 
>> I've continued some of the work from last week, still following the steps of 
>> the python front end as documented here: 
>> https://arrow.apache.org/docs/python/data.html#type-metadata 
>> 
>> 
>> Things are starting to materialize, and I try to give it an R feel.
>> 
>> > int32()
>> DataType(int32)
>> > 
>> > float64()
>> DataType(double)
>> > 
>> > struct( x = int32(), y = float64(), d1 = date32() )
>> StructType(struct)
>> > 
>> > schema( x = int32(), y = float64(), d1 = date32() )
>> x: int32
>> y: double
>> d1: date32[day]
>> 
>> 
>> This is not that interesting, but it sets a nice premise for the future.
>> 
>> Quick ones:
>> - are there examples of uses of pyarrow.union ?
>> - how does pyarrow.array dispatch to the right array type? And perhaps 
>> more generally, how do I know what's inside the function?
>> 
>> >>> pa.array([1, 2, None, 3])
>> 
>> [
>>  1,
>>  2,
>>  NA,
>>  3
>> ]
>> >>> pa.array
>> 
>> 
>> 
>> Romain
>> 
>> 
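
On the question of how an array constructor can pick the right array type: 
one common approach, not necessarily what pyarrow does internally, is to scan 
the values once, ignore nulls, and map the set of observed Python types to a 
single type. The sketch below is plain Python with hypothetical helper names 
(infer_type, array_class_for); the class names in the table mirror pyarrow's 
array classes purely for illustration.

```python
# Hypothetical table from inferred type name to an array class name.
ARRAY_CLASSES = {
    "null": "NullArray",
    "bool": "BooleanArray",
    "int64": "Int64Array",
    "double": "DoubleArray",
    "string": "StringArray",
}


def infer_type(values):
    """Infer one type name for a list of Python values; None is a null."""
    seen = {type(v) for v in values if v is not None}
    if not seen:
        return "null"
    if seen == {bool}:
        return "bool"
    if seen == {int}:
        return "int64"
    if seen <= {int, float}:
        # Mixed ints and floats are promoted to double in this sketch.
        return "double"
    if seen == {str}:
        return "string"
    names = ", ".join(sorted(t.__name__ for t in seen))
    raise TypeError("cannot infer a single type from: " + names)


def array_class_for(values):
    """Dispatch: look up the array class name for the inferred type."""
    return ARRAY_CLASSES[infer_type(values)]
```

For the list [1, 2, None, 3] from the message above, this sketch infers int64 
and dispatches to the Int64Array entry, with the None treated as a null slot 
rather than as a type of its own.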



Re: gReetings

2018-03-14 Thread Romain François
That sounds interesting, I am open to joining yet another Slack team. 

The way I structure my time these days, I dedicate one day a week to the R 
arrow front end. Right now this is Tuesdays. 

I have other strong commitments to other open projects on Mondays and Fridays, 
and I keep the two other days for miscellaneous projects mostly around R. My 
aim is to community-fund these two days, hence my Patreon page: 
https://www.patreon.com/romainfrancois

Even though I might do arrow things outside of Tuesdays, this is a much weaker 
« might ». Just trying to manage expectations about my time ...

Romain

> On 14 Mar 2018 at 18:19, Aneesh Karve <ane...@quiltdata.io> wrote:
> 
> Hi Romain,
> 
> I have a list of about ten R developers and users who are interested in
> Arrow bindings for R, as well as a Slack channel where we can discuss
> things further. Feel free to email me and I'll connect you. The list arose
> as we were looking to bring R support to Quilt packages; and we explored
> the existing Feather bindings, Rcpp and other options.
> 
>> On Tue, Mar 13, 2018 at 2:45 AM, Romain Francois <rom...@purrple.cat> wrote:
>> 
>> Hello,
>> 
>> I just wanted to introduce myself here. I’m Romain François, mostly
>> involved in making tools for R. My track record includes being an author of
>> Rcpp and dplyr.
>> 
>> I will be working with Wes’s guidance on making an R front end for arrow.
>> Initially that means going through the C++ api and perhaps see how things
>> have been implemented in the python front end. For the foreseeable future,
>> I’ll be spending Tuesdays on this. Some words from last week.
>> https://purrple.cat/blog/2018/03/07/arrow-rrrow-rcher-spurrrow/
>> 
>> Are there any resources that could be relevant, e.g. some document about
>> how another front end was made ?
>> Are there other R people here ?
>> 
>> Regards,
>> 
>> Romain
> 
> 
> 
> 
> -- 
> 
> 
> Aneesh Karve | 765-360-9348 | LinkedIn <http://linkedin.com/in/aneeshkarve> |
> Twitter <https://twitter.com/akarve>
> 
> 
> quiltdata.com | Manage data like code
> <https://blog.quiltdata.com/its-time-to-manage-data-like-source-code-3df04cd312b8>


Re: gReetings

2018-03-13 Thread Romain François
Hi, 

« arrow » is fine of course, I only coined « rrrow » for fun but then it kind 
of grew on me. 

Do I have push rights to the main repo? I guess I’ll work at least in branches, 
esp. now that I’m essentially just playing with the API. 

Btw, I did stumble into the other Python arrow last week on my documentation 
hunt. 

Romain

> On 13 Mar 2018 at 15:44, Wes McKinney <wesmck...@gmail.com> wrote:
> 
> hi Romain,
> 
> welcome! I'm excited to see some progress on an Arrow library for R.
> Feel free to make pull requests into a new "R/" top-level directory;
> the work does not need to be polished
> 
> My vote would be to simply call the R library "arrow" since the real
> estate seems to exist. The main reason it's "pyarrow" in Python is
> that "arrow" was already taken (https://pypi.python.org/pypi/arrow).
> Others may have opinions
> 
> - Wes
> 
>> On Tue, Mar 13, 2018 at 5:45 AM, Romain Francois <rom...@purrple.cat> wrote:
>> Hello,
>> 
>> I just wanted to introduce myself here. I’m Romain François, mostly involved 
>> in making tools for R. My track record includes being an author of Rcpp and 
>> dplyr.
>> 
>> I will be working with Wes’s guidance on making an R front end for arrow. 
>> Initially that means going through the C++ api and perhaps see how things 
>> have been implemented in the python front end. For the foreseeable future, 
>> I’ll be spending Tuesdays on this. Some words from last week.
>> https://purrple.cat/blog/2018/03/07/arrow-rrrow-rcher-spurrrow/
>> 
>> Are there any resources that could be relevant, e.g. some document about how 
>> another front end was made ?
>> Are there other R people here ?
>> 
>> Regards,
>> 
>> Romain