[jira] [Comment Edited] (ARROW-4113) [R] Version number patch broke build

2018-12-25 Thread Hiroaki Yutani (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728608#comment-16728608
 ] 

Hiroaki Yutani edited comment on ARROW-4113 at 12/26/18 2:15 AM:
-

{quote}Is "0.12.0-0" smaller than "0.12.0"?
{quote}
No (I didn't know this until now...).
{code:java}
package_version("0.12.0-0") == package_version("0.12.0")
#> [1] TRUE{code}
A common practice is to append .9000 to the most recently released version to 
mark a development version. In this case, the version would be 0.11.0.9000.

cf. [http://r-pkgs.had.co.nz/description.html#version]

I guess you want to include "0.12.0" in the version string to indicate that it 
is the development version of "0.12.0". But, as far as I know, R has no nice 
way to do that. Some possible choices:
 # use "0.11.0.9000" for development, and "0.12.0" for release
 # use "0.12.0" for both development and release (cf. Apache Spark's SparkR, 
which uses this strategy: 
[https://github.com/apache/spark/commit/9bf397c0e45cb161f3f12f09bd2bf14ff96dc823#diff-06e745873945c43e0e5cf512efa992e9R3])
 # use "0.12.0-0" for development, and "0.12.0-1" for release (cf. 
[rstan|https://cran.r-project.org/src/contrib/Archive/rstan/], whose patch 
version starts at 1)
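
To make the ordering of these candidates concrete, here is a rough Python sketch. It is only a toy model, not R's implementation: it assumes a version string is split on "." and "-" into integers and that shorter versions are padded with trailing zeros, which is an assumption chosen to match the TRUE result shown above. The helper names parse_version and compare_key are made up for illustration.

```python
import re


def parse_version(text):
    """Split a version string on '.' or '-' into a tuple of integers."""
    return tuple(int(part) for part in re.split(r"[.-]", text))


def compare_key(text, width=4):
    """Pad with trailing zeros (an assumption) so that versions with
    different numbers of components can be compared directly."""
    parts = parse_version(text)
    return parts + (0,) * (width - len(parts))


# "0.12.0-0" and "0.12.0" get the same key under this model, so the
# "-0" suffix does not sort before the release.
print(compare_key("0.12.0-0") == compare_key("0.12.0"))

# "0.11.0.9000" sorts after 0.11.0 and before 0.12.0.
print(compare_key("0.11.0") < compare_key("0.11.0.9000") < compare_key("0.12.0"))
```

Under this model only option 1 gives a development version that sorts strictly between the previous release and the next one.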


was (Author: yutannihilation):
{quote}Is "0.12.0-0" smaller than "0.12.0"?
{quote}
No (I didn't know this until now...).
{code}
package_version("0.12.0-0") == package_version("0.12.0")
#> [1] TRUE{code}
One common practice is to add .9000 to the tail of the current released version 
to represent "development version." So, in this case, the version would be 
0.11.0.9000.

c.f. [http://r-pkgs.had.co.nz/description.html#version]

I guess you want to include "0.12.0" in the version string to indicate it's the 
development version of "0.12.0." But, as far as I know, R has no nice way to do 
that. Some possible choices are here:
 # use "0.11.0.9000" for development, and "0.12.0" for release
 # use "0.12.0" both for development and release
 # use "0.12.0-0" for development, and "0.12.0-1" for release (c.f 
[rstan|https://cran.r-project.org/src/contrib/Archive/rstan/]'s patch version 
is 1-origin)

> [R] Version number patch broke build
> 
>
> Key: ARROW-4113
> URL: https://issues.apache.org/jira/browse/ARROW-4113
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.12.0
>
>
> The patch 
> https://github.com/apache/arrow/commit/385c4384eb0dcc384b443f24765c64e9d6d88d28
>  broke the R build (which is in allowed_failures right now)
> {code}
> Building with: R CMD build 
> 0.22s$ R CMD build  .
> * checking for file ‘./DESCRIPTION’ ... OK
> * preparing ‘arrow’:
> * checking DESCRIPTION meta-information ... ERROR
> Malformed package version.
> See section 'The DESCRIPTION file' in the 'Writing R Extensions'
> manual.
> The command "R CMD build  ." failed and exited with 1 during .
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-4113) [R] Version number patch broke build

2018-12-25 Thread Hiroaki Yutani (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728843#comment-16728843
 ] 

Hiroaki Yutani commented on ARROW-4113:
---

Thanks, I also think option 1 is the best of these, since many R developers 
are familiar with that convention.






[jira] [Updated] (ARROW-3324) [Python] Users reporting memory leaks using pa.pq.ParquetDataset

2018-12-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3324:
--
Labels: parquet pull-request-available  (was: parquet)

> [Python] Users reporting memory leaks using pa.pq.ParquetDataset
> 
>
> Key: ARROW-3324
> URL: https://issues.apache.org/jira/browse/ARROW-3324
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: parquet, pull-request-available
> Fix For: 0.12.0
>
>
> See:
> * https://github.com/apache/arrow/issues/2614
> * https://github.com/apache/arrow/issues/2624





[jira] [Commented] (ARROW-3133) [C++] Logical boolean kernels in kernels/boolean.cc cannot write into preallocated memory

2018-12-25 Thread Micah Kornfield (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728833#comment-16728833
 ] 

Micah Kornfield commented on ARROW-3133:


Is the issue this is trying to address the hard-coded:

out->value = ArrayData::Make(boolean(), right_data.length)?

Is the fix to check whether the output already has an array of at least the 
appropriate size, and to use that instead?
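
As a rough illustration of the idea in that question, here is a small Python sketch. It is not the Arrow kernel code: logical_and is a hypothetical helper working on plain lists, and it only shows the "reuse the output if it is already large enough, otherwise allocate" pattern being asked about.

```python
def logical_and(left, right, out=None):
    """Element-wise AND of two equal-length boolean sequences.

    Hypothetical helper: reuses `out` when it already has at least
    len(left) slots, otherwise falls back to a fresh allocation.
    """
    if len(left) != len(right):
        raise ValueError("inputs must have the same length")
    n = len(left)
    if out is None or len(out) < n:
        out = [False] * n  # no usable preallocated buffer
    for i in range(n):
        out[i] = left[i] and right[i]
    return out


# The caller preallocates a buffer larger than needed; only the first
# three slots are written and the same object is returned.
buffer = [False] * 8
result = logical_and([True, True, False], [True, False, False], out=buffer)
print(result is buffer)
print(result[:3])
```

One open design point in a sketch like this is that a larger-than-needed buffer leaves unused trailing slots, so the caller has to track the logical length separately.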

> [C++] Logical boolean kernels in kernels/boolean.cc cannot write into 
> preallocated memory
> -
>
> Key: ARROW-3133
> URL: https://issues.apache.org/jira/browse/ARROW-3133
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>






[jira] [Updated] (ARROW-4114) [C++][DOCUMENTATION]

2018-12-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4114:
--
Labels: pull-request-available  (was: )

> [C++][DOCUMENTATION] 
> -
>
> Key: ARROW-4114
> URL: https://issues.apache.org/jira/browse/ARROW-4114
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Trivial
>  Labels: pull-request-available
>
> The "make unittest" step in the C++ README.md does not work on a fresh Ubuntu 
> image without Python installed.
> The error message from {{ctest --output-on-failure}} indicates that it is 
> trying to find python:
> {{Running arrow-allocator-test, redirecting output into 
> /home/micahk/arrow/cpp/debug/build/test-logs/arrow-allocator-test.txt 
> (attempt 1/1)}}
> {{/usr/bin/env: ‘python’: No such file or directory}}
>  





[jira] [Created] (ARROW-4114) [C++][DOCUMENTATION]

2018-12-25 Thread Micah Kornfield (JIRA)
Micah Kornfield created ARROW-4114:
--

 Summary: [C++][DOCUMENTATION] 
 Key: ARROW-4114
 URL: https://issues.apache.org/jira/browse/ARROW-4114
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Micah Kornfield
Assignee: Micah Kornfield


The "make unittest" step in the C++ README.md does not work on a fresh Ubuntu 
image without Python installed.

The error message from {{ctest --output-on-failure}} indicates that it is 
trying to find python:

{{Running arrow-allocator-test, redirecting output into 
/home/micahk/arrow/cpp/debug/build/test-logs/arrow-allocator-test.txt (attempt 
1/1)}}
{{/usr/bin/env: ‘python’: No such file or directory}}

 





[jira] [Commented] (ARROW-3324) [Python] Users reporting memory leaks using pa.pq.ParquetDataset

2018-12-25 Thread Tanya Schlusser (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728819#comment-16728819
 ] 

Tanya Schlusser commented on ARROW-3324:


I could not reproduce either of the two GitHub issues above, but could identify 
a leak using {{memory_profiler}} on the Stack Overflow code (copied 
[here|https://github.com/apache/arrow/blob/master/python/scripts/test_leak.py]).

I observed that {{FileSerializer.properties_.use_count()}} increments more than 
expected whenever {{FileSerializer.AppendRowGroup}} is called. The offending 
line is {{FileSerializer.metadata_->AppendRowGroup()}}. I believe that the 
count should only go up once per new row group, instead of once per column plus 
once per row group.

I think the root cause is that in 
{{RowGroupMetaDataBuilder::RowGroupMetaDataBuilderImpl.Finish}}, the vector of 
{{column_builders_}} ought to be reset and cleared each time before it is 
repopulated. I hope to submit a pull request for this even though it may not 
address all of the issues stated here. Since the GitHub issues were about 
memory leaks on "read", and the fix is related only to "write", this 
observation certainly doesn't address everything in this JIRA issue.

Even after the fix I'll post, my memory_profiler code still shows an increase 
in memory use upon additional calls to {{pq.ParquetWriter.write_table}}, which 
I think is OK because the number of row groups grows with each write too. So I 
may be wrong or may still have missed something. Regardless, I hope these notes 
are useful to someone.
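
The suspected pattern can be sketched with a toy Python model. This is not the Parquet C++ code: the class names below are stand-ins, and the length of the builder list is used as a proxy for the {{use_count()}} of the shared properties object. The only point is to show how a list that is repopulated without being cleared keeps accumulating holders of the shared object.

```python
class Properties:
    """Stand-in for the shared writer properties object."""


class ColumnBuilder:
    def __init__(self, properties):
        # Each column builder keeps one reference to the shared object.
        self.properties = properties


class RowGroupBuilder:
    """Toy model of a builder that repopulates its column builders."""

    def __init__(self, properties, num_columns, clear_on_finish):
        self.properties = properties
        self.num_columns = num_columns
        self.clear_on_finish = clear_on_finish
        self.column_builders = []

    def finish(self):
        if self.clear_on_finish:
            # The proposed fix: drop the old builders before repopulating.
            self.column_builders.clear()
        for _ in range(self.num_columns):
            self.column_builders.append(ColumnBuilder(self.properties))
        return len(self.column_builders)


def holders_after(row_groups, clear_on_finish, num_columns=3):
    """Return how many builders hold the properties after N row groups."""
    builder = RowGroupBuilder(Properties(), num_columns, clear_on_finish)
    count = 0
    for _ in range(row_groups):
        count = builder.finish()
    return count


print(holders_after(4, clear_on_finish=False))  # grows with every row group
print(holders_after(4, clear_on_finish=True))   # stays at num_columns
```

Without the clear, the number of holders grows by one per column for every row group; with it, the count stays bounded by the number of columns.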

> [Python] Users reporting memory leaks using pa.pq.ParquetDataset
> 
>
> Key: ARROW-3324
> URL: https://issues.apache.org/jira/browse/ARROW-3324
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: parquet
> Fix For: 0.12.0
>
>
> See:
> * https://github.com/apache/arrow/issues/2614
> * https://github.com/apache/arrow/issues/2624





[jira] [Commented] (ARROW-4113) [R] Version number patch broke build

2018-12-25 Thread Kouhei Sutou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728809#comment-16728809
 ] 

Kouhei Sutou commented on ARROW-4113:
-

Thanks!
It seems that option 1 suits us, because we used the ".9000" approach before I 
changed it:
https://github.com/apache/arrow/commit/385c4384eb0dcc384b443f24765c64e9d6d88d28#diff-343ca94945f5031b1858688a69e6d0f7L3



