[jira] [Created] (ARROW-17253) PyArrow array crashes the interpreter when encountering a zero division error

2022-07-29 Thread Li Jin (Jira)
Li Jin created ARROW-17253:
--

 Summary: PyArrow array crashes the interpreter when encountering a 
zero division error
 Key: ARROW-17253
 URL: https://issues.apache.org/jira/browse/ARROW-17253
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Li Jin


{code:python}
import pyarrow as pa
pa.array((1 // 0 for x in range(10)), size=10){code}
This crashes the Python interpreter.
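
A possible workaround can be sketched in plain Python, assuming the crash is triggered by an exception raised while pyarrow consumes the generator (the report does not confirm the cause). Evaluating the generator eagerly makes the error surface as an ordinary Python exception before any pyarrow call. The helper names `materialize` and `try_materialize` are hypothetical, and the final `pa.array` call appears only as a comment.

```python
def materialize(values):
    """Eagerly evaluate an iterable into a list.

    Any exception raised while producing an element (for example
    ZeroDivisionError) propagates from here, in pure Python code.
    """
    return list(values)


def try_materialize(values):
    """Return (data, error); exactly one of the two is None."""
    try:
        return materialize(values), None
    except ZeroDivisionError as exc:
        return None, exc


data, error = try_materialize(1 // 0 for _ in range(10))
# Only hand `data` to pyarrow when `error` is None, e.g. pa.array(data).
```

With this pattern the division error is reported by the caller's own code and never reaches the array constructor.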



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-16716) [Benchmarks] Create Projection benchmark for Acero

2022-06-01 Thread Li Jin (Jira)
Li Jin created ARROW-16716:
--

 Summary: [Benchmarks] Create Projection benchmark for Acero
 Key: ARROW-16716
 URL: https://issues.apache.org/jira/browse/ARROW-16716
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Li Jin








[jira] [Created] (ARROW-16083) [C++] Implement AsofJoin execution node

2022-03-31 Thread Li Jin (Jira)
Li Jin created ARROW-16083:
--

 Summary: [C++] Implement AsofJoin execution node
 Key: ARROW-16083
 URL: https://issues.apache.org/jira/browse/ARROW-16083
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Li Jin








[jira] [Created] (ARROW-15532) [C++] Remove unused warning for StringClassifyDoc

2022-02-02 Thread Li Jin (Jira)
Li Jin created ARROW-15532:
--

 Summary: [C++] Remove unused warning for StringClassifyDoc
 Key: ARROW-15532
 URL: https://issues.apache.org/jira/browse/ARROW-15532
 Project: Apache Arrow
  Issue Type: Task
Reporter: Li Jin








[jira] [Commented] (ARROW-1431) [Java] JsonFileReader doesn't initialize some vectors appropriately

2019-04-17 Thread Li Jin (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820204#comment-16820204
 ] 

Li Jin commented on ARROW-1431:
---

This issue is fixed by https://github.com/apache/arrow/pull/1290

> [Java] JsonFileReader doesn't initialize some vectors appropriately 
> ---
>
> Key: ARROW-1431
> URL: https://issues.apache.org/jira/browse/ARROW-1431
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Li Jin
>Priority: Major
>
> One example is ListVector: the JsonFileReader sets the validity, offset, 
> and data buffers, but doesn't set the `lastSet` variable in the ListVector instance.
> ArrowFileReader works correctly because it invokes `loadFieldBuffers`, which 
> initializes `lastSet` correctly:
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/complex/ListVector.java#L120
> This doesn't break integration tests but can cause subtle bugs when people 
> call methods on a vector read from JSON.





[jira] [Commented] (ARROW-1425) [Python] Document semantic differences between Spark timestamps and Arrow timestamps

2019-02-06 Thread Li Jin (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762024#comment-16762024
 ] 

Li Jin commented on ARROW-1425:
---

[~emkornfi...@gmail.com] Feel free to finish it up.

> [Python] Document semantic differences between Spark timestamps and Arrow 
> timestamps
> 
>
> Key: ARROW-1425
> URL: https://issues.apache.org/jira/browse/ARROW-1425
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Micah Kornfield
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> The way that Spark treats non-timezone-aware timestamps as session local can 
> be problematic when using pyarrow which may view the data coming from 
> toPandas() as time zone naive (but with fields as though it were UTC, not 
> session local). We should document carefully how to properly handle the data 
> coming from Spark to avoid problems.
> cc [~bryanc] [~holdenkarau]
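
The mismatch described above can be illustrated with a small standard-library sketch. It assumes a hypothetical session time zone of UTC-8 and does not use Spark or pyarrow; the function names are made up for the example. The same naive wall-clock value denotes two different instants depending on whether it is read as UTC or as session-local time.

```python
from datetime import datetime, timedelta, timezone


def as_utc_instant(naive):
    """Interpret a naive wall-clock value as if its fields were UTC."""
    return naive.replace(tzinfo=timezone.utc)


def as_session_local_instant(naive, session_offset):
    """Interpret a naive wall-clock value as session-local time,
    then express the resulting instant in UTC."""
    local = naive.replace(tzinfo=timezone(session_offset))
    return local.astimezone(timezone.utc)


naive = datetime(2018, 1, 1, 12, 0, 0)
session_offset = timedelta(hours=-8)  # hypothetical session time zone (UTC-8)

utc_reading = as_utc_instant(naive)
local_reading = as_session_local_instant(naive, session_offset)
skew = local_reading - utc_reading  # how far apart the two readings are
```

The two readings differ by exactly the session offset, which is the kind of silent shift the documentation would need to warn about.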





[jira] [Updated] (ARROW-3396) [Java] VectorSchemaRoot.create(schema, allocator) doesn't create dictionary encoded vector correctly

2018-10-19 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-3396:
--
Summary: [Java] VectorSchemaRoot.create(schema, allocator) doesn't create 
dictionary encoded vector correctly  (was: VectorSchemaRoot.create(schema, 
allocator) doesn't create dictionary encoded vector correctly)

> [Java] VectorSchemaRoot.create(schema, allocator) doesn't create dictionary 
> encoded vector correctly
> 
>
> Key: ARROW-3396
> URL: https://issues.apache.org/jira/browse/ARROW-3396
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Li Jin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>






[jira] [Created] (ARROW-3566) Clarify that the type of dictionary encoded field should be the encoded(index) type

2018-10-19 Thread Li Jin (JIRA)
Li Jin created ARROW-3566:
-

 Summary: Clarify that the type of dictionary encoded field should 
be the encoded(index) type
 Key: ARROW-3566
 URL: https://issues.apache.org/jira/browse/ARROW-3566
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Li Jin








[jira] [Updated] (ARROW-3493) [Java] Document BOUNDS_CHECKING_ENABLED

2018-10-11 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-3493:
--
Description: 
According to [~atrivedi], BOUNDS_CHECKING_ENABLED has significant implications 
for performance.

We should document this better and maybe revisit the default value.

https://github.com/apache/arrow/blob/master/java/memory/src/main/java/org/apache/arrow/memory/BoundsChecking.java

  was:
According to [~atrivedi], BOUNDS_CHECKING_ENABLED has significant implications 
for performance.

We should document this better and maybe revisit the default value.


> [Java] Document BOUNDS_CHECKING_ENABLED
> ---
>
> Key: ARROW-3493
> URL: https://issues.apache.org/jira/browse/ARROW-3493
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Affects Versions: 0.11.0
>Reporter: Li Jin
>Priority: Major
>
> According to [~atrivedi], BOUNDS_CHECKING_ENABLED has significant implications 
> for performance.
> We should document this better and maybe revisit the default value.
> https://github.com/apache/arrow/blob/master/java/memory/src/main/java/org/apache/arrow/memory/BoundsChecking.java





[jira] [Created] (ARROW-3497) [Java] Add user documentation for achieving better performance

2018-10-11 Thread Li Jin (JIRA)
Li Jin created ARROW-3497:
-

 Summary: [Java] Add user documentation for achieving better 
performance
 Key: ARROW-3497
 URL: https://issues.apache.org/jira/browse/ARROW-3497
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Documentation, Java
Affects Versions: 0.11.0
Reporter: Li Jin








[jira] [Updated] (ARROW-3496) [Java] Add microbenchmark code to Java

2018-10-11 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-3496:
--
Issue Type: Improvement  (was: Task)

> [Java] Add microbenchmark code to Java
> --
>
> Key: ARROW-3496
> URL: https://issues.apache.org/jira/browse/ARROW-3496
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Affects Versions: 0.11.0
>Reporter: Li Jin
>Priority: Major
>
> [~atrivedi] has done some microbenchmarking with the Java API. Let's consider 
> adding the benchmarks to the codebase.





[jira] [Updated] (ARROW-3495) [Java] Optimize bit operations performance

2018-10-11 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-3495:
--
Issue Type: Improvement  (was: Task)

> [Java] Optimize bit operations performance
> --
>
> Key: ARROW-3495
> URL: https://issues.apache.org/jira/browse/ARROW-3495
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Affects Versions: 0.11.0
>Reporter: Li Jin
>Priority: Major
>
> From [~atrivedi]'s benchmark findings:
> 2) Materialize values from the Validity and Value direct buffers instead of
> calling the getInt() function on the IntVector. This is implemented as a new
> Unsafe reader type
> ([https://github.com/animeshtrivedi/benchmarking-arrow/blob/master/src/main/java/com/github/animeshtrivedi/benchmark/ArrowReaderUnsafe.java#L31]).
> 3) Optimize the bitmap operation that checks whether a bit is set
> ([https://github.com/animeshtrivedi/benchmarking-arrow/blob/master/src/main/java/com/github/animeshtrivedi/benchmark/ArrowReaderUnsafe.java#L23]).
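
The bitmap check in item 3 can be sketched as follows. This is an illustration in Python, not the code from the linked benchmark; it assumes the least-significant-bit ordering that the Arrow columnar format specifies for validity bitmaps, and the function name `is_set` is hypothetical.

```python
def is_set(bitmap, index):
    """Return True if bit `index` is set in a validity bitmap.

    Bit i lives in byte i // 8 at bit position i % 8, counted from the
    least significant bit; shifts and masks avoid division.
    """
    byte = bitmap[index >> 3]            # index // 8
    return (byte >> (index & 7)) & 1 == 1  # index % 8


# Two bytes: slots 0, 2 and 8 are valid, everything else is null.
validity = bytes([0b00000101, 0b00000001])
valid_slots = [i for i in range(16) if is_set(validity, i)]
```

The same shift-and-mask form carries over directly to Java, where it avoids a method call per element when reading from a direct buffer.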





[jira] [Updated] (ARROW-3493) [Java] Document BOUNDS_CHECKING_ENABLED

2018-10-11 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-3493:
--
Issue Type: Improvement  (was: Task)

> [Java] Document BOUNDS_CHECKING_ENABLED
> ---
>
> Key: ARROW-3493
> URL: https://issues.apache.org/jira/browse/ARROW-3493
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Affects Versions: 0.11.0
>Reporter: Li Jin
>Priority: Major
>
> According to [~atrivedi], BOUNDS_CHECKING_ENABLED has significant implications 
> for performance.
> We should document this better and maybe revisit the default value.





[jira] [Updated] (ARROW-3495) [Java] Optimize bit operations performance

2018-10-11 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-3495:
--
Summary: [Java] Optimize bit operations performance  (was: Optimize bit 
operations performance)

> [Java] Optimize bit operations performance
> --
>
> Key: ARROW-3495
> URL: https://issues.apache.org/jira/browse/ARROW-3495
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Affects Versions: 0.11.0
>Reporter: Li Jin
>Priority: Major
>
> From [~atrivedi]'s benchmark findings:
> 2) Materialize values from the Validity and Value direct buffers instead of
> calling the getInt() function on the IntVector. This is implemented as a new
> Unsafe reader type
> ([https://github.com/animeshtrivedi/benchmarking-arrow/blob/master/src/main/java/com/github/animeshtrivedi/benchmark/ArrowReaderUnsafe.java#L31]).
> 3) Optimize the bitmap operation that checks whether a bit is set
> ([https://github.com/animeshtrivedi/benchmarking-arrow/blob/master/src/main/java/com/github/animeshtrivedi/benchmark/ArrowReaderUnsafe.java#L23]).





[jira] [Created] (ARROW-3496) Add microbenchmark code to Java

2018-10-11 Thread Li Jin (JIRA)
Li Jin created ARROW-3496:
-

 Summary: Add microbenchmark code to Java
 Key: ARROW-3496
 URL: https://issues.apache.org/jira/browse/ARROW-3496
 Project: Apache Arrow
  Issue Type: Task
  Components: Java
Affects Versions: 0.11.0
Reporter: Li Jin


[~atrivedi] has done some microbenchmarking with the Java API. Let's consider 
adding the benchmarks to the codebase.





[jira] [Updated] (ARROW-3496) [Java] Add microbenchmark code to Java

2018-10-11 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-3496:
--
Summary: [Java] Add microbenchmark code to Java  (was: Add microbenchmark 
code to Java)

> [Java] Add microbenchmark code to Java
> --
>
> Key: ARROW-3496
> URL: https://issues.apache.org/jira/browse/ARROW-3496
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Affects Versions: 0.11.0
>Reporter: Li Jin
>Priority: Major
>
> [~atrivedi] has done some microbenchmarking with the Java API. Let's consider 
> adding the benchmarks to the codebase.





[jira] [Created] (ARROW-3495) Optimize bit operations performance

2018-10-11 Thread Li Jin (JIRA)
Li Jin created ARROW-3495:
-

 Summary: Optimize bit operations performance
 Key: ARROW-3495
 URL: https://issues.apache.org/jira/browse/ARROW-3495
 Project: Apache Arrow
  Issue Type: Task
  Components: Java
Affects Versions: 0.11.0
Reporter: Li Jin


From [~atrivedi]'s benchmark findings:

2) Materialize values from the Validity and Value direct buffers instead of
calling the getInt() function on the IntVector. This is implemented as a new
Unsafe reader type
([https://github.com/animeshtrivedi/benchmarking-arrow/blob/master/src/main/java/com/github/animeshtrivedi/benchmark/ArrowReaderUnsafe.java#L31]).

3) Optimize the bitmap operation that checks whether a bit is set
([https://github.com/animeshtrivedi/benchmarking-arrow/blob/master/src/main/java/com/github/animeshtrivedi/benchmark/ArrowReaderUnsafe.java#L23]).





[jira] [Created] (ARROW-3493) Document BOUNDS_CHECKING_ENABLED

2018-10-11 Thread Li Jin (JIRA)
Li Jin created ARROW-3493:
-

 Summary: Document BOUNDS_CHECKING_ENABLED
 Key: ARROW-3493
 URL: https://issues.apache.org/jira/browse/ARROW-3493
 Project: Apache Arrow
  Issue Type: Task
  Components: Java
Affects Versions: 0.11.0
Reporter: Li Jin


According to [~atrivedi], BOUNDS_CHECKING_ENABLED has significant implications 
for performance.

We should document this better and maybe revisit the default value.





[jira] [Commented] (ARROW-2310) Source release scripts fail with Java8

2018-10-01 Thread Li Jin (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634699#comment-16634699
 ] 

Li Jin commented on ARROW-2310:
---

[~wesmckinn] I can run 00-prepare and build/sign the java artifacts. The rest 
of the release steps don't work for me. 

> Source release scripts fail with Java8
> --
>
> Key: ARROW-2310
> URL: https://issues.apache.org/jira/browse/ARROW-2310
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> It's getting harder and harder to install Java7 these days. On a new install 
> of Ubuntu 16.04 I am not even sure how to get Oracle's Java7 installed 
> (though Java8 can be installed through a PPA).
> In lieu of fixing all the javadoc problems, it would be great if there was 
> some other workaround to build the release on Java8





[jira] [Resolved] (ARROW-3394) [Java] Remove duplicate dependency entry in Flight

2018-10-01 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin resolved ARROW-3394.
---
   Resolution: Fixed
Fix Version/s: 0.11.0

Issue resolved by pull request 2679
[https://github.com/apache/arrow/pull/2679]

> [Java] Remove duplicate dependency entry in Flight
> --
>
> Key: ARROW-3394
> URL: https://issues.apache.org/jira/browse/ARROW-3394
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> There is a duplicate dependency entry in the Arrow Flight pom for grpc-netty, 
> which leads to the following warning:
> {noformat}
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.arrow:arrow-flight:jar:0.11.0-SNAPSHOT
> [WARNING] 'dependencies.dependency.(groupId:artifactId:type:classifier)' must 
> be unique: io.grpc:grpc-netty:jar -> duplicate declaration of version 
> ${dep.grpc.version} @ org.apache.arrow:arrow-flight:[unknown-version], 
> /home/bryan/git/arrow/java/flight/pom.xml, line 55, column 17
> [WARNING] 
> [WARNING] It is highly recommended to fix these problems because they 
> threaten the stability of your build.
> [WARNING] 
> [WARNING] For this reason, future Maven versions might no longer support 
> building such malformed projects.
> {noformat}
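
A check for this kind of duplicate can be sketched with the Python standard library. The sample POM below is a hypothetical fragment written for the example, not the actual Flight pom, and `duplicate_dependencies` is a made-up helper. It keys each dependency on groupId, artifactId, type, and classifier, the tuple named in the Maven warning, and assumes the default type `jar` and an empty classifier when those elements are absent.

```python
import xml.etree.ElementTree as ET
from collections import Counter

POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}

# Hypothetical POM fragment with grpc-netty declared twice.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <dependencies>
    <dependency><groupId>io.grpc</groupId><artifactId>grpc-netty</artifactId></dependency>
    <dependency><groupId>io.grpc</groupId><artifactId>grpc-core</artifactId></dependency>
    <dependency><groupId>io.grpc</groupId><artifactId>grpc-netty</artifactId></dependency>
  </dependencies>
</project>"""


def duplicate_dependencies(pom_xml):
    """Return the dependency keys that are declared more than once."""
    root = ET.fromstring(pom_xml)
    keys = []
    for dep in root.findall(".//m:dependencies/m:dependency", POM_NS):
        keys.append((
            dep.findtext("m:groupId", default="", namespaces=POM_NS),
            dep.findtext("m:artifactId", default="", namespaces=POM_NS),
            dep.findtext("m:type", default="jar", namespaces=POM_NS),
            dep.findtext("m:classifier", default="", namespaces=POM_NS),
        ))
    counts = Counter(keys)
    return sorted(key for key, n in counts.items() if n > 1)


duplicates = duplicate_dependencies(SAMPLE_POM)
```

Running such a check in CI would flag the duplicate before Maven prints the warning, though Maven's own message already identifies the offending line.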





[jira] [Commented] (ARROW-2310) Source release scripts fail with Java8

2018-10-01 Thread Li Jin (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634442#comment-16634442
 ] 

Li Jin commented on ARROW-2310:
---

[~wesmckinn] We are already building for Java 8 per 
[https://github.com/apache/arrow/pull/1936] but I haven't run the release 
scripts (I assume there will be a permission issue for me).

> Source release scripts fail with Java8
> --
>
> Key: ARROW-2310
> URL: https://issues.apache.org/jira/browse/ARROW-2310
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> It's getting harder and harder to install Java7 these days. On a new install 
> of Ubuntu 16.04 I am not even sure how to get Oracle's Java7 installed 
> (though Java8 can be installed through a PPA).
> In lieu of fixing all the javadoc problems, it would be great if there was 
> some other workaround to build the release on Java8





[jira] [Created] (ARROW-3396) VectorSchemaRoot.create(schema, allocator) doesn't create dictionary encoded vector correctly

2018-10-01 Thread Li Jin (JIRA)
Li Jin created ARROW-3396:
-

 Summary: VectorSchemaRoot.create(schema, allocator) doesn't create 
dictionary encoded vector correctly
 Key: ARROW-3396
 URL: https://issues.apache.org/jira/browse/ARROW-3396
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Li Jin








[jira] [Commented] (ARROW-3175) Arrow Java: Upgrade to official FlatBuffers release (Flatbuffers incompatibility)

2018-09-05 Thread Li Jin (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604403#comment-16604403
 ] 

Li Jin commented on ARROW-3175:
---

Looks like something we should fix. I will take a look.

> Arrow Java: Upgrade to official FlatBuffers release (Flatbuffers 
> incompatibility)
> -
>
> Key: ARROW-3175
> URL: https://issues.apache.org/jira/browse/ARROW-3175
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Affects Versions: 0.10.0
>Reporter: Alex Black
>Priority: Blocker
> Fix For: 0.11.0
>
>
> Arrow Java currently uses an unofficial flatbuffers dependency - 
> com.vlkan:flatbuffers:
>  [https://github.com/apache/arrow/blob/master/java/pom.xml#L481-L485]
> The likely motivation here is that previously, no Java flatbuffers 
> implementation was available on maven central.
>  [https://github.com/vy/flatbuffers]
>  > Unfortunately, FlatBuffers project does not publish any artifacts to the 
> Maven Central Repository
> However, this is no longer the case:
>  
> [https://search.maven.org/search?q=g:com.google.flatbuffers%20AND%20a:flatbuffers-java=gav]
> The flatbuffers version used in Arrow java is a nearly 3-year-old snapshot, 
> not even a version of an official release: 
> [https://github.com/vy/flatbuffers#usage]
> The main problem is that this version of flatbuffers is not compatible with 
> the official releases of flatbuffers.
>  For example, we use the official flatbuffers releases in ND4J and 
> Deeplearning4j: [https://github.com/deeplearning4j/deeplearning4j]
> Running Arrow with an official flatbuffers library on the classpath results 
> in issues such as:
> {noformat}
> java.lang.NoSuchMethodError: 
> com.google.flatbuffers.FlatBufferBuilder.createString(Ljava/lang/String;)I
>  at org.apache.arrow.vector.types.pojo.Field.getField(Field.java:154)
>  at org.apache.arrow.vector.types.pojo.Schema.getSchema(Schema.java:145)
>  at 
> org.apache.arrow.vector.ipc.message.MessageSerializer.serialize(MessageSerializer.java:124)
>  at 
> org.apache.arrow.vector.ipc.ArrowWriter.ensureStarted(ArrowWriter.java:136)
>  at org.apache.arrow.vector.ipc.ArrowWriter.start(ArrowWriter.java:97)
>  at FlatBuffersDependencyIssue.test(FlatBuffersDependencyIssue.java:56)
> {noformat}
>  
> Simply excluding the com.vlkan:flatbuffers dependency in lieu of an official 
> flatbuffers release is not a solution (same exception as above) and we aren't 
> prepared to downgrade all of our projects to use the flatbuffers version that 
> Arrow currently requires.
>  Consequently, this is a major issue that prevents us using Arrow in our 
> libraries.
> I have prepared a simple repository to reproduce this issue, if required: 
> [https://github.com/AlexDBlack/arrowflatbufferstest]
> Is there a reason for using this particular version of flatbuffers, and if 
> not, can Arrow java use an official release of flatbuffers instead?





[jira] [Resolved] (ARROW-3115) [Java] Style Checks - Fix import ordering

2018-08-31 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin resolved ARROW-3115.
---
   Resolution: Fixed
Fix Version/s: 0.11.0

Issue resolved by pull request 2469
[https://github.com/apache/arrow/pull/2469]

> [Java] Style Checks - Fix import ordering
> -
>
> Key: ARROW-3115
> URL: https://issues.apache.org/jira/browse/ARROW-3115
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Java
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Fix import ordering according to checkstyle





[jira] [Resolved] (ARROW-3111) [Java] Enable changing default logging level when running tests

2018-08-24 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin resolved ARROW-3111.
---
   Resolution: Fixed
Fix Version/s: 0.11.0

Issue resolved by pull request 2465
[https://github.com/apache/arrow/pull/2465]

> [Java] Enable changing default logging level when running tests
> ---
>
> Key: ARROW-3111
> URL: https://issues.apache.org/jira/browse/ARROW-3111
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Currently tests use the logback logger which has a default level of DEBUG. We 
> should provide a way to change this level so that tests can be run without 
> seeing a ton of DEBUG logging messages, if needed.





[jira] [Updated] (ARROW-3003) [Doc] Enable Java doc in dev/gen_apidocs/create_documents.sh

2018-08-06 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-3003:
--
Component/s: Documentation
Summary: [Doc] Enable Java doc in dev/gen_apidocs/create_documents.sh  
(was: Unable Java doc in dev/gen_apidocs/create_documents.sh)

> [Doc] Enable Java doc in dev/gen_apidocs/create_documents.sh
> 
>
> Key: ARROW-3003
> URL: https://issues.apache.org/jira/browse/ARROW-3003
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Documentation
>Reporter: Li Jin
>Priority: Major
>
> This is currently disabled but I have verified it works.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-3003) Unable Java doc in dev/gen_apidocs/create_documents.sh

2018-08-06 Thread Li Jin (JIRA)
Li Jin created ARROW-3003:
-

 Summary: Unable Java doc in dev/gen_apidocs/create_documents.sh
 Key: ARROW-3003
 URL: https://issues.apache.org/jira/browse/ARROW-3003
 Project: Apache Arrow
  Issue Type: Task
Reporter: Li Jin


This is currently disabled but I have verified it works.





[jira] [Updated] (ARROW-2922) [Release] Make python command name customizable

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2922:
--
Component/s: Packaging

> [Release] Make python command name customizable
> ---
>
> Key: ARROW-2922
> URL: https://issues.apache.org/jira/browse/ARROW-2922
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {{python3}} is used for Python 3 on Debian GNU/Linux.





[jira] [Assigned] (ARROW-2301) [Python] Add source distribution publishing instructions to package / release management documentation

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin reassigned ARROW-2301:
-

Assignee: Uwe L. Korn

> [Python] Add source distribution publishing instructions to package / release 
> management documentation
> --
>
> Key: ARROW-2301
> URL: https://issues.apache.org/jira/browse/ARROW-2301
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> We wish to start publishing source tarballs for Python on PyPI





[jira] [Assigned] (ARROW-2346) [Python] PYARROW_CXXFLAGS doesn't accept multiple options

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin reassigned ARROW-2346:
-

Assignee: Antoine Pitrou

> [Python] PYARROW_CXXFLAGS doesn't accept multiple options
> -
>
> Key: ARROW-2346
> URL: https://issues.apache.org/jira/browse/ARROW-2346
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Let's say I want to enable multiple warnings. I try:
> {code:bash}
> PYARROW_CXXFLAGS="-Wextra -Wconversion" python setup.py build
> {code}
> and get the following error:
> {code:bash}
> [ 22%] Building CXX object CMakeFiles/plasma.dir/plasma.cxx.o
> g++-4.9: error: unrecognized command line option '-Wextra -Wconversion'
> {code}
> For some reason it seems command expansion doesn't work properly. "{{-Wextra 
> -Wconversion}}" is passed as a single argument instead of two separate ones...
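
The difference between one argument and two can be shown with a short sketch. It uses `shlex.split` from the Python standard library as one plausible way to tokenize a flags variable; whether pyarrow's build scripts were fixed this way is an assumption, and `split_flags` is a hypothetical helper.

```python
import shlex


def split_flags(raw):
    """Split a shell-style flags string into separate arguments,
    honoring quotes the way a POSIX shell would."""
    return shlex.split(raw)


raw = "-Wextra -Wconversion"

# Passing the whole string as one element reproduces the reported error:
# the compiler sees a single unknown option containing a space.
single_argument = [raw]

# Tokenizing first yields one list element per flag.
separate_arguments = split_flags(raw)
```

Here `single_argument` holds one element while `separate_arguments` holds two, which matches the behavior described in the report.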





[jira] [Assigned] (ARROW-2391) [Python] Segmentation fault from PyArrow when mapping Pandas datetime column to pyarrow.date64

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin reassigned ARROW-2391:
-

Assignee: Krisztian Szucs

> [Python] Segmentation fault from PyArrow when mapping Pandas datetime column 
> to pyarrow.date64
> --
>
> Key: ARROW-2391
> URL: https://issues.apache.org/jira/browse/ARROW-2391
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
> Environment: Mac OS High Sierra
> Python 3.6
>Reporter: Dave Challis
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> When calling `pyarrow.Table.from_pandas` with a `pandas.DataFrame` and 
> a `pyarrow.Schema` provided, the function call results in a segmentation 
> fault if a pandas `datetime64[ns]` column is converted to a 
> `pyarrow.date64` type.
> A minimal example which shows this is:
> {code:python}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({'created': ['2018-05-10T10:24:01']})
> df['created'] = pd.to_datetime(df['created'])
> schema = pa.schema([pa.field('created', pa.date64())])
> pa.Table.from_pandas(df, schema=schema)
> {code}
> Executing the above causes the python interpreter to exit with "Segmentation 
> fault: 11".
> Attempting to convert into various other datatypes (by specifying different 
> schemas) either succeeds, or raises an exception if the conversion is invalid.
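
For reference, the value such a conversion would be expected to produce can be sketched without pandas or pyarrow. The sketch assumes that a date64 value is a count of milliseconds since the UNIX epoch that falls on a whole-day boundary, and that the time of day is simply truncated; the truncation rule and the helper name `to_date64_ms` are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MS_PER_DAY = 86_400_000


def to_date64_ms(iso_timestamp):
    """Convert an ISO 8601 timestamp (treated as UTC) to milliseconds
    since the epoch, truncated to the start of that day."""
    moment = datetime.fromisoformat(iso_timestamp).replace(tzinfo=timezone.utc)
    ms = (moment - EPOCH) // timedelta(milliseconds=1)
    return ms - ms % MS_PER_DAY


value = to_date64_ms("2018-05-10T10:24:01")
```

A well-behaved conversion should either yield a value like this or raise an exception, never terminate the interpreter.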





[jira] [Assigned] (ARROW-2448) Segfault when plasma client goes out of scope before buffer.

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin reassigned ARROW-2448:
-

Assignee: Philipp Moritz

> Segfault when plasma client goes out of scope before buffer.
> 
>
> Key: ARROW-2448
> URL: https://issues.apache.org/jira/browse/ARROW-2448
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++), Python
>Reporter: Robert Nishihara
>Assignee: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The following causes a segfault.
>  
> First start a plasma store with
> {code:java}
> plasma_store -s /tmp/store -m 100{code}
> Then run the following in Python.
> {code}
> import pyarrow.plasma as plasma
> import numpy as np
> client = plasma.connect('/tmp/store', '', 0)
> object_id = client.put(np.zeros(3))
> buf = client.get(object_id)
> del client
> del buf  # This segfaults.{code}
> The backtrace is 
> {code:java}
> (lldb) bt
> * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS 
> (code=1, address=0xfffc)
>   * frame #0: 0x0001056deaee 
> libplasma.0.dylib`plasma::PlasmaClient::Release(plasma::UniqueID const&) + 142
>     frame #1: 0x0001056de9e9 
> libplasma.0.dylib`plasma::PlasmaBuffer::~PlasmaBuffer() + 41
>     frame #2: 0x0001056dec9f libplasma.0.dylib`arrow::Buffer::~Buffer() + 
> 63
>     frame #3: 0x000106206661 
> lib.cpython-36m-darwin.so`std::__1::shared_ptr::~shared_ptr() 
> [inlined] std::__1::__shared_count::__release_shared(this=0x0001019b7d20) 
> at memory:3444
>     frame #4: 0x000106206617 
> lib.cpython-36m-darwin.so`std::__1::shared_ptr::~shared_ptr() 
> [inlined] 
> std::__1::__shared_weak_count::__release_shared(this=0x0001019b7d20) at 
> memory:3486
>     frame #5: 0x000106206617 
> lib.cpython-36m-darwin.so`std::__1::shared_ptr::~shared_ptr(this=0x000100791780)
>  at memory:4412
>     frame #6: 0x000106002b35 
> lib.cpython-36m-darwin.so`std::__1::shared_ptr::~shared_ptr(this=0x000100791780)
>  at memory:4410
>     frame #7: 0x0001061052c5 lib.cpython-36m-darwin.so`void 
> __Pyx_call_destructor 
> >(x=std::__1::shared_ptr::element_type @ 0x0001019b7d38 
> strong=0 weak=1) at lib.cxx:486
>     frame #8: 0x000106104f93 
> lib.cpython-36m-darwin.so`__pyx_tp_dealloc_7pyarrow_3lib_Buffer(o=0x000100791768)
>  at lib.cxx:107704
>     frame #9: 0x0001069fcd54 
> multiarray.cpython-36m-darwin.so`array_dealloc + 292
>     frame #10: 0x0001000e8daf 
> libpython3.6m.dylib`_PyDict_DelItem_KnownHash + 463
>     frame #11: 0x000100171899 
> libpython3.6m.dylib`_PyEval_EvalFrameDefault + 13321
>     frame #12: 0x0001001791ef 
> libpython3.6m.dylib`_PyEval_EvalCodeWithName + 2447
>     frame #13: 0x00010016e3d4 libpython3.6m.dylib`PyEval_EvalCode + 100
>     frame #14: 0x0001001a3bd6 
> libpython3.6m.dylib`PyRun_InteractiveOneObject + 582
>     frame #15: 0x0001001a350e 
> libpython3.6m.dylib`PyRun_InteractiveLoopFlags + 222
>     frame #16: 0x0001001a33fc libpython3.6m.dylib`PyRun_AnyFileExFlags + 
> 60
>     frame #17: 0x0001001bc835 libpython3.6m.dylib`Py_Main + 3829
>     frame #18: 0x00010df8 python`main + 232
>     frame #19: 0x7fff6cd80015 libdyld.dylib`start + 1
>     frame #20: 0x7fff6cd80015 libdyld.dylib`start + 1{code}
> Basically, the issue is that when the buffer goes out of scope, it calls 
> {{Release}} on the plasma client, but the client has already been deallocated.
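
The lifetime problem described above can be sketched in Python. This is only an illustration under stated assumptions: FakeClient and FakeBuffer are hypothetical stand-ins, not the pyarrow.plasma API, and the sketch shows one possible remedy (the buffer holding a strong reference to its client), which is not necessarily how the issue was fixed.

```python
class FakeClient:
    """Hypothetical stand-in for a store client; not the pyarrow.plasma API."""

    def __init__(self):
        self.connected = True
        self.released = []

    def release(self, object_id):
        # Mirrors the role of Release(): only meaningful on a live client.
        if not self.connected:
            raise RuntimeError("release() called on a disconnected client")
        self.released.append(object_id)


class FakeBuffer:
    """Buffer that holds a strong reference to the client that produced it."""

    def __init__(self, client, object_id):
        self._client = client  # keeps the client alive as long as the buffer is
        self._object_id = object_id

    def close(self):
        self._client.release(self._object_id)


client = FakeClient()
released = client.released  # keep a handle on the log for inspection
buf = FakeBuffer(client, "obj-1")
del client   # removes only the name; buf still references the client object
buf.close()  # safe, because the client has not been destroyed
```

The design choice is that ownership points from the buffer to the client, so the order in which the two names are deleted no longer matters.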



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-2508) [Python] pytest API changes make tests fail

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin reassigned ARROW-2508:
-

Assignee: Philipp Moritz

> [Python] pytest API changes make tests fail
> ---
>
> Key: ARROW-2508
> URL: https://issues.apache.org/jira/browse/ARROW-2508
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> It seems there is a new pytest release on PyPI; it produces the following failures:
> ```
> === FAILURES 
> ===
> __ TestConvertDateTimeLikeTypes.test_pandas_datetime_to_date64_failures[None] 
> __
>  
> self =  at 0x112dd6a90>
> mask = None
>  
>  @pytest.mark.parametrize('mask', [
>  None,
>  np.ones(3),
>  np.array([True, False, False])
>  ])
>  def test_pandas_datetime_to_date64_failures(self, mask):
>  s = pd.to_datetime([
>  '2018-05-10T10:24:01',
>  '2018-05-11T10:24:01',
>  '2018-05-12T10:24:01',
>  ])
>  
>  expected_msg = 'Timestamp value had non-zero intraday milliseconds'
> > with pytest.raises(pa.ArrowInvalid, msg=expected_msg):
> E TypeError: Unexpected keyword arguments passed to pytest.raises: msg
>  
> pyarrow-test-3.6/lib/python3.6/site-packages/pyarrow/tests/test_convert_pandas.py:862:
>  TypeError
> _ TestConvertDateTimeLikeTypes.test_pandas_datetime_to_date64_failures[mask1] 
> __
>  
> self =  at 0x113213160>
> mask = array([ 1., 1., 1.])
>  
>  @pytest.mark.parametrize('mask', [
>  None,
>  np.ones(3),
>  np.array([True, False, False])
>  ])
>  def test_pandas_datetime_to_date64_failures(self, mask):
>  s = pd.to_datetime([
>  '2018-05-10T10:24:01',
>  '2018-05-11T10:24:01',
>  '2018-05-12T10:24:01',
>  ])
>  
>  expected_msg = 'Timestamp value had non-zero intraday milliseconds'
> > with pytest.raises(pa.ArrowInvalid, msg=expected_msg):
> E TypeError: Unexpected keyword arguments passed to pytest.raises: msg
>  
> pyarrow-test-3.6/lib/python3.6/site-packages/pyarrow/tests/test_convert_pandas.py:862:
>  TypeError
> _ TestConvertDateTimeLikeTypes.test_pandas_datetime_to_date64_failures[mask2] 
> __
>  
> self =  at 0x112ed4c88>
> mask = array([ True, False, False], dtype=bool)
>  
>  @pytest.mark.parametrize('mask', [
>  None,
>  np.ones(3),
>  np.array([True, False, False])
>  ])
>  def test_pandas_datetime_to_date64_failures(self, mask):
>  s = pd.to_datetime([
>  '2018-05-10T10:24:01',
>  '2018-05-11T10:24:01',
>  '2018-05-12T10:24:01',
>  ])
>  
>  expected_msg = 'Timestamp value had non-zero intraday milliseconds'
> > with pytest.raises(pa.ArrowInvalid, msg=expected_msg):
> E TypeError: Unexpected keyword arguments passed to pytest.raises: msg
>  
> pyarrow-test-3.6/lib/python3.6/site-packages/pyarrow/tests/test_convert_pandas.py:862:
>  TypeError
> === short test summary info 
> 
> ```
> I think we can just change msg to message and it should work again.





[jira] [Assigned] (ARROW-2464) [Python] Use a python_version marker instead of a condition

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin reassigned ARROW-2464:
-

Assignee: Omer Katz

> [Python] Use a python_version marker instead of a condition
> ---
>
> Key: ARROW-2464
> URL: https://issues.apache.org/jira/browse/ARROW-2464
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Packaging, Python
>Affects Versions: 0.9.0
>Reporter: Omer Katz
>Assignee: Omer Katz
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> When installing pyarrow 0.9.0, pipenv complains that futures has no matching 
> versions.
> While that may be a bug in pipenv, it does not matter: the standard way to 
> specify a conditional dependency is to use a marker.
> We should use the python_version marker to tell pip whether it should install 
> futures.
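
A minimal sketch of what such a marker looks like in a requirement string, using the PEP 508 environment-marker syntax. The version bound shown here is an assumption for illustration, and split_marker is a hypothetical helper, not part of pip or setuptools.

```python
# Hypothetical excerpt of what a setup.py might pass as install_requires.
# The bound "< 3" is an assumption; the issue does not state the exact version.
install_requires = [
    # PEP 508 marker: the part after ";" is evaluated by the installer,
    # so futures is only installed when the condition holds.
    'futures; python_version < "3"',
]


def split_marker(requirement):
    """Split a requirement string into (name, marker) at the first ';'."""
    name, _, marker = requirement.partition(";")
    return name.strip(), marker.strip()


name, marker = split_marker(install_requires[0])
```

Because the condition travels with the package metadata, the installer evaluates it on the target interpreter instead of relying on an `if` statement executed at build time.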





[jira] [Assigned] (ARROW-2482) [Rust] support nested types

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin reassigned ARROW-2482:
-

Assignee: Andy Grove

> [Rust] support nested types
> ---
>
> Key: ARROW-2482
> URL: https://issues.apache.org/jira/browse/ARROW-2482
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Chao Sun
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The Rust Array type doesn't seem to support nested types so far. We should 
> implement it.





[jira] [Assigned] (ARROW-2462) [C++] Segfault when writing a parquet table containing a dictionary column from Record Batch Stream

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin reassigned ARROW-2462:
-

Assignee: Matt Topol

> [C++] Segfault when writing a parquet table containing a dictionary column 
> from Record Batch Stream
> ---
>
> Key: ARROW-2462
> URL: https://issues.apache.org/jira/browse/ARROW-2462
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Affects Versions: 0.10.0
>Reporter: Matt Topol
>Assignee: Matt Topol
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Discovered this through using pyarrow and dealing with RecordBatch Streams 
> and parquet. The issue can be replicated as follows:
> {code:python}
> import pyarrow as pa
> import pyarrow.parquet as pq
> # create record batch with 1 dictionary column
> indices = pa.array([1,0,1,1,0])
> dictionary = pa.array(['Foo', 'Bar'])
> dict_array = pa.DictionaryArray.from_arrays(indices, dictionary)
> rb = pa.RecordBatch.from_arrays( [ dict_array ], [ 'd0' ] )
> # write out using RecordBatchStreamWriter
> sink = pa.BufferOutputStream()
> writer = pa.RecordBatchStreamWriter(sink, rb.schema)
> writer.write_batch(rb)
> writer.close()
> buf = sink.get_result()
> # read in and try to write parquet table
> reader = pa.open_stream(buf)
> tbl = reader.read_all()
> pq.write_table(tbl, 'dict_table.parquet') # SEGFAULTS
> {code}
> When writing record batch streams, if an array contains no nulls, Arrow 
> writes a placeholder nullptr instead of a full bitmap of 1s. When that 
> stream is deserialized, the bitmap for the nulls isn't populated and is 
> left as a nullptr. When attempting to write this table via 
> pyarrow.parquet, you end up 
> [here|https://github.com/apache/parquet-cpp/blob/master/src/parquet/arrow/writer.cc#L963]
> in the parquet writer code, which attempts to Cast the dictionary to a 
> non-dictionary representation. Since the null count isn't checked before 
> creating a BitmapReader, the BitmapReader is constructed with a nullptr for 
> the bitmap_data but a non-zero length, which then segfaults in the 
> constructor 
> [here|https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/bit-util.h#L415]
> because the bitmap is null.
> So a simple check of the null count before constructing the BitmapReader 
> avoids the segfault.
> Already filed [PR 1896|https://github.com/apache/arrow/pull/1896]
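
The null-count check described above can be sketched in Python. This is an illustration only, not the C++ code from the PR: is_valid is a hypothetical helper, and the bitmap is modelled as a bytes object with one bit per slot in least-significant-bit order, as in the Arrow format.

```python
def is_valid(validity_bitmap, null_count, i):
    """Return True if slot i is non-null.

    validity_bitmap is a bytes object with one bit per slot (least-significant
    bit first), or None when the writer omitted it because there are no nulls.
    """
    if null_count == 0 or validity_bitmap is None:
        # Guard first: there is no bitmap to read, and every slot is valid.
        return True
    return bool((validity_bitmap[i // 8] >> (i % 8)) & 1)


# An array deserialized from a stream with no nulls: bitmap absent.
all_valid = [is_valid(None, 0, i) for i in range(5)]

# An array of three slots where slot 1 is null: bits 0 and 2 are set.
with_null = [is_valid(bytes([0b00000101]), 1, i) for i in range(3)]
```

The point of the guard is ordering: the null count is consulted before anything touches the bitmap, so an absent bitmap is never indexed.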





[jira] [Assigned] (ARROW-2539) [Plasma] Use unique_ptr instead of raw pointer

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin reassigned ARROW-2539:
-

Assignee: Zhijun Fu

> [Plasma] Use unique_ptr instead of raw pointer
> --
>
> Key: ARROW-2539
> URL: https://issues.apache.org/jira/browse/ARROW-2539
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++)
>Reporter: Zhijun Fu
>Assignee: Zhijun Fu
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> In some places Plasma uses explicit new and delete; forgetting to delete 
> can cause a memory leak. Use unique_ptr instead where 
> possible so that memory is freed automatically.





[jira] [Assigned] (ARROW-2540) [Plasma] add constructor/destructor to make sure dlfree is called automatically

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin reassigned ARROW-2540:
-

Assignee: Zhijun Fu

> [Plasma] add constructor/destructor to make sure dlfree is called 
> automatically
> ---
>
> Key: ARROW-2540
> URL: https://issues.apache.org/jira/browse/ARROW-2540
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++)
>Reporter: Zhijun Fu
>Assignee: Zhijun Fu
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Add a constructor and destructor to the ObjectTableEntry structure to make sure 
> dlfree() is called for the pointer field when the object is destroyed.





[jira] [Assigned] (ARROW-2522) [C++] Version shared library files

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin reassigned ARROW-2522:
-

Assignee: Antoine Pitrou

> [C++] Version shared library files
> --
>
> Key: ARROW-2522
> URL: https://issues.apache.org/jira/browse/ARROW-2522
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> We should version installed shared library files (SO under Unix, DLL under 
> Windows) to disambiguate incompatible ABI versions.
> CMake provides support for that:
> http://pusling.com/blog/?p=352
> https://cmake.org/cmake/help/v3.11/prop_tgt/SOVERSION.html





[jira] [Assigned] (ARROW-2558) [Plasma] avoid walk through all the objects when a client disconnects

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin reassigned ARROW-2558:
-

Assignee: Zhijun Fu

> [Plasma] avoid walk through all the objects when a client disconnects
> -
>
> Key: ARROW-2558
> URL: https://issues.apache.org/jira/browse/ARROW-2558
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++)
>Reporter: Zhijun Fu
>Assignee: Zhijun Fu
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Currently plasma stores a list of clients in ObjectTableEntry, which is used to 
> track which clients are using a given object. This serves two purposes:
> - To tell whether an object is in use.
> - To tell whether the client trying to abort an object is the one that created it.
> A problem with the list-of-clients approach is that when a client disconnects, we 
> need to walk through all the objects and remove the client pointer from the 
> list for each object.
> Instead, we could add a reference count to ObjectTableEntry and store a 
> list of object IDs in the client structure. This could achieve both goals of the 
> original approach, while when a client disconnects, it just walks 
> through its object IDs and decrements the reference count of each 
> ObjectTableEntry; there's no need to walk through all objects.
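
The proposed bookkeeping can be sketched in Python. The class and method names below are hypothetical and only mirror the idea (a reference count per object entry plus a set of object IDs per client); they are not the Plasma store's actual data structures.

```python
class ObjectEntry:
    """Per-object record; holds a count instead of a list of clients."""

    def __init__(self):
        self.ref_count = 0


class Client:
    """Per-client record; remembers which objects this client is using."""

    def __init__(self):
        self.object_ids = set()


class Store:
    def __init__(self):
        self.objects = {}  # object_id -> ObjectEntry

    def add_reference(self, client, object_id):
        if object_id not in client.object_ids:
            client.object_ids.add(object_id)
            entry = self.objects.setdefault(object_id, ObjectEntry())
            entry.ref_count += 1

    def in_use(self, object_id):
        return self.objects[object_id].ref_count > 0

    def disconnect(self, client):
        # Touches only the objects this client used, not the whole table.
        for object_id in client.object_ids:
            self.objects[object_id].ref_count -= 1
        client.object_ids.clear()


store = Store()
a, b = Client(), Client()
store.add_reference(a, "x")
store.add_reference(b, "x")
store.add_reference(a, "y")
store.disconnect(a)
```

The second purpose from the description (checking that the aborting client is one that uses the object) becomes a membership test on `client.object_ids`.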





[jira] [Assigned] (ARROW-2565) [Plasma] new subscriber cannot receive notifications about existing objects

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin reassigned ARROW-2565:
-

Assignee: Zhijun Fu

> [Plasma] new subscriber cannot receive notifications about existing objects
> ---
>
> Key: ARROW-2565
> URL: https://issues.apache.org/jira/browse/ARROW-2565
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Plasma (C++)
>Reporter: Zhijun Fu
>Assignee: Zhijun Fu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When a client subscribes to the plasma store, it is supposed to receive 
> notifications from plasma about objects that already exist in the store. It is 
> not able to, because its fd is not added to the pending_notifications_ map when 
> it subscribes, so push_notifications() is not able to see the new subscriber. 
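
A Python sketch of the intended behaviour, assuming a simplified model in which each subscriber fd maps to a queue of pending object IDs. NotificationStore and its methods are hypothetical names and do not reproduce the Plasma store's code.

```python
from collections import deque


class NotificationStore:
    def __init__(self):
        self.sealed_objects = []
        self.pending_notifications = {}  # subscriber fd -> deque of object IDs

    def seal(self, object_id):
        self.sealed_objects.append(object_id)
        # Only subscribers present in the map can be notified.
        for queue in self.pending_notifications.values():
            queue.append(object_id)

    def subscribe(self, fd):
        # Register the subscriber first, so that it is visible in the map ...
        queue = self.pending_notifications.setdefault(fd, deque())
        # ... then queue a notification for every object that already exists.
        queue.extend(self.sealed_objects)


store = NotificationStore()
store.seal("a")      # sealed before anyone subscribed
store.subscribe(7)   # fd 7 is a made-up descriptor number
store.seal("b")
received = list(store.pending_notifications[7])
```

If `subscribe` skipped the registration step, the loop in `seal` would never see fd 7, which is the symptom reported in the issue.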





[jira] [Assigned] (ARROW-2578) [Plasma] Valgrind errors related to std::random_device

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin reassigned ARROW-2578:
-

Assignee: Philipp Moritz

> [Plasma] Valgrind errors related to std::random_device
> --
>
> Key: ARROW-2578
> URL: https://issues.apache.org/jira/browse/ARROW-2578
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Plasma (C++)
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> These have started popping up very recently: 
> [https://travis-ci.org/apache/arrow/jobs/378526493]
> e.g.
> {code:java}
> [ RUN ] PlasmaSerialization.SealRequest
> ==19147== Conditional jump or move depends on uninitialised value(s)
> ==19147== at 0x510FFD8: std::random_device::_M_init(std::string const&) 
> (cow-string-inst.cc:56)
> ==19147== by 0x4E3B7C: std::random_device::random_device(std::string const&) 
> (random.h:1588)
> ==19147== by 0x4E2E6F: plasma::UniqueID::from_random() (common.cc:31)
> ==19147== by 0x4871D6: 
> plasma::PlasmaSerialization_SealRequest_Test::TestBody() 
> (serialization_tests.cc:120)
> ==19147== by 0x4D6589: void 
> testing::internal::HandleSehExceptionsInMethodIfSupported void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2402)
> ==19147== by 0x4D0317: void 
> testing::internal::HandleExceptionsInMethodIfSupported void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2438)
> ==19147== by 0x4B57D8: testing::Test::Run() (gtest.cc:2475)
> ==19147== by 0x4B607F: testing::TestInfo::Run() (gtest.cc:2656)
> ==19147== by 0x4B6743: testing::TestCase::Run() (gtest.cc:2774)
> ==19147== by 0x4BD113: testing::internal::UnitTestImpl::RunAllTests() 
> (gtest.cc:4649)
> ==19147== by 0x4D7891: bool 
> testing::internal::HandleSehExceptionsInMethodIfSupported  bool>(testing::internal::UnitTestImpl*, bool 
> (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2402)
> ==19147== by 0x4D103F: bool 
> testing::internal::HandleExceptionsInMethodIfSupported  bool>(testing::internal::UnitTestImpl*, bool 
> (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2438)
> ==19147=={code}
>  Any ideas on how to fix this are appreciated!





[jira] [Assigned] (ARROW-2597) [Plasma] remove UniqueIDHasher

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin reassigned ARROW-2597:
-

Assignee: Zhijun Fu

> [Plasma] remove UniqueIDHasher
> --
>
> Key: ARROW-2597
> URL: https://issues.apache.org/jira/browse/ARROW-2597
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++)
>Reporter: Zhijun Fu
>Assignee: Zhijun Fu
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>






[jira] [Assigned] (ARROW-2629) [Plasma] Iterator invalidation for pending_notifications_

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin reassigned ARROW-2629:
-

Assignee: Philipp Moritz

> [Plasma] Iterator invalidation for pending_notifications_
> -
>
> Key: ARROW-2629
> URL: https://issues.apache.org/jira/browse/ARROW-2629
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Plasma (C++)
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This was discovered when running the Ray integration tests. In 
> send_notifications we are modifying pending_notifications_, which invalidates 
> the iterator in the for each loop in push_notification.
> It's not easy to reproduce, so I don't have a regression test unfortunately, 
> but I'll post a patch that fixes it.
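
The same class of bug can be shown in Python, with one difference: mutating a dict while iterating over it raises a RuntimeError, whereas in C++ it is undefined behaviour. The sketch below uses hypothetical names (push_all, send) and shows one common remedy, iterating over a snapshot of the keys; it is not the patch referred to above.

```python
def push_all(pending, send):
    """Send queued notifications; drop subscribers whose send fails.

    pending maps a subscriber fd to its queued notifications.
    send(fd, queue) returns False when the subscriber should be removed.
    """
    for fd in list(pending):  # snapshot, so deletions do not disturb the loop
        if fd not in pending:
            continue  # removed by an earlier callback
        if not send(fd, pending[fd]):
            del pending[fd]


pending = {1: ["a"], 2: ["b"], 3: ["c"]}
calls = []


def send(fd, queue):
    calls.append(fd)
    return fd != 2  # pretend that subscriber 2 has gone away


push_all(pending, send)
```

Iterating over a copy costs one extra list per pass, which is usually acceptable when the number of subscribers is small.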





[jira] [Assigned] (ARROW-2612) [Plasma] Fix deprecated PLASMA_DEFAULT_RELEASE_DELAY

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin reassigned ARROW-2612:
-

Assignee: Philipp Moritz

> [Plasma] Fix deprecated PLASMA_DEFAULT_RELEASE_DELAY
> 
>
> Key: ARROW-2612
> URL: https://issues.apache.org/jira/browse/ARROW-2612
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++)
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The deprecated PLASMA_DEFAULT_RELEASE_DELAY is currently broken, since it 
> refers to kDeprecatedPlasmaDefaultReleaseDelay without the plasma:: namespace 
> qualifier.





[jira] [Assigned] (ARROW-2657) Segfault when importing TensorFlow after Pyarrow

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin reassigned ARROW-2657:
-

Assignee: Philipp Moritz

> Segfault when importing TensorFlow after Pyarrow
> 
>
> Key: ARROW-2657
> URL: https://issues.apache.org/jira/browse/ARROW-2657
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Reporter: Robert Nishihara
>Assignee: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 15h 10m
>  Remaining Estimate: 0h
>
> The following will segfault when pyarrow wheels are built using the 
> instructions in 
> [https://github.com/apache/arrow/tree/master/python/manylinux1#build-instructions].
> {code:java}
> import pyarrow
> import tensorflow
> {code}
> Searching over commits, this was introduced in 
> https://github.com/apache/arrow/commit/2093f6ec5c628ef983194a3fb3d0a621dd58c600.
> Running in gdb shows
> {code:java}
> $ gdb python
> GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
> Copyright (C) 2016 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later 
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-linux-gnu".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> .
> Find the GDB manual and other documentation resources online at:
> .
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from python...done.
> (gdb) run
> Starting program: /home/ubuntu/anaconda3/bin/python 
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> Python 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56) 
> [GCC 7.2.0] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import pyarrow
> >>> import tensorflow
> Program received signal SIGSEGV, Segmentation fault.
> 0x in ?? ()
> (gdb) bt
> #0 0x in ?? ()
> #1 0x77bc8a99 in __pthread_once_slow (
>  once_control=0x7fffd95561c8  namespace)::cpuid_once_flag>, init_routine=0x717e6fe1 
> )
>  at pthread_once.c:116
> #2 0x7fffd8df6faa in void std::call_once(std::once_flag&, 
> void (&)()) ()
>  from 
> /home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/../libtensorflow_framework.so
> #3 0x7fffd8df6fde in 
> tensorflow::port::TestCPUFeature(tensorflow::port::CPUFeature) ()
>  from 
> /home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/../libtensorflow_framework.so
> #4 0x7fffd8df6f11 in tensorflow::port::(anonymous 
> namespace)::CheckFeatureOrDie(tensorflow::port::CPUFeature, std::string 
> const&) ()
>  from 
> /home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/../libtensorflow_framework.so
> #5 0x7fffd8c38394 in _GLOBAL__sub_I_cpu_feature_guard.cc ()
>  from 
> /home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/../libtensorflow_framework.so
> #6 0x77de76ba in call_init (l=, argc=argc@entry=1, 
>  argv=argv@entry=0x7fffe598, env=env@entry=0x55c628f0)
>  at dl-init.c:72
> #7 0x77de77cb in call_init (env=0x55c628f0, argv=0x7fffe598, 
>  argc=1, l=) at dl-init.c:30
> #8 _dl_init (main_map=main_map@entry=0x565d9640, argc=1, 
>  argv=0x7fffe598, env=0x55c628f0) at dl-init.c:120
> #9 0x77dec8e2 in dl_open_worker (a=a@entry=0x7fff8810)
>  at dl-open.c:575
> #10 0x77de7564 in _dl_catch_error (
>  objname=objname@entry=0x7fff8800, 
>  errstring=errstring@entry=0x7fff8808, 
>  mallocedp=mallocedp@entry=0x7fff87ff, 
>  operate=operate@entry=0x77dec4d0 , 
>  args=args@entry=0x7fff8810) at dl-error.c:187
> #11 0x77debda9 in _dl_open (
>  file=0x7fffde1edc00 
> "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so",
>  mode=-2147483646, 
>  caller_dlopen=0x55742bfa <_PyImport_FindSharedFuncptr+138>, nsid=-2, 
>  argc=, argv=, env=0x55c628f0)
>  at dl-open.c:660
> #12 0x775ecf09 in dlopen_doit (a=a@entry=0x7fff8a40) at 
> dlopen.c:66
> #13 0x77de7564 in _dl_catch_error (objname=0x55b35d00, 
>  errstring=0x55b35d08, mallocedp=0x55b35cf8, 
>  operate=0x775eceb0 , args=0x7fff8a40)
>  at dl-error.c:187
> #14 0x775ed571 in _dlerror_run (
>  

[jira] [Assigned] (ARROW-2794) [Plasma] Add Delete method for multiple objects

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin reassigned ARROW-2794:
-

Assignee: Yuhong Guo

> [Plasma] Add Delete method for multiple objects
> ---
>
> Key: ARROW-2794
> URL: https://issues.apache.org/jira/browse/ARROW-2794
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++)
>Reporter: Philipp Moritz
>Assignee: Yuhong Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This improves efficiency since multiple objects can be deleted with a single 
> RPC.
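
The efficiency argument can be made concrete with a small Python model that counts round trips. CountingStore is a hypothetical stand-in in which each method call represents one RPC; it does not reflect the actual Plasma client API.

```python
class CountingStore:
    """Toy store in which every method call stands for one RPC."""

    def __init__(self, object_ids):
        self.objects = set(object_ids)
        self.requests = 0

    def delete(self, object_id):
        self.requests += 1
        self.objects.discard(object_id)

    def delete_many(self, object_ids):
        self.requests += 1  # one round trip regardless of the batch size
        self.objects.difference_update(object_ids)


ids = ["a", "b", "c", "d"]

one_by_one = CountingStore(ids)
for object_id in ids:
    one_by_one.delete(object_id)

batched = CountingStore(ids)
batched.delete_many(ids)
```

Both stores end up empty, but the batched variant needs one request where the loop needs one per object.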





[jira] [Assigned] (ARROW-2920) [Python] Segfault with pytorch 0.4

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin reassigned ARROW-2920:
-

Assignee: Philipp Moritz

> [Python] Segfault with pytorch 0.4
> --
>
> Key: ARROW-2920
> URL: https://issues.apache.org/jira/browse/ARROW-2920
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> See also [https://github.com/ray-project/ray/issues/2447]
> How to reproduce:
>  * Start the Ubuntu Deep Learning AMI (version 12.0) on EC2
>  * Create a new env with {{conda create -y -n breaking-env python=3.5}}
>  * Install pytorch with {{source activate breaking-env && conda install 
> pytorch torchvision cuda91 -c pytorch}}
>  * Compile and install manylinux1 pyarrow wheels from latest arrow master as 
> described here: 
> https://github.com/apache/arrow/blob/2876a3fdd1fb9ef6918b7214d6e8d1e3017b42ad/python/manylinux1/README.md
>  * In the breaking-env just created, run the following:
>  
> {code:java}
> Python 3.5.5 |Anaconda, Inc.| (default, May 13 2018, 21:12:35)
> [GCC 7.2.0] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import pyarrow
> >>> import torch
> >>> torch.nn.Conv2d(64, 2, kernel_size=3, stride=1, padding=1, bias=False).cuda()
> Segmentation fault (core dumped){code}
>  
> Backtrace:
> {code:java}
> >>> torch.nn.Conv2d(64, 2, kernel_size=3, stride=1, padding=1, bias=False).cuda()
> Program received signal SIGSEGV, Segmentation fault.
> 0x in ?? ()
> (gdb) bt
> #0  0x in ?? ()
> #1  0x77bc8a99 in __pthread_once_slow (once_control=0x7fffdb791e50 
> , init_routine=0x7fffe46aafe1 
> )
>     at pthread_once.c:116
> #2  0x7fffda95c302 in at::Type::toBackend(at::Backend) const () from 
> /home/ubuntu/anaconda3/envs/breaking-env2/lib/python3.5/site-packages/torch/lib/libcaffe2.so
> #3  0x7fffdc59b231 in torch::autograd::VariableType::toBackend 
> (this=, b=) at 
> torch/csrc/autograd/generated/VariableType.cpp:145
> #4  0x7fffdc8dbe8a in torch::autograd::THPVariable_cuda 
> (self=0x76dbff78, args=0x76daf828, kwargs=0x0) at 
> torch/csrc/autograd/generated/python_variable_methods.cpp:333
> #5  0x5569f4e8 in PyCFunction_Call ()
> #6  0x556f67cc in PyEval_EvalFrameEx ()
> #7  0x556fbe08 in PyEval_EvalFrameEx ()
> #8  0x556f6e90 in PyEval_EvalFrameEx ()
> #9  0x556fbe08 in PyEval_EvalFrameEx ()
> #10 0x5570103d in PyEval_EvalCodeEx ()
> #11 0x55701f5c in PyEval_EvalCode ()
> #12 0x5575e454 in run_mod ()
> #13 0x5562ab5e in PyRun_InteractiveOneObject ()
> #14 0x5562ad01 in PyRun_InteractiveLoopFlags ()
> #15 0x5562ad62 in PyRun_AnyFileExFlags.cold.2784 ()
> #16 0x5562b080 in Py_Main.cold.2785 ()
> #17 0x5562b871 in main (){code}





[jira] [Updated] (ARROW-1163) [Plasma][Java] Java client for Plasma

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-1163:
--
Summary: [Plasma][Java] Java client for Plasma  (was: [Plasma] Java client 
for Plasma)

> [Plasma][Java] Java client for Plasma
> -
>
> Key: ARROW-1163
> URL: https://issues.apache.org/jira/browse/ARROW-1163
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java, Plasma (C++)
>Reporter: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 11h
>  Remaining Estimate: 0h
>
> We should start thinking about what a Java client for Plasma would look like. 
> Given Arrow's focus on supporting Python, C++ and Java really well, it is 
> the next important target after Python and C++.
> My preliminary thoughts on it are the following ones: We can either go with 
> JNI and wrap the C++ client or (in my opinion preferable) write a pure Java 
> client. It would communicate with the Plasma store via Java flatbuffers over 
> sockets.
> It seems that the only thing blocking a pure Java client at the moment is the 
> way we ship file descriptors for the memory mapped files between store and 
> client (see the file fling.cc in the Plasma repo). We would need to get rid 
> of that because there is no pure Java API that allows transferring file 
> descriptors over a process boundary. So the way to transfer memory mapped 
> files over process boundaries then is probably to use the file system and 
> keep the memory mapped files in the file system instead of unlinking them 
> immediately (as we do at the moment), so they can be opened by the client 
> process via their path.
> The challenge in this case is how to clean the files up and make sure they 
> are not left lying around if the plasma store crashes. One option is to store 
> the plasma store PID with the file (i.e. as part of the file name) and let the 
> plasma store clean them up the next time it is started. Maybe there is 
> OS-level support for temporary files we can reuse.
> I probably won't get to this for a while, so if anybody needs this or has 
> free cycles, they should feel free to chime in. Also opinions on the design 
> are appreciated!
> -- Philipp.
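
The PID-in-file-name cleanup idea can be sketched as follows. The naming scheme `plasma-<pid>-<name>.mmap` and the function cleanup_stale_files are assumptions made for illustration; the liveness check is passed in as a callable because the right way to test whether a process is alive is platform-specific.

```python
import os
import re
import tempfile

# Hypothetical naming scheme: plasma-<pid>-<name>.mmap
PATTERN = re.compile(r"^plasma-(\d+)-.+\.mmap$")


def cleanup_stale_files(directory, is_alive):
    """Remove mapped files whose owning store process is no longer running.

    is_alive(pid) is supplied by the caller and returns True for live PIDs.
    Returns the names of the files that were removed.
    """
    removed = []
    for name in sorted(os.listdir(directory)):
        match = PATTERN.match(name)
        if match and not is_alive(int(match.group(1))):
            os.remove(os.path.join(directory, name))
            removed.append(name)
    return removed


with tempfile.TemporaryDirectory() as directory:
    for name in ("plasma-111-a.mmap", "plasma-222-b.mmap", "other.txt"):
        open(os.path.join(directory, name), "w").close()
    # Pretend that only the store with PID 222 is still running.
    removed = cleanup_stale_files(directory, lambda pid: pid == 222)
    remaining = sorted(os.listdir(directory))
```

Files that do not match the pattern are left alone, so the cleanup cannot delete unrelated files that happen to share the directory.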





[jira] [Updated] (ARROW-1163) [Plasma][Java] Java client for Plasma

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-1163:
--
Component/s: Java

> [Plasma][Java] Java client for Plasma
> -
>
> Key: ARROW-1163
> URL: https://issues.apache.org/jira/browse/ARROW-1163
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java, Plasma (C++)
>Reporter: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 11h
>  Remaining Estimate: 0h
>
> We should start thinking about what a Java client for Plasma would look like. 
> Given Arrow's focus on supporting Python, C++ and Java really well, it is 
> the next important target after Python and C++.
> My preliminary thoughts on it are the following ones: We can either go with 
> JNI and wrap the C++ client or (in my opinion preferable) write a pure Java 
> client. It would communicate with the Plasma store via Java flatbuffers over 
> sockets.
> It seems that the only thing blocking a pure Java client at the moment is the 
> way we ship file descriptors for the memory mapped files between store and 
> client (see the file fling.cc in the Plasma repo). We would need to get rid 
> of that because there is no pure Java API that allows transferring file 
> descriptors over a process boundary. So the way to transfer memory mapped 
> files over process boundaries then is probably to use the file system and 
> keep the memory mapped files in the file system instead of unlinking them 
> immediately (as we do at the moment), so they can be opened by the client 
> process via their path.
> The challenge in this case is how to clean the files up and make sure they 
> are not lying around if the plasma store crashes. One option is to store the 
> plasma store PID with the file (i.e. as part of the file name) and let the 
> plasma store clean them up the next time it is started; maybe there is OS 
> level support for temporary files we can reuse.
> I probably won't get to this for a while, so if anybody needs this or has 
> free cycles, they should feel free to chime in. Also opinions on the design 
> are appreciated!
> -- Philipp.
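
Editor's note: the PID-in-file-name cleanup idea described above can be sketched roughly as follows. This is only an illustration in Python, not Plasma code: the file naming scheme (plasma-<pid>-<n>.mmap) and the function names are hypothetical, and the liveness check relies on POSIX signal 0 semantics.

```python
import os
import re

# Hypothetical naming scheme for the illustration: plasma-<pid>-<n>.mmap
_NAME_RE = re.compile(r"^plasma-(\d+)-\d+\.mmap$")


def pid_is_alive(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only performs the existence/permission check
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def remove_stale_files(directory, is_alive=pid_is_alive):
    """Delete mapped files whose owning store process is gone.

    Returns the names of the files that were removed.
    """
    removed = []
    for name in sorted(os.listdir(directory)):
        match = _NAME_RE.match(name)
        if match and not is_alive(int(match.group(1))):
            os.remove(os.path.join(directory, name))
            removed.append(name)
    return removed
```

A store would call remove_stale_files() on its data directory at startup, before creating new files. Note that PIDs can be reused by the operating system, so a real implementation would likely need an additional check.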





[jira] [Updated] (ARROW-1780) [Java] JDBC Adapter for Apache Arrow

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-1780:
--
Component/s: Python

> [Java] JDBC Adapter for Apache Arrow
> 
>
> Key: ARROW-1780
> URL: https://issues.apache.org/jira/browse/ARROW-1780
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Atul Dambalkar
>Assignee: Atul Dambalkar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> At a high level the JDBC Adapter will allow upstream apps to query RDBMS data 
> over JDBC and get the JDBC objects converted to Arrow objects/structures. The 
> upstream utility can then work with Arrow objects/structures with usual 
> performance benefits. The utility will be very much similar to C++ 
> implementation of "Convert a vector of row-wise data into an Arrow table" as 
> described here - 
> https://arrow.apache.org/docs/cpp/md_tutorials_row_wise_conversion.html
> The utility will read data from RDBMS and convert the data into Arrow 
> objects/structures. So from that perspective this will read data from RDBMS. 
> Whether the utility can push Arrow objects to RDBMS is something that needs to be 
> discussed and will be out of scope for this utility for now. 





[jira] [Updated] (ARROW-1163) [Plasma] Java client for Plasma

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-1163:
--
Component/s: Plasma (C++)

> [Plasma] Java client for Plasma
> ---
>
> Key: ARROW-1163
> URL: https://issues.apache.org/jira/browse/ARROW-1163
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java, Plasma (C++)
>Reporter: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 11h
>  Remaining Estimate: 0h
>
> We should start thinking about what a Java client for Plasma would look like. 
> Given Arrow's focus on supporting Python, C++ and Java really well, it is 
> the next important target after Python and C++.
> My preliminary thoughts on it are the following ones: We can either go with 
> JNI and wrap the C++ client or (in my opinion preferable) write a pure Java 
> client. It would communicate with the Plasma store via Java flatbuffers over 
> sockets.
> It seems that the only thing blocking a pure Java client at the moment is the 
> way we ship file descriptors for the memory mapped files between store and 
> client (see the file fling.cc in the Plasma repo). We would need to get rid 
> of that because there is no pure Java API that allows transferring file 
> descriptors over a process boundary. So the way to transfer memory mapped 
> files over process boundaries then is probably to use the file system and 
> keep the memory mapped files in the file system instead of unlinking them 
> immediately (as we do at the moment), so they can be opened by the client 
> process via their path.
> The challenge in this case is how to clean the files up and make sure they 
> are not lying around if the plasma store crashes. One option is to store the 
> plasma store PID with the file (i.e. as part of the file name) and let the 
> plasma store clean them up the next time it is started; maybe there is OS 
> level support for temporary files we can reuse.
> I probably won't get to this for a while, so if anybody needs this or has 
> free cycles, they should feel free to chime in. Also opinions on the design 
> are appreciated!
> -- Philipp.





[jira] [Updated] (ARROW-1780) [Java] JDBC Adapter for Apache Arrow

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-1780:
--
Summary: [Java] JDBC Adapter for Apache Arrow  (was: JDBC Adapter for 
Apache Arrow)

> [Java] JDBC Adapter for Apache Arrow
> 
>
> Key: ARROW-1780
> URL: https://issues.apache.org/jira/browse/ARROW-1780
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Atul Dambalkar
>Assignee: Atul Dambalkar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> At a high level the JDBC Adapter will allow upstream apps to query RDBMS data 
> over JDBC and get the JDBC objects converted to Arrow objects/structures. The 
> upstream utility can then work with Arrow objects/structures with usual 
> performance benefits. The utility will be very much similar to C++ 
> implementation of "Convert a vector of row-wise data into an Arrow table" as 
> described here - 
> https://arrow.apache.org/docs/cpp/md_tutorials_row_wise_conversion.html
> The utility will read data from RDBMS and convert the data into Arrow 
> objects/structures. So from that perspective this will read data from RDBMS. 
> Whether the utility can push Arrow objects to RDBMS is something that needs to be 
> discussed and will be out of scope for this utility for now. 





[jira] [Updated] (ARROW-2322) Document requirements to run dev/release/01-perform.sh

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2322:
--
Component/s: Packaging

> Document requirements to run dev/release/01-perform.sh
> --
>
> Key: ARROW-2322
> URL: https://issues.apache.org/jira/browse/ARROW-2322
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> I am unable to run this script on Ubuntu 16.04
> {code}
> [INFO] [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-deploy-plugin:2.8.2:deploy (default-deploy) on 
> project arrow-java-root: Failed to deploy artifacts: Could not transfer 
> artifact org.apache.arrow:arrow-java-root:pom:0.9.0 from/to 
> apache.releases.https 
> (https://repository.apache.org/service/local/staging/deploy/maven2): Failed 
> to transfer file: 
> https://repository.apache.org/service/local/staging/deploy/maven2/org/apache/arrow/arrow-java-root/0.9.0/arrow-java-root-0.9.0.pom.
>  Return code is: 401, ReasonPhrase: Unauthorized. -> [Help 1]
> {code}
> I'm sure there's an easy fix for this, but the requirements aren't documented 
> in dev/release/README, so other PMC members are likely to also have problems





[jira] [Updated] (ARROW-2207) [GLib] Support decimal type

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2207:
--
Component/s: GLib

> [GLib] Support decimal type
> ---
>
> Key: ARROW-2207
> URL: https://issues.apache.org/jira/browse/ARROW-2207
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: GLib
>Reporter: yosuke shiro
>Assignee: yosuke shiro
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-2247) [Python] Statically-linking boost_regex in both libarrow and libparquet results in segfault

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2247:
--
Component/s: Python

> [Python] Statically-linking boost_regex in both libarrow and libparquet 
> results in segfault
> ---
>
> Key: ARROW-2247
> URL: https://issues.apache.org/jira/browse/ARROW-2247
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Antoine Pitrou
>Priority: Major
> Fix For: 0.10.0
>
>
> This is a backtrace loading {{libparquet.so}} on Ubuntu 14.04 using boost 
> 1.66.1 from conda-forge. Both libarrow and libparquet contain {{boost_regex}} 
> statically linked. 
> {code}
> In [1]: import ctypes
> In [2]: ctypes.CDLL('libparquet.so')
> Program received signal SIGSEGV, Segmentation fault.
> 0x7fffed4ad3fb in std::basic_string, 
> std::allocator >::basic_string(std::string const&) () from 
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> (gdb) bt
> #0  0x7fffed4ad3fb in std::basic_string, 
> std::allocator >::basic_string(std::string const&) () from 
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #1  0x7fffed74c1fc in 
> boost::re_detail_106600::cpp_regex_traits_char_layer::init() ()
>from /home/wesm/cpp-toolchain/lib/libboost_regex.so.1.66.0
> #2  0x7fffed794803 in 
> boost::object_cache, 
> boost::re_detail_106600::cpp_regex_traits_implementation 
> >::do_get(boost::re_detail_106600::cpp_regex_traits_base const&, 
> unsigned long) () from /home/wesm/cpp-toolchain/lib/libboost_regex.so.1.66.0
> #3  0x7fffed79e62b in boost::basic_regex boost::cpp_regex_traits > >::do_assign(char const*, char const*, 
> unsigned int) () from /home/wesm/cpp-toolchain/lib/libboost_regex.so.1.66.0
> #4  0x7fffee58561b in boost::basic_regex boost::cpp_regex_traits > >::assign (this=0x7fff3780, 
> p1=0x7fffee600602 
> "(.*?)\\s*(?:(version\\s*(?:([^(]*?)\\s*(?:\\(\\s*build\\s*([^)]*?)\\s*\\))?)?)?)",
>  
> p2=0x7fffee60064a "", f=0) at 
> /home/wesm/cpp-toolchain/include/boost/regex/v4/basic_regex.hpp:381
> #5  0x7fffee5855a7 in boost::basic_regex boost::cpp_regex_traits > >::assign (this=0x7fff3780, 
> p=0x7fffee600602 
> "(.*?)\\s*(?:(version\\s*(?:([^(]*?)\\s*(?:\\(\\s*build\\s*([^)]*?)\\s*\\))?)?)?)",
>  f=0)
> at /home/wesm/cpp-toolchain/include/boost/regex/v4/basic_regex.hpp:366
> #6  0x7fffee5683f3 in boost::basic_regex boost::cpp_regex_traits > >::basic_regex (this=0x7fff3780, 
> p=0x7fffee600602 
> "(.*?)\\s*(?:(version\\s*(?:([^(]*?)\\s*(?:\\(\\s*build\\s*([^)]*?)\\s*\\))?)?)?)",
>  f=0)
> at /home/wesm/cpp-toolchain/include/boost/regex/v4/basic_regex.hpp:335
> #7  0x7fffee5656d0 in parquet::ApplicationVersion::ApplicationVersion (
> Python Exception  There is no member named _M_dataplus.: 
> this=0x7fffee8f1fb8 
> , created_by=)
> at ../src/parquet/metadata.cc:452
> #8  0x7fffee41c271 in __cxx_global_var_init.1(void) () at 
> ../src/parquet/metadata.cc:35
> #9  0x7fffee41c44e in _GLOBAL__sub_I_metadata.tmp.wesm_desktop.4838.ii ()
>from /home/wesm/local/lib/libparquet.so
> #10 0x77dea1da in call_init (l=, argc=argc@entry=2, 
> argv=argv@entry=0x7fff5d88, 
> env=env@entry=0x7fff5da0) at dl-init.c:78
> #11 0x77dea2c3 in call_init (env=, argv= out>, argc=, 
> l=) at dl-init.c:36
> #12 _dl_init (main_map=main_map@entry=0x13fb220, argc=2, argv=0x7fff5d88, 
> env=0x7fff5da0)
> at dl-init.c:126
> {code}
> This seems to be caused by static initializations in libparquet:
> https://github.com/apache/parquet-cpp/blob/master/src/parquet/metadata.cc#L34
> We should see if removing these static initializations makes the problem go 
> away. If not, then statically-linking boost_regex in both libraries is not 
> advisable.
> For this reason and more, I really wish that Arrow and Parquet shared a 
> common build system and monorepo structure -- it would make handling these 
> toolchain and build-related issues much simpler. 





[jira] [Updated] (ARROW-2350) Shrink size of spark_integration Docker container

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2350:
--
Component/s: Continuous Integration

> Shrink size of spark_integration Docker container
> -
>
> Key: ARROW-2350
> URL: https://issues.apache.org/jira/browse/ARROW-2350
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration
>Reporter: James Lamb
>Assignee: James Lamb
>Priority: Minor
>  Labels: docker, pull-request-available, spark
> Fix For: 0.10.0
>
>   Original Estimate: 10m
>  Remaining Estimate: 10m
>
> I would like to propose a few changes to the spark_integration Dockerfile:
> [https://github.com/apache/arrow/tree/master/dev/spark_integration]
> The size of the resulting image can be reduced by making the following 
> changes:
>  * consolidating all RUN commands into a single RUN layer (reducing the 
> number of layers)
>  * running {color:#14892c}apt-get clean{color} to clear out the package cache
>  * running {color:#14892c}conda clean --all{color} to clear out cached 
> package tarballs, abandoned package versions, and other build artifacts from 
> all the libraries that are conda installed
> I will be submitting a PR on GitHub shortly. Generating this issue first so I 
> can tag my PR to it.





[jira] [Updated] (ARROW-2355) [Python] Unable to import pyarrow [0.9.0] OSX

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2355:
--
Component/s: Packaging

> [Python] Unable to import pyarrow [0.9.0] OSX
> -
>
> Key: ARROW-2355
> URL: https://issues.apache.org/jira/browse/ARROW-2355
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Bradford W Littooy
>Assignee: Uwe L. Korn
>Priority: Major
> Fix For: 0.10.0
>
>
> I have pip installed pyarrow to my mac os x (version 10.13.3). When I try to 
> import pyarrow into a python3.6 interpreter, I get the following import error:
>  
> >>> import pyarrow
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/__init__.py", line 47, in <module>
>     from pyarrow.lib import cpu_count, set_cpu_count
> ImportError: 
> dlopen(/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/lib.cpython-36m-darwin.so,
>  2): Library not loaded: libarrow_boost_system.dylib
>   Referenced from: 
> /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/libarrow.0.dylib
>   Reason: image not found
> >>>
> I've installed pyarrow (0.9) on an EC2 instance with no issue. 





[jira] [Updated] (ARROW-2334) [C++] Update boost to 1.66.0

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2334:
--
Component/s: C++

> [C++] Update boost to 1.66.0
> 
>
> Key: ARROW-2334
> URL: https://issues.apache.org/jira/browse/ARROW-2334
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Affects Versions: 0.9.0
>Reporter: Phillip Cloud
>Assignee: Phillip Cloud
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>






[jira] [Updated] (ARROW-2395) [Python] Correct flake8 errors outside of pyarrow/ directory

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2395:
--
Component/s: Python

> [Python] Correct flake8 errors outside of pyarrow/ directory
> 
>
> Key: ARROW-2395
> URL: https://issues.apache.org/jira/browse/ARROW-2395
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Alex Hagerman
>Assignee: Wes McKinney
>Priority: Minor
>  Labels: beginner, pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Fix flake8 warnings for files outside of benchmarks directory.
>  
> !https://user-images.githubusercontent.com/2118138/38217076-f08a67da-369a-11e8-8166-b3a9ed7d9a60.png!





[jira] [Updated] (ARROW-2387) [Python] negative decimal values get spurious rescaling error

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2387:
--
Summary: [Python] negative decimal values get spurious rescaling error  
(was: negative decimal values get spurious rescaling error)

> [Python] negative decimal values get spurious rescaling error
> -
>
> Key: ARROW-2387
> URL: https://issues.apache.org/jira/browse/ARROW-2387
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: ben w
>Assignee: Phillip Cloud
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> {code:java}
> $ python
> Python 2.7.12 (default, Nov 20 2017, 18:23:56)
> [GCC 5.4.0 20160609] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import pyarrow as pa, decimal
> >>> one = decimal.Decimal('1.00')
> >>> neg_one = decimal.Decimal('-1.00')
> >>> pa.array([one], pa.decimal128(24, 12))
> 
> [
> Decimal('1.')
> ]
> >>> pa.array([neg_one], pa.decimal128(24, 12))
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File "array.pxi", line 181, in pyarrow.lib.array
> File "array.pxi", line 36, in pyarrow.lib._sequence_to_array
> File "error.pxi", line 77, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: Rescaling decimal value -100.00 from 
> original scale of 6 to new scale of 12 would cause data loss
> >>> pa.__version__
> '0.9.0'
> {code}
> Not only is the error spurious, the decimal value has also been multiplied by one 
> million (10 ** 6, where 6 is the difference in scales, but this is still 
> pretty strange to me).
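
Editor's note: the arithmetic behind the report can be checked with the standard library alone. The sketch below does not reproduce pyarrow's internals; the helper names are hypothetical, and it assumes the input value carries six fractional digits, as the "original scale of 6" in the error message suggests. It shows that rescaling from scale 6 to scale 12 multiplies the stored integer by 10 ** 6 and that the result fits comfortably in 24 digits for either sign, so no data loss should be reported.

```python
from decimal import Decimal


def unscaled(value, scale):
    """Integer that represents `value` at the given decimal scale."""
    shifted = value.scaleb(scale)
    if shifted != shifted.to_integral_value():
        raise ValueError("value has more fractional digits than the scale")
    return int(shifted)


def rescale(unscaled_value, old_scale, new_scale):
    """Move an unscaled integer to a new scale, refusing to drop digits."""
    if new_scale >= old_scale:
        return unscaled_value * 10 ** (new_scale - old_scale)
    quotient, remainder = divmod(abs(unscaled_value), 10 ** (old_scale - new_scale))
    if remainder:
        raise ValueError("rescaling would cause data loss")
    return quotient if unscaled_value >= 0 else -quotient


def fits(unscaled_value, precision):
    """True if the integer has at most `precision` decimal digits."""
    return abs(unscaled_value) < 10 ** precision


neg_one = unscaled(Decimal("-1.000000"), 6)  # stored as -1000000 at scale 6
rescaled = rescale(neg_one, 6, 12)           # -1000000 * 10 ** 6
print(rescaled, fits(rescaled, 24))
```

Because the sign plays no role in whether the widened value fits, a correct rescaling check should treat -1 and 1 identically.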





[jira] [Updated] (ARROW-2414) A variety of typos can be found

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2414:
--
Component/s: Documentation

> A variety of typos can be found
> ---
>
> Key: ARROW-2414
> URL: https://issues.apache.org/jira/browse/ARROW-2414
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Bruce Mitchener
>Assignee: Bruce Mitchener
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> This is just so that I can submit a PR for a bunch of typo fixes.





[jira] [Updated] (ARROW-2400) [C++] Status destructor is expensive

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2400:
--
Component/s: C++

> [C++] Status destructor is expensive
> 
>
> Key: ARROW-2400
> URL: https://issues.apache.org/jira/browse/ARROW-2400
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Let's take the following micro-benchmark (in Python):
> {code:bash}
> $ python -m timeit -s "import pyarrow as pa; data = [b'xx' for i in range(1)]" "pa.array(data, type=pa.binary())"
> 1000 loops, best of 3: 784 usec per loop
> {code}
> If I replace the Status destructor with a no-op:
> {code:c++}
>   ~Status() { }
> {code}
> then the benchmark result becomes:
> {code:bash}
> $ python -m timeit -s "import pyarrow as pa; data = [b'xx' for i in range(1)]" "pa.array(data, type=pa.binary())"
> 1000 loops, best of 3: 561 usec per loop
> {code}
> This is almost a 30% win. I get similar results on the conversion benchmarks 
> in the benchmark suite.
> I'm unsure about the explanation. In the common case, {{delete _state}} 
> should be extremely fast, since the state is NULL. Yet, it seems it adds 
> significant overhead. Perhaps because of exception handling?





[jira] [Updated] (ARROW-2422) [Python] Support more filter operators on Hive partitioned Parquet files

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2422:
--
Summary: [Python] Support more filter operators on Hive partitioned Parquet 
files  (was: Support more filter operators on Hive partitioned Parquet files)

> [Python] Support more filter operators on Hive partitioned Parquet files
> 
>
> Key: ARROW-2422
> URL: https://issues.apache.org/jira/browse/ARROW-2422
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Julius Neuffer
>Assignee: Julius Neuffer
>Priority: Minor
>  Labels: features, pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> After implementing basic filters ('=', '!=') on Hive partitioned Parquet 
> files (ARROW-2401), I'll extend them ('>', '<', '<=', '>=') with a new PR on 
> GitHub.
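
Editor's note: a rough sketch of how the six comparison operators could be applied to Hive-style partition directories. It is not the pyarrow implementation; the (column, operator, value) tuple format and all function names are assumptions made for the illustration.

```python
import operator

# Operator symbols from the issue text mapped to standard-library comparisons.
OPERATORS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}


def parse_partition(path):
    """Split 'year=2018/month=7' into {'year': '2018', 'month': '7'}."""
    return dict(part.split("=", 1) for part in path.split("/") if "=" in part)


def matches(path, filters):
    """True if the partition path satisfies every (column, op, value) filter."""
    keys = parse_partition(path)
    for column, op, value in filters:
        # Directory names are strings, so cast to the type of the filter value.
        actual = type(value)(keys[column])
        if not OPERATORS[op](actual, value):
            return False
    return True


def select_partitions(paths, filters):
    """Keep only the partition paths that pass all filters."""
    return [path for path in paths if matches(path, filters)]


paths = ["year=2017/month=12", "year=2018/month=1", "year=2018/month=7"]
print(select_partitions(paths, [("year", ">=", 2018), ("month", "<", 7)]))
```

Casting the directory value to the type of the filter value keeps numeric comparisons numeric; comparing the raw strings would order '12' before '7'.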





[jira] [Updated] (ARROW-2441) [Rust] Builder::slice_mut assertions are too strict

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2441:
--
Component/s: Rust

> [Rust] Builder::slice_mut assertions are too strict
> --
>
> Key: ARROW-2441
> URL: https://issues.apache.org/jira/browse/ARROW-2441
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> The assertions only allow slicing up to the builder length, rather than up to 
> the builder capacity.





[jira] [Updated] (ARROW-2422) [Python] Support more filter operators on Hive partitioned Parquet files

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2422:
--
Component/s: Python

> [Python] Support more filter operators on Hive partitioned Parquet files
> 
>
> Key: ARROW-2422
> URL: https://issues.apache.org/jira/browse/ARROW-2422
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Julius Neuffer
>Assignee: Julius Neuffer
>Priority: Minor
>  Labels: features, pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> After implementing basic filters ('=', '!=') on Hive partitioned Parquet 
> files (ARROW-2401), I'll extend them ('>', '<', '<=', '>=') with a new PR on 
> GitHub.





[jira] [Updated] (ARROW-2421) [C++] Update LLVM version in cpp README

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2421:
--
Component/s: C++

> [C++] Update LLVM version in cpp README
> ---
>
> Key: ARROW-2421
> URL: https://issues.apache.org/jira/browse/ARROW-2421
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Krisztian Szucs
>Assignee: Alessandro Andrioni
>Priority: Trivial
> Fix For: 0.10.0
>
>
> The README references LLVM 4 multiple times; however, LLVM 5 is required.





[jira] [Updated] (ARROW-2478) [C++] Introduce a checked_cast function that performs a dynamic_cast in debug mode

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2478:
--
Component/s: C++

> [C++] Introduce a checked_cast function that performs a dynamic_cast in debug 
> mode
> --
>
> Key: ARROW-2478
> URL: https://issues.apache.org/jira/browse/ARROW-2478
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.9.0
>Reporter: Phillip Cloud
>Assignee: Phillip Cloud
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> This would use {{static_cast}} in release mode.





[jira] [Updated] (ARROW-2503) [Python] Trailing space character in RowGroup statistics of pyarrow.parquet.ParquetFile

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2503:
--
Component/s: Python

> [Python] Trailing space character in RowGroup statistics of 
> pyarrow.parquet.ParquetFile
> ---
>
> Key: ARROW-2503
> URL: https://issues.apache.org/jira/browse/ARROW-2503
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Julius Neuffer
>Assignee: Julius Neuffer
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> When reading a parquet file containing a string column, the _RowGroup_ 
> statistics contain a trailing space character for the string column. The 
> example below shows the behavior.
> {code}
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
> # create and write arrow table as parquet
> df = pd.DataFrame({'string_column': ['some', 'string', 'values', 'here']})
> table = pa.Table.from_pandas(df)
> pq.write_table(table, 'example.parquet')
> # read parquet file metadata and print string column statistics
> pq_file = pq.ParquetFile(open('example.parquet', 'rb'))
> print(pq_file.metadata.row_group(0).column(0).statistics.max)  # yields b'values '
> print(pq_file.metadata.row_group(0).column(0).statistics.min)  # yields b'here '
> {code}
> For other data types I did not observe this problem, even though the 
> statistics are always strings.
> When reading the same file with _fastparquet_, there is no trailing space 
> character, which implies that this problem occurs in the reading path of 
> _pyarrow.parquet_. I am aware that this might well be an issue with 
> _parquet-cpp_, but as I face this bug as a _pyarrow_ user, I report it here.
> I'll try to investigate this further and report back here.
>  
> *Update:*
> The trailing space is added in _parquet-cpp_. _pyarrow_ calls the function 
> _FormatStatValue_ which adds the trailing space 
> (https://github.com/apache/parquet-cpp/blob/master/src/parquet/types.cc#L52). 
> There is no comment there to explain it. Does anyone here know what the 
> reason is?





[jira] [Updated] (ARROW-2443) [Python] Conversion from pandas of empty categorical fails with ArrowInvalid

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2443:
--
Component/s: Python

> [Python] Conversion from pandas of empty categorical fails with ArrowInvalid
> 
>
> Key: ARROW-2443
> URL: https://issues.apache.org/jira/browse/ARROW-2443
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Florian Jetter
>Assignee: Uwe L. Korn
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Converting an empty pandas categorical raises an exception. Before 
> version `0.9.0` this was possible.
> {code:java}
> import pandas as pd
> import pyarrow as pa
> pa.Table.from_pandas(pd.DataFrame({'cat': pd.Categorical([])})){code}
> raises:
> {{ArrowInvalid: Dictionary indices must have non-zero length}}





[jira] [Updated] (ARROW-2452) [TEST] Spark integration test fails with permission error

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2452:
--
Component/s: Continuous Integration

> [TEST] Spark integration test fails with permission error
> -
>
> Key: ARROW-2452
> URL: https://issues.apache.org/jira/browse/ARROW-2452
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> {{arrow/dev/run_docker_compose.sh spark_integration}}
> {code}
> Scanning dependencies of target lib
> [ 66%] Building CXX object CMakeFiles/lib.dir/lib.cxx.o
> [100%] Linking CXX shared module release/lib.so
> [100%] Built target lib
> -- Finished cmake --build for pyarrow
> Bundling includes: release/include
> ('Moving built C-extension', 'release/lib.so', 'to build path', 
> '/apache-arrow/arrow/python/build/lib.linux-x86_64-2.7/pyarrow/lib.so')
> release/_parquet.so
> Cython module _parquet failure permitted
> release/_orc.so
> Cython module _orc failure permitted
> release/plasma.so
> Cython module plasma failure permitted
> running install
> error: can't create or remove files in install directory
> The following error occurred while trying to add or remove files in the
> installation directory:
> [Errno 13] Permission denied: 
> '/home/ubuntu/miniconda/envs/pyarrow-dev/lib/python2.7/site-packages/test-easy-install-1855.write-test'
> The installation directory you specified (via --install-dir, --prefix, or
> the distutils default setting) was:
> /home/ubuntu/miniconda/envs/pyarrow-dev/lib/python2.7/site-packages/
> Perhaps your account does not have write access to this directory?  If the
> installation directory is a system-owned directory, you may need to sign in
> as the administrator or "root" account.  If you do not have administrative
> access to this machine, you may wish to choose a different installation
> directory, preferably one that is listed in your PYTHONPATH environment
> variable.
> {code}





[jira] [Updated] (ARROW-2503) [Python] Trailing space character in RowGroup statistics of pyarrow.parquet.ParquetFile

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2503:
--
Summary: [Python] Trailing space character in RowGroup statistics of 
pyarrow.parquet.ParquetFile  (was: Trailing space character in RowGroup 
statistics of pyarrow.parquet.ParquetFile)

> [Python] Trailing space character in RowGroup statistics of 
> pyarrow.parquet.ParquetFile
> ---
>
> Key: ARROW-2503
> URL: https://issues.apache.org/jira/browse/ARROW-2503
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Julius Neuffer
>Assignee: Julius Neuffer
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> When reading a parquet file containing a string column, the _RowGroup_ 
> statistics contain a trailing space character for the string column. The 
> example below shows the behavior.
> {code}
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
> # create and write arrow table as parquet
> df = pd.DataFrame({'string_column': ['some', 'string', 'values', 'here']})
> table = pa.Table.from_pandas(df)
> pq.write_table(table, 'example.parquet')
> # read parquet file metadata and print string column statistics
> pq_file = pq.ParquetFile(open('example.parquet', 'rb'))
> print(pq_file.metadata.row_group(0).column(0).statistics.max)  # yields b'values '
> print(pq_file.metadata.row_group(0).column(0).statistics.min)  # yields b'here '
> {code}
> For other data types I did not observe this problem, even though the 
> statistics are always strings.
> When reading the same file with _fastparquet_, there is no trailing space 
> character, which implies that this problem occurs in the reading path of 
> _pyarrow.parquet_. I am aware that this might well be an issue with 
> _parquet-cpp_, but as I face this bug as a _pyarrow_ user, I report it here.
> I'll try to investigate this further and report back here.
>  
> *Update:*
> The trailing space is added in _parquet-cpp_. _pyarrow_ calls the function 
> _FormatStatValue_ which adds the trailing space 
> (https://github.com/apache/parquet-cpp/blob/master/src/parquet/types.cc#L52). 
> There is no comment there to explain it. Does anyone here know what the 
> reason is?
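Until the padding is addressed upstream, a caller could normalize the reported values on the Python side. The following is a minimal sketch and not part of pyarrow: `strip_stat_padding` is a hypothetical helper, and it assumes the formatter appends exactly one space, so it would also trim a legitimate trailing space in the data.

```python
def strip_stat_padding(value: bytes) -> bytes:
    """Drop a single trailing space, if present (hypothetical helper)."""
    # Assumption: the formatter appends exactly one b" " to the real value.
    if value.endswith(b" "):
        return value[:-1]
    return value


# Values as reported in the example above.
cleaned_max = strip_stat_padding(b"values ")
cleaned_min = strip_stat_padding(b"here ")
```

Because the helper cannot tell padding from real data, it is only suitable where the column is known not to contain values ending in a space.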



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2565) [Plasma] new subscriber cannot receive notifications about existing objects

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2565:
--
Component/s: Plasma (C++)

> [Plasma] new subscriber cannot receive notifications about existing objects
> ---
>
> Key: ARROW-2565
> URL: https://issues.apache.org/jira/browse/ARROW-2565
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Plasma (C++)
>Reporter: Zhijun Fu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When a client subscribes to the plasma store, it is supposed to receive 
> notifications for objects that already exist in the store. It does not, 
> because its fd is not added to the pending_notifications_ map when it 
> subscribes, so push_notifications() cannot see the new subscriber. 
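The ordering problem described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the plasma store's code: `ToyStore`, its `pending` map, and the `register_on_subscribe` switch are all hypothetical names chosen for illustration.

```python
class ToyStore:
    """Toy stand-in for the store; every name here is hypothetical."""

    def __init__(self, register_on_subscribe):
        self.objects = []   # ids of objects already in the store
        self.pending = {}   # subscriber fd -> queued notifications
        self.register_on_subscribe = register_on_subscribe

    def add_object(self, object_id):
        self.objects.append(object_id)
        for fd in list(self.pending):
            self.push(fd, object_id)

    def push(self, fd, object_id):
        # A subscriber without an entry in the map is silently skipped.
        if fd in self.pending:
            self.pending[fd].append(object_id)

    def subscribe(self, fd):
        if self.register_on_subscribe:
            # Register the subscriber before replaying existing objects.
            self.pending[fd] = []
        for object_id in self.objects:
            self.push(fd, object_id)
        return self.pending.get(fd, [])


buggy = ToyStore(register_on_subscribe=False)
buggy.add_object("a")
fixed = ToyStore(register_on_subscribe=True)
fixed.add_object("a")
```

In the toy model, `buggy.subscribe(7)` sees nothing about the existing object, while `fixed.subscribe(7)` receives it, because the map entry exists by the time the replay runs.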



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2541) [Plasma] Clean up macro usage

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2541:
--
Component/s: Plasma (C++)

> [Plasma] Clean up macro usage
> -
>
> Key: ARROW-2541
> URL: https://issues.apache.org/jira/browse/ARROW-2541
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++)
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> There are still a lot of macros being used as constants in the plasma 
> codebase. This should be cleaned up and replaced with constexpr (deprecating 
> them where appropriate).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2576) [GLib] Add abs functions for Decimal128.

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2576:
--
Component/s: GLib

> [GLib] Add abs functions for Decimal128.
> 
>
> Key: ARROW-2576
> URL: https://issues.apache.org/jira/browse/ARROW-2576
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: GLib
>Reporter: yosuke shiro
>Assignee: yosuke shiro
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2516) AppVeyor Build Matrix should be specific to the changes made in a PR

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2516:
--
Component/s: Packaging

> AppVeyor Build Matrix should be specific to the changes made in a PR
> 
>
> Key: ARROW-2516
> URL: https://issues.apache.org/jira/browse/ARROW-2516
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Paddy Horan
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2570) [Python] Add support for writing parquet files with LZ4 compression

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2570:
--
Component/s: Python

> [Python] Add support for writing parquet files with LZ4 compression
> ---
>
> Key: ARROW-2570
> URL: https://issues.apache.org/jira/browse/ARROW-2570
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Dmitry Kalinkin
>Assignee: Dmitry Kalinkin
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> {code:java}
> import pyarrow as pa
> import pyarrow.parquet as pq
> data = [pa.array([None])]
> batch = pa.RecordBatch.from_arrays(data, ['x'])
> table = pa.Table.from_batches([batch])
> pq.write_table(table, "test.parquet", compression='LZ4'){code}
> currently fails with
> {code:java}
> Traceback (most recent call last):
>  File "_parquet.pyx", line 811, in pyarrow._parquet.check_compression_name
> pyarrow.lib.ArrowException: Unsupported compression: LZ4{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2537) [Ruby] Import

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2537:
--
Component/s: Ruby

> [Ruby] Import
> -
>
> Key: ARROW-2537
> URL: https://issues.apache.org/jira/browse/ARROW-2537
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Ruby
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> I'm developing Ruby bindings of Apache Arrow at 
> https://github.com/red-data-tools/red-arrow and 
> https://github.com/red-data-tools/red-arrow-gpu .
> They should be imported to the Apache Arrow project.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2571) [C++] Lz4Codec doesn't properly handle empty data

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2571:
--
Component/s: C++

> [C++] Lz4Codec doesn't properly handle empty data
> -
>
> Key: ARROW-2571
> URL: https://issues.apache.org/jira/browse/ARROW-2571
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Dmitry Kalinkin
>Assignee: Dmitry Kalinkin
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> For example, the following closure test will fail:
> {code:python}
> import pyarrow as pa
> import pyarrow.parquet as pq
> data = [pa.array([None] * 10)]
> batch = pa.RecordBatch.from_arrays(data, ['x'])
> table = pa.Table.from_batches([batch])
> pq.write_table(table, "test.parquet", compression='LZ4')
> table = pq.read_table("test.parquet")
> {code}
> with the following error:
> {code:java}
> Traceback (most recent call last):
>   File "test.py", line 8, in <module>
>     table = pq.read_table("test.parquet")
>   File "python3.6/site-packages/pyarrow/parquet.py", line 987, in read_table
>     use_pandas_metadata=use_pandas_metadata)
>   File "python3.6/site-packages/pyarrow/parquet.py", line 149, in read
>     nthreads=nthreads)
>   File "_parquet.pyx", line 736, in pyarrow._parquet.ParquetReader.read_all
>   File "error.pxi", line 83, in pyarrow.lib.check_status
> pyarrow.lib.ArrowIOError: Arrow error: IOError: Corrupt Lz4 compressed data.
> {code}
> Writing a file with LZ4 from Python requires the patch for ARROW-2570, but the 
> issue can also be reproduced by creating an input file with parquet-cpp. The 
> file must be compressed with LZ4 and contain a column with only null values.
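The kind of regression check this report calls for is a codec round trip that includes an empty payload. The sketch below shows the shape of such a check only: it uses `zlib` from the standard library as a stand-in, since LZ4 is not in the standard library, and `round_trip` is a hypothetical helper rather than an Arrow API.

```python
import zlib


def round_trip(payload: bytes) -> bytes:
    """Compress then decompress; zlib stands in for the LZ4 codec here."""
    return zlib.decompress(zlib.compress(payload))


# The empty payload is the edge case of interest; the others are controls.
payloads = (b"", b"\x00" * 10, b"some values")
results = [round_trip(p) == p for p in payloads]
```

A codec that mishandles empty data would fail on the first payload while passing the others, which is why the empty case needs its own entry.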



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2506) [Plasma] Build error on macOS

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2506:
--
Component/s: Plasma (C++)

> [Plasma] Build error on macOS
> -
>
> Key: ARROW-2506
> URL: https://issues.apache.org/jira/browse/ARROW-2506
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++)
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Major
> Fix For: 0.10.0
>
>
> Since the upgrade to flatbuffers 1.9.0, I'm seeing this error on the Ray CI:
> arrow/cpp/src/plasma/format/plasma.fbs:234:0: error: default value of 0 for 
> field status is not part of enum ObjectStatus
> I'm planning to just remove the '= 1' from 'Local = 1'. This will break the 
> protocol however, so if we prefer to just put in a 'Dummy = 0' object at the 
> beginning of the enum, that would also be fine with me. However, the 
> ObjectStatus API is not stable yet and not even exposed to Python, so I think 
> breaking it is fine.
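The two options discussed above differ in whether existing numeric values change. The following sketch models that with Python enums; it is not flatbuffers code. Only `Local = 1` comes from the report, while `Remote`, the class names, and `default_is_valid` are placeholders for illustration.

```python
from enum import IntEnum


class ObjectStatus(IntEnum):
    # Current layout: 0 is not a member, so a default of 0 is rejected.
    Local = 1
    Remote = 2  # placeholder member


class ObjectStatusRenumbered(IntEnum):
    # Option 1: drop the '= 1', so numbering starts at 0 (values shift).
    Local = 0
    Remote = 1


class ObjectStatusWithDummy(IntEnum):
    # Option 2: add a zero-valued member; existing values stay the same.
    Dummy = 0
    Local = 1
    Remote = 2


def default_is_valid(enum_cls, default=0):
    """Return True if the implicit default value is a member of the enum."""
    return default in [member.value for member in enum_cls]
```

Both options make the default valid; only the second keeps `Local` at 1, which is the protocol-compatibility trade-off the report mentions.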



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2577) [Plasma] Add ASV benchmarks

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2577:
--
Component/s: GLib

> [Plasma] Add ASV benchmarks
> ---
>
> Key: ARROW-2577
> URL: https://issues.apache.org/jira/browse/ARROW-2577
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++)
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> We are about to merge some PRs that potentially impact plasma performance, so 
> we should set up ASV to track the changes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2577) [Plasma] Add ASV benchmarks

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2577:
--
Component/s: (was: GLib)
 Plasma (C++)

> [Plasma] Add ASV benchmarks
> ---
>
> Key: ARROW-2577
> URL: https://issues.apache.org/jira/browse/ARROW-2577
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++)
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> We are about to merge some PRs that potentially impact plasma performance, so 
> we should set up ASV to track the changes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2612) [Plasma] Fix deprecated PLASMA_DEFAULT_RELEASE_DELAY

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2612:
--
Component/s: Plasma (C++)

> [Plasma] Fix deprecated PLASMA_DEFAULT_RELEASE_DELAY
> 
>
> Key: ARROW-2612
> URL: https://issues.apache.org/jira/browse/ARROW-2612
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++)
>Reporter: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The deprecated PLASMA_DEFAULT_RELEASE_DELAY is currently broken, since it 
> refers to kDeprecatedPlasmaDefaultReleaseDelay without the plasma:: namespace 
> qualifier.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2580) [GLib] Fix abs functions for Decimal128

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2580:
--
Component/s: GLib

> [GLib] Fix abs functions for Decimal128
> ---
>
> Key: ARROW-2580
> URL: https://issues.apache.org/jira/browse/ARROW-2580
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: GLib
>Reporter: yosuke shiro
>Assignee: yosuke shiro
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2582) [GLib] Add negate functions for Decimal128

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2582:
--
Component/s: GLib

> [GLib] Add negate functions for Decimal128
> --
>
> Key: ARROW-2582
> URL: https://issues.apache.org/jira/browse/ARROW-2582
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: GLib
>Reporter: yosuke shiro
>Assignee: yosuke shiro
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2695) [Python] Prevent calling scalar constructors directly

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2695:
--
Component/s: Python

> [Python] Prevent calling scalar constructors directly
> 
>
> Key: ARROW-2695
> URL: https://issues.apache.org/jira/browse/ARROW-2695
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2715) Address apt flakiness with launchpad.net

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2715:
--
Component/s: Packaging

> Address apt flakiness with launchpad.net
> 
>
> Key: ARROW-2715
> URL: https://issues.apache.org/jira/browse/ARROW-2715
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Wes McKinney
>Assignee: Antoine Pitrou
>Priority: Major
> Fix For: 0.10.0
>
>
> We've had some failing builds with errors of this variety:
> https://travis-ci.org/apache/arrow/jobs/392830710#L689
> I'm not sure of the nature of the flakiness, but it would be good if we can 
> make the builds more robust.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2585) Add Decimal128::FromBigEndian

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2585:
--
Component/s: C++

> Add Decimal128::FromBigEndian
> -
>
> Key: ARROW-2585
> URL: https://issues.apache.org/jira/browse/ARROW-2585
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Joshua Storck
>Assignee: Joshua Storck
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> This code is being moved from 
> https://github.com/apache/parquet-cpp/blob/8046481235e558344c3aa059c83ee86b9f67/src/parquet/arrow/reader.cc#L1049
>  for use in this PR: https://github.com/apache/parquet-cpp/pull/462
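For readers unfamiliar with the conversion, the idea is to interpret a short big-endian byte string as a signed two's-complement integer. The sketch below shows that idea in Python; it is not the C++ implementation, and the function name and the 1 to 16 byte length limit are assumptions made for illustration.

```python
def decimal128_from_big_endian(data: bytes) -> int:
    """Decode big-endian two's-complement bytes into a signed integer.

    Hypothetical helper; the 1..16 byte limit is an assumption based on
    the 128-bit width of the target type.
    """
    if not 1 <= len(data) <= 16:
        raise ValueError("expected between 1 and 16 bytes")
    # signed=True sign-extends inputs shorter than 16 bytes.
    return int.from_bytes(data, byteorder="big", signed=True)


small_positive = decimal128_from_big_endian(b"\x01\x00")
small_negative = decimal128_from_big_endian(b"\xff\x85")
```

Sign extension is the part that is easy to get wrong: a two-byte input with the high bit set must decode to a negative number rather than a large positive one.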



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2630) [Java] Typo in the document

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2630:
--
Component/s: Java

> [Java] Typo in the document
> ---
>
> Key: ARROW-2630
> URL: https://issues.apache.org/jira/browse/ARROW-2630
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Bo Meng
>Assignee: Bo Meng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> I am fixing a few typos in the Java code that I noticed while reading the 
> code and javadocs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2578) [Plasma] Valgrind errors related to std::random_device

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2578:
--
Component/s: Plasma (C++)

> [Plasma] Valgrind errors related to std::random_device
> --
>
> Key: ARROW-2578
> URL: https://issues.apache.org/jira/browse/ARROW-2578
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Plasma (C++)
>Reporter: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> These have started popping up very recently: 
> [https://travis-ci.org/apache/arrow/jobs/378526493]
> e.g.
> {code:java}
> [ RUN ] PlasmaSerialization.SealRequest
> ==19147== Conditional jump or move depends on uninitialised value(s)
> ==19147== at 0x510FFD8: std::random_device::_M_init(std::string const&) 
> (cow-string-inst.cc:56)
> ==19147== by 0x4E3B7C: std::random_device::random_device(std::string const&) 
> (random.h:1588)
> ==19147== by 0x4E2E6F: plasma::UniqueID::from_random() (common.cc:31)
> ==19147== by 0x4871D6: 
> plasma::PlasmaSerialization_SealRequest_Test::TestBody() 
> (serialization_tests.cc:120)
> ==19147== by 0x4D6589: void 
> testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2402)
> ==19147== by 0x4D0317: void 
> testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2438)
> ==19147== by 0x4B57D8: testing::Test::Run() (gtest.cc:2475)
> ==19147== by 0x4B607F: testing::TestInfo::Run() (gtest.cc:2656)
> ==19147== by 0x4B6743: testing::TestCase::Run() (gtest.cc:2774)
> ==19147== by 0x4BD113: testing::internal::UnitTestImpl::RunAllTests() 
> (gtest.cc:4649)
> ==19147== by 0x4D7891: bool 
> testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool 
> (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2402)
> ==19147== by 0x4D103F: bool 
> testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool 
> (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2438)
> ==19147=={code}
>  Any ideas on how to fix this are appreciated!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2615) [Rust] Refactor introduced a bug around Arrays of String

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2615:
--
Component/s: Rust

> [Rust] Refactor introduced a bug around Arrays of String
> 
>
> Key: ARROW-2615
> URL: https://issues.apache.org/jira/browse/ARROW-2615
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The refactor unintentionally implemented the ArrowPrimitiveType trait for 
> strings. This mistake leaked into one example and the record batch struct.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2694) [Python] ArrayValue string conversion returns the representation instead of the converted python object string

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2694:
--
Component/s: Python

> [Python] ArrayValue string conversion returns the representation instead of 
> the converted python object string
> --
>
> Key: ARROW-2694
> URL: https://issues.apache.org/jira/browse/ARROW-2694
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Florian Jetter
>Assignee: Florian Jetter
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Example:
> {code}
> # python 3.6.5
> In [1]: import pyarrow as pa
> In [2]: str(pa.array(['a'])[0])  # note the single quotes
> Out[2]: "'a'"
> In [3]: str(pa.array([1], pa.timestamp('s'))[0])
> Out[3]: "Timestamp('1970-01-01 00:00:01')"
> {code}
> instead of
> {code}
> # python 3.6.5
> In [1]: import pyarrow as pa
> In [2]: str(pa.array(['a'])[0])
> Out[2]: "a"
> In [3]: str(pa.array([1], pa.timestamp('s'))[0])
> Out[3]: "1970-01-01 00:00:01"
> {code}
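The difference between the two outputs above comes down to whether `str()` falls back to `__repr__`. The sketch below reproduces that with toy classes; `ReprOnlyValue` and `StrAwareValue` are hypothetical and are not pyarrow's `ArrayValue`.

```python
class ReprOnlyValue:
    """Toy scalar wrapper that defines only __repr__ (hypothetical)."""

    def __init__(self, py_obj):
        self._py_obj = py_obj

    def as_py(self):
        return self._py_obj

    def __repr__(self):
        return repr(self.as_py())


class StrAwareValue(ReprOnlyValue):
    """Same wrapper, but str() delegates to the converted Python object."""

    def __str__(self):
        return str(self.as_py())


before = str(ReprOnlyValue("a"))
after = str(StrAwareValue("a"))
```

With only `__repr__` defined, Python uses it for `str()` as well, which is why the quotes appear in the first output.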



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2875) [Packaging] Don't attempt to download arrow archive in linux builds

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2875:
--
Component/s: Packaging

> [Packaging] Don't attempt to download arrow archive in linux builds
> ---
>
> Key: ARROW-2875
> URL: https://issues.apache.org/jira/browse/ARROW-2875
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Packaging
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> After the version increment, the rakefile expects that an archive has already 
> been uploaded.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2718) [Packaging] GPG sign downloaded artifacts

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2718:
--
Component/s: Packaging

> [Packaging] GPG sign downloaded artifacts
> -
>
> Key: ARROW-2718
> URL: https://issues.apache.org/jira/browse/ARROW-2718
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Packaging
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
> Fix For: 0.10.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2724) [Packaging] Determine whether all the expected artifacts are uploaded

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2724:
--
Component/s: Packaging

> [Packaging] Determine whether all the expected artifacts are uploaded
> -
>
> Key: ARROW-2724
> URL: https://issues.apache.org/jira/browse/ARROW-2724
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> I can imagine the definition of the expected artifacts as follows:
> {code}
> conda-win:
> platform: win
> template: conda-recipes/appveyor.yml
> artifacts:
>   - arrow-cpp-{version}-py35_vc14_0.tar.bz2
>   - pyarrow-{version}-py35_vc14_0.tar.bz2
>   - ...
> {code}
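A check built on a definition like the one above could expand the `{version}` placeholders and compare the result with what was actually uploaded. This is a minimal sketch under stated assumptions: `missing_artifacts` is a hypothetical helper, the version string is only an example, and the uploaded list is made up for illustration.

```python
def missing_artifacts(patterns, version, uploaded):
    """Return expected artifact names that are absent from `uploaded`."""
    expected = [pattern.format(version=version) for pattern in patterns]
    present = set(uploaded)
    return [name for name in expected if name not in present]


# Patterns copied from the definition above.
patterns = [
    "arrow-cpp-{version}-py35_vc14_0.tar.bz2",
    "pyarrow-{version}-py35_vc14_0.tar.bz2",
]
# Illustrative data only: one of the two artifacts has been uploaded.
uploaded = ["arrow-cpp-0.10.0-py35_vc14_0.tar.bz2"]
missing = missing_artifacts(patterns, "0.10.0", uploaded)
```

An empty result would mean every expected artifact for that task is present.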



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2845) [Packaging] Upload additional debian artifacts

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2845:
--
Component/s: Packaging

> [Packaging] Upload additional debian artifacts
> --
>
> Key: ARROW-2845
> URL: https://issues.apache.org/jira/browse/ARROW-2845
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Packaging
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> We should add the following files to our artifact list:
> {code}
> apache-arrow_{version}-1.debian.tar.xz
> apache-arrow_{version}-1.dsc
> apache-arrow_{version}.orig.tar.gz
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2811) [Python] Test serialization for determinism

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2811:
--
Component/s: Python

> [Python] Test serialization for determinism
> ---
>
> Key: ARROW-2811
> URL: https://issues.apache.org/jira/browse/ARROW-2811
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> see discussion in https://github.com/apache/arrow/pull/2216



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2886) [Release] An unused variable exists

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2886:
--
Component/s: Packaging

> [Release] An unused variable exists
> ---
>
> Key: ARROW-2886
> URL: https://issues.apache.org/jira/browse/ARROW-2886
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2907) [GitHub] Improve "How to contribute patches"

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2907:
--
Component/s: Documentation

> [GitHub] Improve "How to contribute patches"
> 
>
> Key: ARROW-2907
> URL: https://issues.apache.org/jira/browse/ARROW-2907
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: okkez
>Assignee: okkez
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The first paragraph of the document "[How to contribute 
> patches|https://github.com/apache/arrow/blob/master/.github/CONTRIBUTING.md#how-to-contribute-patches]" 
> does not make it clear that contributors need to follow the procedure "To 
> contribute patch".
> I will add a patch for this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2890) [Plasma] Make Python PlasmaClient.release private

2018-07-27 Thread Li Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jin updated ARROW-2890:
--
Component/s: Plasma (C++)

> [Plasma] Make Python PlasmaClient.release private
> -
>
> Key: ARROW-2890
> URL: https://issues.apache.org/jira/browse/ARROW-2890
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++)
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> It should normally not be called by the user, since it is automatically 
> called upon buffer destruction, see also 
> https://github.com/apache/arrow/blob/7d2fbeba31763c978d260a9771184a13a63aaaf7/python/pyarrow/_plasma.pyx#L222.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

