[jira] [Updated] (ARROW-2502) [Rust] Restore Windows Compatibility

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2502:
--
Labels: pull-request-available  (was: )

> [Rust] Restore Windows Compatibility
> 
>
> Key: ARROW-2502
> URL: https://issues.apache.org/jira/browse/ARROW-2502
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Paddy Horan
>Assignee: Paddy Horan
>Priority: Major
>  Labels: pull-request-available
>
> Windows support is currently broken due to a call to free in builder.rs and 
> the memory_pool abstraction.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2474) [Rust] Add windows support for memory pool abstraction

2018-04-23 Thread Paddy Horan (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan updated ARROW-2474:
---
Summary: [Rust] Add windows support for memory pool abstraction  (was: 
[Rust] Windows build fails for memory pool abstraction)

> [Rust] Add windows support for memory pool abstraction
> --
>
> Key: ARROW-2474
> URL: https://issues.apache.org/jira/browse/ARROW-2474
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Paddy Horan
>Assignee: Paddy Horan
>Priority: Major
>






[jira] [Created] (ARROW-2502) [Rust] Restore Windows Compatibility

2018-04-23 Thread Paddy Horan (JIRA)
Paddy Horan created ARROW-2502:
--

 Summary: [Rust] Restore Windows Compatibility
 Key: ARROW-2502
 URL: https://issues.apache.org/jira/browse/ARROW-2502
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan


Windows support is currently broken due to a call to free in builder.rs and the 
memory_pool abstraction.





[jira] [Updated] (ARROW-2474) [Rust] Windows build fails for memory pool abstraction

2018-04-23 Thread Paddy Horan (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan updated ARROW-2474:
---
Summary: [Rust] Windows build fails for memory pool abstraction  (was: 
[Rust] Windows build fails for memory pool abstract)

> [Rust] Windows build fails for memory pool abstraction
> --
>
> Key: ARROW-2474
> URL: https://issues.apache.org/jira/browse/ARROW-2474
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Paddy Horan
>Assignee: Paddy Horan
>Priority: Major
>






[jira] [Updated] (ARROW-2474) [Rust] Windows build fails for memory pool abstract

2018-04-23 Thread Paddy Horan (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paddy Horan updated ARROW-2474:
---
Summary: [Rust] Windows build fails for memory pool abstract  (was: [Rust] 
Windows build fails in memory pool abstraction)

> [Rust] Windows build fails for memory pool abstract
> ---
>
> Key: ARROW-2474
> URL: https://issues.apache.org/jira/browse/ARROW-2474
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Paddy Horan
>Assignee: Paddy Horan
>Priority: Major
>






[jira] [Commented] (ARROW-2486) [C++/Python] Provide a Docker image that contains all dependencies for development

2018-04-23 Thread Aneesh (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449117#comment-16449117
 ] 

Aneesh commented on ARROW-2486:
---

I'm potentially interested in working on this (after completing any 
contributions to the build system that I've committed to). If there are any 
suggestions as to base image and/or dependencies in addition to the ones 
documented below, please advise. 
https://arrow.apache.org/docs/python/development.html#developing-on-linux-and-macos

> [C++/Python] Provide a Docker image that contains all dependencies for 
> development
> --
>
> Key: ARROW-2486
> URL: https://issues.apache.org/jira/browse/ARROW-2486
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Reporter: Uwe L. Korn
>Priority: Major
>  Labels: hackathon
> Fix For: 0.11.0
>
>
> We should provide a Docker image and a Dockerfile that contain all the 
> dependencies needed for development. In addition, there should be a 
> Dockerfile that can be used for development where the sources are 
> (bind-)mounted into the container. A typical workflow should consist of a 
> wrapper script that starts the container, takes care of the bind mounts, and 
> runs cmake if necessary.
> People who want to get started with Arrow development on, e.g., OS X will 
> spend a long time setting up the environment. I hope this lowers the barrier 
> to a first contribution a bit.





[jira] [Created] (ARROW-2501) Upgrade Jackson to 2.9.5

2018-04-23 Thread Andy Grove (JIRA)
Andy Grove created ARROW-2501:
-

 Summary: Upgrade Jackson to 2.9.5
 Key: ARROW-2501
 URL: https://issues.apache.org/jira/browse/ARROW-2501
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java - Memory, Java - Vectors
Affects Versions: 0.9.0
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 0.10.0


I would like to upgrade Jackson to the latest version (2.9.5). If there are no 
objections I will create a PR (it is literally just changing the version number 
in the pom - no code changes required).





[jira] [Commented] (ARROW-2498) [Java] Upgrade to JDK 1.8

2018-04-23 Thread Andy Grove (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448866#comment-16448866
 ] 

Andy Grove commented on ARROW-2498:
---

I removed openjdk7 from the CI build matrix and the build for the PR is now 
passing.

> [Java] Upgrade to JDK 1.8
> -
>
> Key: ARROW-2498
> URL: https://issues.apache.org/jira/browse/ARROW-2498
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java - Memory, Java - Vectors
>Affects Versions: 0.11.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> I'm trying to use the parquet-arrow module from parquet-mr but I'm running 
> into this error because the two projects use different major versions of Java:
> {code:java}
>   Cause: java.lang.ClassNotFoundException: 
> org.apache.arrow.vector.types.pojo.ArrowType$Struct_{code}
> The struct is actually named `Struct` not `Struct_`.
> This PR is to track work to upgrade to JDK 1.8
> I should note that this is after the recent commit in parquet to upgrade to 
> use arrow-0.8.0.
>  
>  





[jira] [Created] (ARROW-2500) [Java] IPC Writers/readers are not always setting validity bits correctly

2018-04-23 Thread Emilio Lahr-Vivaz (JIRA)
Emilio Lahr-Vivaz created ARROW-2500:


 Summary: [Java] IPC Writers/readers are not always setting 
validity bits correctly
 Key: ARROW-2500
 URL: https://issues.apache.org/jira/browse/ARROW-2500
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java - Vectors
Affects Versions: 0.9.0, 0.8.0
Reporter: Emilio Lahr-Vivaz


When writing multiple batches to a Stream/File Writer, the first validity bit 
can get garbled between writing and reading. I couldn't pinpoint the exact 
issue, but I was able to re-create it with a fairly simple unit test.

in TestArrowStream.java:

{code:java}
  @Test
  public void testReadWriteMultipleBatches() throws IOException {

    ByteArrayOutputStream os = new ByteArrayOutputStream();

    try (IntVector vector = new IntVector("foo", allocator);) {
      Schema schema = new Schema(Collections.singletonList(vector.getField()), null);
      try (VectorSchemaRoot root = new VectorSchemaRoot(schema,
               Collections.singletonList((FieldVector) vector), vector.getValueCount());
           ArrowStreamWriter writer = new ArrowStreamWriter(root,
               new MapDictionaryProvider(), Channels.newChannel(os));) {
        writer.start();

        // First batch: five values, nulls at indices 0 and 3.
        vector.setNull(0);
        vector.setSafe(1, 1);
        vector.setSafe(2, 2);
        vector.setNull(3);
        vector.setSafe(4, 1);
        vector.setValueCount(5);
        root.setRowCount(5);
        writer.writeBatch();

        // Second batch: three values, null at index 0.
        vector.setNull(0);
        vector.setSafe(1, 1);
        vector.setSafe(2, 2);
        vector.setValueCount(3);
        root.setRowCount(3);
        writer.writeBatch();
      }
    }

    ByteArrayInputStream in = new ByteArrayInputStream(os.toByteArray());

    try (ArrowStreamReader reader = new ArrowStreamReader(in, allocator);) {
      IntVector read = (IntVector) reader.getVectorSchemaRoot().getFieldVectors().get(0);

      reader.loadNextBatch();

      assertEquals(read.getValueCount(), 5);
      assertNull(read.getObject(0));
      assertEquals(read.getObject(1), Integer.valueOf(1));
      assertEquals(read.getObject(2), Integer.valueOf(2));
      assertNull(read.getObject(3));
      assertEquals(read.getObject(4), Integer.valueOf(1));

      reader.loadNextBatch();

      assertEquals(read.getValueCount(), 3);
      assertNull(read.getObject(0));
      assertEquals(read.getObject(1), Integer.valueOf(1));
      assertEquals(read.getObject(2), Integer.valueOf(2));
    }
  }
{code}

in TestArrowFile.java:

{code}
  @Test
  public void testReadWriteMultipleBatches() throws IOException {
    File file = new File("target/mytest_nulls_multibatch.arrow");

    try (IntVector vector = new IntVector("foo", allocator);) {
      Schema schema = new Schema(Collections.singletonList(vector.getField()), null);
      try (FileOutputStream fileOutputStream = new FileOutputStream(file);
           VectorSchemaRoot root = new VectorSchemaRoot(schema,
               Collections.singletonList((FieldVector) vector), vector.getValueCount());
           ArrowFileWriter writer = new ArrowFileWriter(root,
               new MapDictionaryProvider(), fileOutputStream.getChannel());) {
        writer.start();

        // First batch: five values, nulls at indices 0 and 3.
        vector.setNull(0);
        vector.setSafe(1, 1);
        vector.setSafe(2, 2);
        vector.setNull(3);
        vector.setSafe(4, 1);
        vector.setValueCount(5);
        root.setRowCount(5);
        writer.writeBatch();

        // Second batch: three values, null at index 0.
        vector.setNull(0);
        vector.setSafe(1, 1);
        vector.setSafe(2, 2);
        vector.setValueCount(3);
        root.setRowCount(3);
        writer.writeBatch();
      }
    }

    try (FileInputStream fileInputStream = new FileInputStream(file);
         ArrowFileReader reader = new ArrowFileReader(fileInputStream.getChannel(), allocator);) {
      IntVector read = (IntVector) reader.getVectorSchemaRoot().getFieldVectors().get(0);

      reader.loadNextBatch();

      assertEquals(read.getValueCount(), 5);
      assertNull(read.getObject(0));
      assertEquals(read.getObject(1), Integer.valueOf(1));
      assertEquals(read.getObject(2), Integer.valueOf(2));
      assertNull(read.getObject(3));
      assertEquals(read.getObject(4), Integer.valueOf(1));

      reader.loadNextBatch();

      assertEquals(read.getValueCount(), 3);
      assertNull(read.getObject(0));
      assertEquals(read.getObject(1), Integer.valueOf(1));
      assertEquals(read.getObject(2), Integer.valueOf(2));
    }
  }
{code}





[jira] [Resolved] (ARROW-2470) [C++] FileGetSize() should not seek

2018-04-23 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-2470.

   Resolution: Fixed
Fix Version/s: 0.10.0

Issue resolved by pull request 1934
[https://github.com/apache/arrow/pull/1934]

> [C++] FileGetSize() should not seek
> ---
>
> Key: ARROW-2470
> URL: https://issues.apache.org/jira/browse/ARROW-2470
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> {{FileGetSize()}} currently seeks to the end of file and reads the current 
> file position. Instead it should simply call {{fstat}} on the file descriptor 
> (or the Windows equivalent).
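
As a rough illustration of the change requested above, the size can be read from the file's metadata instead of by seeking. This is a minimal POSIX-only sketch: `FileSizeFromDescriptor` is a hypothetical name rather than Arrow's actual `FileGetSize()`, and the Windows equivalent is not shown.

```cpp
#include <cstdint>
#include <sys/stat.h>  // POSIX fstat()

// Hypothetical helper (not Arrow's FileGetSize): returns the size in bytes of
// the file behind `fd`, or -1 if fstat() fails. The current file position is
// never read or modified, unlike a seek-to-end approach.
inline int64_t FileSizeFromDescriptor(int fd) {
  struct stat st;
  if (fstat(fd, &st) != 0) {
    return -1;
  }
  return static_cast<int64_t>(st.st_size);
}
```

Because no seek is involved, the call does not touch the descriptor's file offset.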





[jira] [Resolved] (ARROW-2492) [Python] Prevent segfault on accidental call of pyarrow.Array

2018-04-23 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-2492.

Resolution: Fixed

Issue resolved by pull request 1926
[https://github.com/apache/arrow/pull/1926]

> [Python] Prevent segfault on accidental call of pyarrow.Array
> -
>
> Key: ARROW-2492
> URL: https://issues.apache.org/jira/browse/ARROW-2492
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If you mistype {{pyarrow.Array}} instead of {{pyarrow.array}}, some 
> functions crash with a segmentation fault. We should fix these 
> segmentation faults and also prevent the user from calling the 
> constructor in this fashion.





[jira] [Assigned] (ARROW-2497) Use ASSERT_NO_FATAL_FAILURE in C++ unit tests

2018-04-23 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-2497:
--

Assignee: Joshua Storck

> Use ASSERT_NO_FATAL_FAILURE in C++ unit tests
> --
>
> Key: ARROW-2497
> URL: https://issues.apache.org/jira/browse/ARROW-2497
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Joshua Storck
>Assignee: Joshua Storck
>Priority: Minor
>  Labels: pull-request-available
>
> A number of unit tests have helper functions that use gtest/arrow ASSERT_ 
> macros. Those ASSERT_ macros simply return from the current function; they do 
> not throw exceptions or abort. Since these helper functions return void, the 
> unit test simply continues after an assertion fails. This can lead 
> to additional failures, such as segfaults, because the test executes code 
> that it was not expected to reach. Wrapping the calls to those helper 
> functions in gtest's ASSERT_NO_FATAL_FAILURE in the outermost scope of the 
> unit test makes the test terminate correctly.
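
The control-flow problem described above can be modelled without gtest. The sketch below is a simplified stand-in, not gtest itself: the `SKETCH_*` macros, the global flag and the helper names are all hypothetical, and they only mimic how a fatal assertion returns from the current function while `ASSERT_NO_FATAL_FAILURE` propagates that return to the caller.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for gtest's bookkeeping.
static bool g_fatal_failure = false;
static std::vector<std::string> g_log;

// Like a gtest ASSERT_*: record the failure, then return from the *current*
// function only. The caller is not interrupted.
#define SKETCH_ASSERT_TRUE(cond) \
  do {                           \
    if (!(cond)) {               \
      g_fatal_failure = true;    \
      return;                    \
    }                            \
  } while (false)

// Like ASSERT_NO_FATAL_FAILURE: run the statement, then also return from the
// caller if the statement recorded a fatal failure.
#define SKETCH_ASSERT_NO_FATAL_FAILURE(statement) \
  do {                                            \
    statement;                                    \
    if (g_fatal_failure) return;                  \
  } while (false)

// A void helper of the kind the issue describes.
void CheckNotNull(const int* p) { SKETCH_ASSERT_TRUE(p != nullptr); }

void TestWithoutGuard(const int* p) {
  CheckNotNull(p);
  // Reached even when the helper failed; a real test might dereference p here.
  g_log.push_back("unguarded test kept running");
}

void TestWithGuard(const int* p) {
  SKETCH_ASSERT_NO_FATAL_FAILURE(CheckNotNull(p));
  // Not reached when the helper failed.
  g_log.push_back("guarded test kept running");
}
```

Calling `TestWithoutGuard(nullptr)` appends to the log despite the failed check, whereas `TestWithGuard(nullptr)` returns before doing so.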





[jira] [Commented] (ARROW-2499) [C++] Add iterator facility for Python sequences

2018-04-23 Thread Antoine Pitrou (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448800#comment-16448800
 ] 

Antoine Pitrou commented on ARROW-2499:
---

Thanks! We can start from that indeed.

> [C++] Add iterator facility for Python sequences
> 
>
> Key: ARROW-2499
> URL: https://issues.apache.org/jira/browse/ARROW-2499
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
>
> The idea is to factor out something like the following:
> https://github.com/apache/arrow/pull/1935/files#diff-6ea0fcd65b95b76eab9ddfbd7a173725R78
> However I'm not sure which idiom or pattern we should choose. [~cpcloud] any 
> idea?





[jira] [Assigned] (ARROW-2499) [C++] Add iterator facility for Python sequences

2018-04-23 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-2499:
-

Assignee: Antoine Pitrou

> [C++] Add iterator facility for Python sequences
> 
>
> Key: ARROW-2499
> URL: https://issues.apache.org/jira/browse/ARROW-2499
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>
> The idea is to factor out something like the following:
> https://github.com/apache/arrow/pull/1935/files#diff-6ea0fcd65b95b76eab9ddfbd7a173725R78
> However I'm not sure which idiom or pattern we should choose. [~cpcloud] any 
> idea?





[jira] [Commented] (ARROW-2499) [C++] Add iterator facility for Python sequences

2018-04-23 Thread Phillip Cloud (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448799#comment-16448799
 ] 

Phillip Cloud commented on ARROW-2499:
--

[~pitrou] Looks like there's {{LoopPySequence}} in {{numpy_to_arrow.cc}}, if 
that satisfies the need.

> [C++] Add iterator facility for Python sequences
> 
>
> Key: ARROW-2499
> URL: https://issues.apache.org/jira/browse/ARROW-2499
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
>
> The idea is to factor out something like the following:
> https://github.com/apache/arrow/pull/1935/files#diff-6ea0fcd65b95b76eab9ddfbd7a173725R78
> However I'm not sure which idiom or pattern we should choose. [~cpcloud] any 
> idea?





[jira] [Commented] (ARROW-2499) [C++] Add iterator facility for Python sequences

2018-04-23 Thread Phillip Cloud (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448795#comment-16448795
 ] 

Phillip Cloud commented on ARROW-2499:
--

That works well too.

> [C++] Add iterator facility for Python sequences
> 
>
> Key: ARROW-2499
> URL: https://issues.apache.org/jira/browse/ARROW-2499
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
>
> The idea is to factor out something like the following:
> https://github.com/apache/arrow/pull/1935/files#diff-6ea0fcd65b95b76eab9ddfbd7a173725R78
> However I'm not sure which idiom or pattern we should choose. [~cpcloud] any 
> idea?





[jira] [Commented] (ARROW-2499) [C++] Add iterator facility for Python sequences

2018-04-23 Thread Antoine Pitrou (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448790#comment-16448790
 ] 

Antoine Pitrou commented on ARROW-2499:
---

That would be nice. Otherwise I might end up writing a {{for_each}}-like 
primitive, something like:

{code:cpp}
// Iterate on the Python sequence, calling the given callable on each element.
// If the callable returns a non-OK status, iteration stops and the status is
// returned.
Status IterateSequence(PyObject*, const std::function<Status(PyObject*)>&);
{code}
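
To make the early-exit behaviour of such a primitive concrete, here is a self-contained sketch that does not depend on the Python C API. Everything in it is an assumption for illustration: the tiny `Status` class stands in for `arrow::Status`, and a `std::vector<int>` stands in for the Python sequence.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Minimal stand-in for arrow::Status (hypothetical; the real class is richer).
class Status {
 public:
  static Status OK() { return Status(); }
  static Status Invalid(std::string msg) { return Status(std::move(msg)); }
  bool ok() const { return ok_; }
  const std::string& message() const { return msg_; }

 private:
  Status() : ok_(true) {}
  explicit Status(std::string msg) : ok_(false), msg_(std::move(msg)) {}
  bool ok_;
  std::string msg_;
};

// Stand-in for a Python sequence.
using Sequence = std::vector<int>;

// Calls `func` on each element in order. Iteration stops at the first non-OK
// status, which is returned to the caller; otherwise OK is returned.
Status IterateSequence(const Sequence& seq,
                       const std::function<Status(int)>& func) {
  for (int item : seq) {
    Status st = func(item);
    if (!st.ok()) {
      return st;
    }
  }
  return Status::OK();
}
```

A caller passes a lambda that processes one element and returns a status. In a real version, the list and tuple fast paths would live inside `IterateSequence`, so call sites would not repeat the type checks or the loops.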

> [C++] Add iterator facility for Python sequences
> 
>
> Key: ARROW-2499
> URL: https://issues.apache.org/jira/browse/ARROW-2499
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
>
> The idea is to factor out something like the following:
> https://github.com/apache/arrow/pull/1935/files#diff-6ea0fcd65b95b76eab9ddfbd7a173725R78
> However I'm not sure which idiom or pattern we should choose. [~cpcloud] any 
> idea?





[jira] [Commented] (ARROW-2499) [C++] Add iterator facility for Python sequences

2018-04-23 Thread Phillip Cloud (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448786#comment-16448786
 ] 

Phillip Cloud commented on ARROW-2499:
--

I can take a crack at this to show you what I mean (my suggestion may also not 
work out, there are still some things about idiomatic C++ that I don't yet 
grok). [~joshuastorck] might also have suggestions here.

> [C++] Add iterator facility for Python sequences
> 
>
> Key: ARROW-2499
> URL: https://issues.apache.org/jira/browse/ARROW-2499
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
>
> The idea is to factor out something like the following:
> https://github.com/apache/arrow/pull/1935/files#diff-6ea0fcd65b95b76eab9ddfbd7a173725R78
> However I'm not sure which idiom or pattern we should choose. [~cpcloud] any 
> idea?





[jira] [Commented] (ARROW-2499) [C++] Add iterator facility for Python sequences

2018-04-23 Thread Antoine Pitrou (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448780#comment-16448780
 ] 

Antoine Pitrou commented on ARROW-2499:
---

{{generic_iterator}} is a template class in pybind11, so I'm not sure how that 
works.

> [C++] Add iterator facility for Python sequences
> 
>
> Key: ARROW-2499
> URL: https://issues.apache.org/jira/browse/ARROW-2499
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
>
> The idea is to factor out something like the following:
> https://github.com/apache/arrow/pull/1935/files#diff-6ea0fcd65b95b76eab9ddfbd7a173725R78
> However I'm not sure which idiom or pattern we should choose. [~cpcloud] any 
> idea?





[jira] [Commented] (ARROW-2499) [C++] Add iterator facility for Python sequences

2018-04-23 Thread Phillip Cloud (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448767#comment-16448767
 ] 

Phillip Cloud commented on ARROW-2499:
--

You'd still have to do the checks and choose the right _specific_ iterator but 
in theory you'd be able to have a single {{VisitSequence(const 
generic_iterator& iter)}} and loop over the elements using range-based for 
loops, which gets rid of the loop duplication. I'm not sure how to get rid of 
the checks since we only have {{PyObject*}} at the call site.

> [C++] Add iterator facility for Python sequences
> 
>
> Key: ARROW-2499
> URL: https://issues.apache.org/jira/browse/ARROW-2499
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
>
> The idea is to factor out something like the following:
> https://github.com/apache/arrow/pull/1935/files#diff-6ea0fcd65b95b76eab9ddfbd7a173725R78
> However I'm not sure which idiom or pattern we should choose. [~cpcloud] any 
> idea?





[jira] [Commented] (ARROW-2499) [C++] Add iterator facility for Python sequences

2018-04-23 Thread Antoine Pitrou (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448756#comment-16448756
 ] 

Antoine Pitrou commented on ARROW-2499:
---

That looks nice, but I would like to know how to abstract away the fact that we 
need two separate loops for performance in the example above. pybind11 
basically has us write the checks and separate loops by hand. Perhaps a 
`for_each`-like facility? But what should the signature look like?

Also, should we take the pybind11 code as-is or write our own?

> [C++] Add iterator facility for Python sequences
> 
>
> Key: ARROW-2499
> URL: https://issues.apache.org/jira/browse/ARROW-2499
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
>
> The idea is to factor out something like the following:
> https://github.com/apache/arrow/pull/1935/files#diff-6ea0fcd65b95b76eab9ddfbd7a173725R78
> However I'm not sure which idiom or pattern we should choose. [~cpcloud] any 
> idea?





[jira] [Commented] (ARROW-2499) [C++] Add iterator facility for Python sequences

2018-04-23 Thread Phillip Cloud (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448739#comment-16448739
 ] 

Phillip Cloud commented on ARROW-2499:
--

pybind11 actually has a nice implementation of C++ iterators on the core python 
sequence types 
(https://github.com/pybind/pybind11/blob/master/include/pybind11/pytypes.h#L559-L682).
 Maybe we take some ideas from there?

> [C++] Add iterator facility for Python sequences
> 
>
> Key: ARROW-2499
> URL: https://issues.apache.org/jira/browse/ARROW-2499
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
>
> The idea is to factor out something like the following:
> https://github.com/apache/arrow/pull/1935/files#diff-6ea0fcd65b95b76eab9ddfbd7a173725R78
> However I'm not sure which idiom or pattern we should choose. [~cpcloud] any 
> idea?





[jira] [Commented] (ARROW-2498) [Java] Upgrade to JDK 1.8

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448698#comment-16448698
 ] 

ASF GitHub Bot commented on ARROW-2498:
---

BryanCutler commented on issue #1936: ARROW-2498: [Java] Use java 1.8 instead 
of java 1.7
URL: https://github.com/apache/arrow/pull/1936#issuecomment-383682574
 
 
   Are we going to change Travis to just test Java 1.8 too?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Upgrade to JDK 1.8
> -
>
> Key: ARROW-2498
> URL: https://issues.apache.org/jira/browse/ARROW-2498
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java - Memory, Java - Vectors
>Affects Versions: 0.11.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> I'm trying to use the parquet-arrow module from parquet-mr but I'm running 
> into this error because the two projects use different major versions of Java:
> {code:java}
>   Cause: java.lang.ClassNotFoundException: 
> org.apache.arrow.vector.types.pojo.ArrowType$Struct_{code}
> The struct is actually named `Struct` not `Struct_`.
> This PR is to track work to upgrade to JDK 1.8
> I should note that this is after the recent commit in parquet to upgrade to 
> use arrow-0.8.0.
>  
>  





[jira] [Commented] (ARROW-2055) [Java] Upgrade to Java 8

2018-04-23 Thread Andy Grove (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448685#comment-16448685
 ] 

Andy Grove commented on ARROW-2055:
---

PR: [https://github.com/apache/arrow/pull/1936]

 

I had already posted this to the duplicate Jira that I created: 
https://issues.apache.org/jira/browse/ARROW-2498

> [Java] Upgrade to Java 8
> 
>
> Key: ARROW-2055
> URL: https://issues.apache.org/jira/browse/ARROW-2055
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Li Jin
>Priority: Major
>






[jira] [Created] (ARROW-2499) [C++] Add iterator facility for Python sequences

2018-04-23 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2499:
-

 Summary: [C++] Add iterator facility for Python sequences
 Key: ARROW-2499
 URL: https://issues.apache.org/jira/browse/ARROW-2499
 Project: Apache Arrow
  Issue Type: Wish
  Components: C++
Reporter: Antoine Pitrou


The idea is to factor out something like the following:
https://github.com/apache/arrow/pull/1935/files#diff-6ea0fcd65b95b76eab9ddfbd7a173725R78

However I'm not sure which idiom or pattern we should choose. [~cpcloud] any 
idea?





[jira] [Commented] (ARROW-2478) [C++] Introduce a checked_cast function that performs a dynamic_cast in debug mode

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448609#comment-16448609
 ] 

ASF GitHub Bot commented on ARROW-2478:
---

cpcloud commented on a change in pull request #1937: ARROW-2478: [C++] 
Introduce a checked_cast function that performs a dynamic_cast in debug mode
URL: https://github.com/apache/arrow/pull/1937#discussion_r183486772
 
 

 ##
 File path: cpp/src/arrow/util/checked-cast-test.cc
 ##
 @@ -0,0 +1,40 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <type_traits>
+
+#include <gtest/gtest.h>
+
+#include "arrow/util/checked_cast.h"
+
+namespace arrow {
+
+TEST(CheckedCast, TestInvalidSubclassCast) {
+  class Foo {
+   public:
+virtual ~Foo() = default;
+  };
+  class Bar {};
+
+  static_assert(std::is_polymorphic<Foo>::value, "Foo is not polymorphic");
+
+  Foo foo;
+  Foo* fooptr = &foo;
+  ASSERT_EQ(nullptr, checked_cast<Bar*>(fooptr));
 
 Review comment:
   Yes, actually it will fail at compile time. Let me fix




> [C++] Introduce a checked_cast function that performs a dynamic_cast in debug 
> mode
> --
>
> Key: ARROW-2478
> URL: https://issues.apache.org/jira/browse/ARROW-2478
> Project: Apache Arrow
>  Issue Type: Improvement
>Affects Versions: 0.9.0
>Reporter: Phillip Cloud
>Assignee: Phillip Cloud
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> This would use {{static_cast}} in release mode.
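
A minimal sketch of the idea, assuming the build defines `NDEBUG` in release mode. This is an illustration only, not necessarily the signature proposed in the pull request, and the `Base`, `Derived` and `Other` types are hypothetical.

```cpp
#include <type_traits>

// Debug builds: dynamic_cast, so an invalid downcast yields nullptr.
// Release builds (NDEBUG defined): static_cast, with no runtime check.
template <typename OutputType, typename InputType>
inline OutputType checked_cast(InputType* value) {
  static_assert(std::is_pointer<OutputType>::value,
                "this sketch only handles pointer-to-pointer casts");
#ifdef NDEBUG
  return static_cast<OutputType>(value);
#else
  return dynamic_cast<OutputType>(value);
#endif
}

// Hypothetical class hierarchy used to exercise the cast.
struct Base {
  virtual ~Base() = default;  // makes Base polymorphic, as dynamic_cast needs
};
struct Derived : Base {
  int payload = 42;
};
struct Other : Base {};
```

With this shape, casting between unrelated classes compiles in debug mode (where `dynamic_cast` returns `nullptr`) but is rejected by `static_cast` in release mode, so tests of invalid casts need classes that share a base.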





[jira] [Commented] (ARROW-2478) [C++] Introduce a checked_cast function that performs a dynamic_cast in debug mode

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448608#comment-16448608
 ] 

ASF GitHub Bot commented on ARROW-2478:
---

pitrou commented on a change in pull request #1937: ARROW-2478: [C++] Introduce 
a checked_cast function that performs a dynamic_cast in debug mode
URL: https://github.com/apache/arrow/pull/1937#discussion_r183486637
 
 

 ##
 File path: cpp/src/arrow/util/checked-cast-test.cc
 ##
 @@ -0,0 +1,40 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include 
+
+#include 
+
+#include "arrow/util/checked_cast.h"
+
+namespace arrow {
+
+TEST(CheckedCast, TestInvalidSubclassCast) {
+  class Foo {
+   public:
+virtual ~Foo() = default;
+  };
+  class Bar {};
+
+  static_assert(std::is_polymorphic<Foo>::value, "Foo is not polymorphic");
+
+  Foo foo;
+  Foo* fooptr = &foo;
+  ASSERT_EQ(nullptr, checked_cast<Bar*>(fooptr));
 
 Review comment:
   (Also, I think the test would be more interesting if there were an inheritance 
relationship between Foo and Bar, but perhaps it doesn't change anything.)
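
To illustrate the suggestion above, here is a minimal sketch of a case with an inheritance relationship. It uses plain `dynamic_cast`, which is what the quoted `checked_cast` falls back to when `NDEBUG` is not defined. The names `Base`, `Left`, `Right`, `cast_to_left`, and `demo_sibling_cast` are hypothetical and are not part of the pull request.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical hierarchy for illustration; not taken from the pull request.
class Base {
 public:
  virtual ~Base() = default;  // makes Base polymorphic so dynamic_cast works
};
class Left : public Base {};
class Right : public Base {};

static_assert(std::is_polymorphic<Base>::value, "Base is not polymorphic");

// Returns nullptr when `ptr` does not actually point to a Left object.
inline Left* cast_to_left(Base* ptr) { return dynamic_cast<Left*>(ptr); }

// Usage: a valid downcast succeeds; a cast to a sibling type yields nullptr.
inline void demo_sibling_cast() {
  Left left;
  Right right;
  assert(cast_to_left(&left) == &left);
  assert(cast_to_left(&right) == nullptr);
}
```

With related types the cast compiles under both `static_cast` and `dynamic_cast`, so the only difference between build modes is whether the mismatch is detected at run time.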




> [C++] Introduce a checked_cast function that performs a dynamic_cast in debug 
> mode
> --
>
> Key: ARROW-2478
> URL: https://issues.apache.org/jira/browse/ARROW-2478
> Project: Apache Arrow
>  Issue Type: Improvement
>Affects Versions: 0.9.0
>Reporter: Phillip Cloud
>Assignee: Phillip Cloud
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> This would use {{static_cast}} in release mode.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2478) [C++] Introduce a checked_cast function that performs a dynamic_cast in debug mode

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448606#comment-16448606
 ] 

ASF GitHub Bot commented on ARROW-2478:
---

pitrou commented on a change in pull request #1937: ARROW-2478: [C++] Introduce 
a checked_cast function that performs a dynamic_cast in debug mode
URL: https://github.com/apache/arrow/pull/1937#discussion_r183486366
 
 

 ##
 File path: cpp/src/arrow/util/checked-cast-test.cc
 ##
 @@ -0,0 +1,40 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include 
+
+#include 
+
+#include "arrow/util/checked_cast.h"
+
+namespace arrow {
+
+TEST(CheckedCast, TestInvalidSubclassCast) {
+  class Foo {
+   public:
+virtual ~Foo() = default;
+  };
+  class Bar {};
+
+  static_assert(std::is_polymorphic<Foo>::value, "Foo is not polymorphic");
+
+  Foo foo;
+  Foo* fooptr = &foo;
+  ASSERT_EQ(nullptr, checked_cast<Bar*>(fooptr));
 
 Review comment:
   Won't this test fail in non-debug mode?
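
A minimal sketch of how a test could account for the build mode, assuming the helper behaves as the issue describes (`static_cast` when `NDEBUG` is defined, `dynamic_cast` otherwise). `checked_cast_sketch`, `Animal`, `Dog`, `Cat`, and the two functions below are hypothetical names, and the patch's `static_assert` checks are omitted. Note that a `static_cast` between unrelated class pointers, as with `Foo` and `Bar` in the quoted test, does not compile at all, so the sketch uses related types.

```cpp
#include <cassert>

// Sketch of the helper as described in the issue; not the code from the patch.
template <typename OutputType, typename InputType>
inline OutputType checked_cast_sketch(InputType&& value) {
#ifdef NDEBUG
  return static_cast<OutputType>(value);
#else
  return dynamic_cast<OutputType>(value);
#endif
}

// Hypothetical hierarchy for illustration.
class Animal {
 public:
  virtual ~Animal() = default;
};
class Dog : public Animal {};
class Cat : public Animal {};

// Returns true if the expectations that apply to the current build mode hold.
inline bool run_mode_aware_checks() {
  Dog dog;
  Animal* animal = &dog;
  // A valid downcast behaves the same in both modes.
  bool ok = checked_cast_sketch<Dog*>(animal) == &dog;
#ifndef NDEBUG
  // Only a build without NDEBUG can detect the wrong target type. With
  // static_cast this downcast would be undefined behavior, so it is skipped.
  ok = ok && checked_cast_sketch<Cat*>(animal) == nullptr;
#endif
  return ok;
}

// Usage: call from a test body or from main().
inline void demo_mode_aware() { assert(run_mode_aware_checks()); }
```

Guarding the `nullptr` expectation with `#ifndef NDEBUG` keeps the test meaningful in debug builds without asserting something a release build cannot provide.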




> [C++] Introduce a checked_cast function that performs a dynamic_cast in debug 
> mode
> --
>
> Key: ARROW-2478
> URL: https://issues.apache.org/jira/browse/ARROW-2478
> Project: Apache Arrow
>  Issue Type: Improvement
>Affects Versions: 0.9.0
>Reporter: Phillip Cloud
>Assignee: Phillip Cloud
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> This would use {{static_cast}} in release mode.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2478) [C++] Introduce a checked_cast function that performs a dynamic_cast in debug mode

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448604#comment-16448604
 ] 

ASF GitHub Bot commented on ARROW-2478:
---

pitrou commented on a change in pull request #1937: ARROW-2478: [C++] Introduce 
a checked_cast function that performs a dynamic_cast in debug mode
URL: https://github.com/apache/arrow/pull/1937#discussion_r183486305
 
 

 ##
 File path: cpp/src/arrow/util/checked_cast.h
 ##
 @@ -0,0 +1,42 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#ifndef ARROW_CAST_H
+#define ARROW_CAST_H
+
+#include 
+
+namespace arrow {
+
+template <typename OutputType, typename InputType>
+OutputType checked_cast(InputType&& value) {
+  static_assert(std::is_class::type>::type>::value,
+"checked_cast input type must be a class");
+  static_assert(std::is_class::type>::type>::value,
+"checked_cast output type must be a class");
+#ifdef NDEBUG
+  return static_cast<OutputType>(value);
+#else
+  return dynamic_cast<OutputType>(value);
 
 Review comment:
   Indeed if people are building in debug mode they probably want the extra 
safety checks.




> [C++] Introduce a checked_cast function that performs a dynamic_cast in debug 
> mode
> --
>
> Key: ARROW-2478
> URL: https://issues.apache.org/jira/browse/ARROW-2478
> Project: Apache Arrow
>  Issue Type: Improvement
>Affects Versions: 0.9.0
>Reporter: Phillip Cloud
>Assignee: Phillip Cloud
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> This would use {{static_cast}} in release mode.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2478) [C++] Introduce a checked_cast function that performs a dynamic_cast in debug mode

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448592#comment-16448592
 ] 

ASF GitHub Bot commented on ARROW-2478:
---

cpcloud commented on a change in pull request #1937: ARROW-2478: [C++] 
Introduce a checked_cast function that performs a dynamic_cast in debug mode
URL: https://github.com/apache/arrow/pull/1937#discussion_r183484807
 
 

 ##
 File path: cpp/src/arrow/util/checked_cast.h
 ##
 @@ -0,0 +1,42 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#ifndef ARROW_CAST_H
+#define ARROW_CAST_H
+
+#include 
+
+namespace arrow {
+
+template <typename OutputType, typename InputType>
+OutputType checked_cast(InputType&& value) {
+  static_assert(std::is_class::type>::type>::value,
+"checked_cast input type must be a class");
+  static_assert(std::is_class::type>::type>::value,
+"checked_cast output type must be a class");
+#ifdef NDEBUG
+  return static_cast<OutputType>(value);
+#else
+  return dynamic_cast<OutputType>(value);
 
 Review comment:
   Hm, that's kind of the point since that would mean that someone is building 
a debug version of whatever they happen to be building. Are we trying to allow 
for downstream developers that do not define `NDEBUG` and expect to get 
non-debug builds for the dependencies they are building themselves?




> [C++] Introduce a checked_cast function that performs a dynamic_cast in debug 
> mode
> --
>
> Key: ARROW-2478
> URL: https://issues.apache.org/jira/browse/ARROW-2478
> Project: Apache Arrow
>  Issue Type: Improvement
>Affects Versions: 0.9.0
>Reporter: Phillip Cloud
>Assignee: Phillip Cloud
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> This would use {{static_cast}} in release mode.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2478) [C++] Introduce a checked_cast function that performs a dynamic_cast in debug mode

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448589#comment-16448589
 ] 

ASF GitHub Bot commented on ARROW-2478:
---

cpcloud commented on a change in pull request #1937: ARROW-2478: [C++] 
Introduce a checked_cast function that performs a dynamic_cast in debug mode
URL: https://github.com/apache/arrow/pull/1937#discussion_r183484807
 
 

 ##
 File path: cpp/src/arrow/util/checked_cast.h
 ##
 @@ -0,0 +1,42 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#ifndef ARROW_CAST_H
+#define ARROW_CAST_H
+
+#include 
+
+namespace arrow {
+
+template <typename OutputType, typename InputType>
+OutputType checked_cast(InputType&& value) {
+  static_assert(std::is_class::type>::type>::value,
+"checked_cast input type must be a class");
+  static_assert(std::is_class::type>::type>::value,
+"checked_cast output type must be a class");
+#ifdef NDEBUG
+  return static_cast<OutputType>(value);
+#else
+  return dynamic_cast<OutputType>(value);
 
 Review comment:
   Hm, that's kind of the point since that would mean that someone is building 
a debug version of arrow. Are we trying to allow for downstream developers that 
do not define `NDEBUG` and expect to get non-debug builds for the dependencies 
they are building themselves?




> [C++] Introduce a checked_cast function that performs a dynamic_cast in debug 
> mode
> --
>
> Key: ARROW-2478
> URL: https://issues.apache.org/jira/browse/ARROW-2478
> Project: Apache Arrow
>  Issue Type: Improvement
>Affects Versions: 0.9.0
>Reporter: Phillip Cloud
>Assignee: Phillip Cloud
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> This would use {{static_cast}} in release mode.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2452) [TEST] Spark integration test fails with permission error

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448573#comment-16448573
 ] 

ASF GitHub Bot commented on ARROW-2452:
---

BryanCutler commented on issue #1890: ARROW-2452: [TEST] Spark integration test 
fails with permission error
URL: https://github.com/apache/arrow/pull/1890#issuecomment-383663693
 
 
   Seems ok to me.  I don't normally run Spark integration tests through Docker 
compose; what exactly is the advantage of doing that?
   
   It looks like the first part of the issue stated in the JIRA was in building 
Arrow Cpp, could you clarify why the Spark build failed?  I can't tell from the 
output above, but I'm not sure why this would cause the Spark Java build to 
fail.




> [TEST] Spark integration test fails with permission error
> -
>
> Key: ARROW-2452
> URL: https://issues.apache.org/jira/browse/ARROW-2452
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>
> {{arrow/dev/run_docker_compose.sh spark_integration}}
> {code}
> Scanning dependencies of target lib
> [ 66%] Building CXX object CMakeFiles/lib.dir/lib.cxx.o
> [100%] Linking CXX shared module release/lib.so
> [100%] Built target lib
> -- Finished cmake --build for pyarrow
> Bundling includes: release/include
> ('Moving built C-extension', 'release/lib.so', 'to build path', 
> '/apache-arrow/arrow/python/build/lib.linux-x86_64-2.7/pyarrow/lib.so')
> release/_parquet.so
> Cython module _parquet failure permitted
> release/_orc.so
> Cython module _orc failure permitted
> release/plasma.so
> Cython module plasma failure permitted
> running install
> error: can't create or remove files in install directory
> The following error occurred while trying to add or remove files in the
> installation directory:
> [Errno 13] Permission denied: 
> '/home/ubuntu/miniconda/envs/pyarrow-dev/lib/python2.7/site-packages/test-easy-install-1855.write-test'
> The installation directory you specified (via --install-dir, --prefix, or
> the distutils default setting) was:
> /home/ubuntu/miniconda/envs/pyarrow-dev/lib/python2.7/site-packages/
> Perhaps your account does not have write access to this directory?  If the
> installation directory is a system-owned directory, you may need to sign in
> as the administrator or "root" account.  If you do not have administrative
> access to this machine, you may wish to choose a different installation
> directory, preferably one that is listed in your PYTHONPATH environment
> variable.
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2478) [C++] Introduce a checked_cast function that performs a dynamic_cast in debug mode

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448572#comment-16448572
 ] 

ASF GitHub Bot commented on ARROW-2478:
---

cpcloud commented on a change in pull request #1937: ARROW-2478: [C++] 
Introduce a checked_cast function that performs a dynamic_cast in debug mode
URL: https://github.com/apache/arrow/pull/1937#discussion_r183482165
 
 

 ##
 File path: cpp/src/arrow/util/checked_cast.h
 ##
 @@ -0,0 +1,42 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#ifndef ARROW_CAST_H
+#define ARROW_CAST_H
+
+#include 
+
+namespace arrow {
+
+template <typename OutputType, typename InputType>
+OutputType checked_cast(InputType&& value) {
 
 Review comment:
   I doubt that will have any effect, but I'll add it.




> [C++] Introduce a checked_cast function that performs a dynamic_cast in debug 
> mode
> --
>
> Key: ARROW-2478
> URL: https://issues.apache.org/jira/browse/ARROW-2478
> Project: Apache Arrow
>  Issue Type: Improvement
>Affects Versions: 0.9.0
>Reporter: Phillip Cloud
>Assignee: Phillip Cloud
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> This would use {{static_cast}} in release mode.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2478) [C++] Introduce a checked_cast function that performs a dynamic_cast in debug mode

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448557#comment-16448557
 ] 

ASF GitHub Bot commented on ARROW-2478:
---

wesm commented on a change in pull request #1937: ARROW-2478: [C++] Introduce a 
checked_cast function that performs a dynamic_cast in debug mode
URL: https://github.com/apache/arrow/pull/1937#discussion_r183479526
 
 

 ##
 File path: cpp/src/arrow/util/checked_cast.h
 ##
 @@ -0,0 +1,42 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#ifndef ARROW_CAST_H
+#define ARROW_CAST_H
+
+#include 
+
+namespace arrow {
+
+template <typename OutputType, typename InputType>
+OutputType checked_cast(InputType&& value) {
 
 Review comment:
   `inline`?




> [C++] Introduce a checked_cast function that performs a dynamic_cast in debug 
> mode
> --
>
> Key: ARROW-2478
> URL: https://issues.apache.org/jira/browse/ARROW-2478
> Project: Apache Arrow
>  Issue Type: Improvement
>Affects Versions: 0.9.0
>Reporter: Phillip Cloud
>Assignee: Phillip Cloud
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> This would use {{static_cast}} in release mode.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2478) [C++] Introduce a checked_cast function that performs a dynamic_cast in debug mode

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448556#comment-16448556
 ] 

ASF GitHub Bot commented on ARROW-2478:
---

wesm commented on a change in pull request #1937: ARROW-2478: [C++] Introduce a 
checked_cast function that performs a dynamic_cast in debug mode
URL: https://github.com/apache/arrow/pull/1937#discussion_r183479967
 
 

 ##
 File path: cpp/src/arrow/util/checked_cast.h
 ##
 @@ -0,0 +1,42 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#ifndef ARROW_CAST_H
+#define ARROW_CAST_H
+
+#include 
+
+namespace arrow {
+
+template <typename OutputType, typename InputType>
+OutputType checked_cast(InputType&& value) {
+  static_assert(std::is_class::type>::type>::value,
+"checked_cast input type must be a class");
+  static_assert(std::is_class::type>::type>::value,
+"checked_cast output type must be a class");
+#ifdef NDEBUG
+  return static_cast<OutputType>(value);
+#else
+  return dynamic_cast<OutputType>(value);
 
 Review comment:
   This header leaks into the public API, so if downstream users do not define 
`NDEBUG`, they will get `dynamic_cast`s wherever this is used.
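
One possible way to address this concern, sketched here purely as an assumption and not as what the pull request does, is to key the behavior on a library-specific macro instead of `NDEBUG`. `MYLIB_EXTRA_CHECKS`, `opt_in_checked_cast`, `Shape`, `Circle`, `as_circle`, and `demo_valid_cast` are all hypothetical names.

```cpp
#include <cassert>

// MYLIB_EXTRA_CHECKS is a hypothetical, library-specific switch. A build
// system would define it only for the library's own debug builds, so a
// downstream project that merely leaves NDEBUG undefined is unaffected.
template <typename OutputType, typename InputType>
inline OutputType opt_in_checked_cast(InputType&& value) {
#ifdef MYLIB_EXTRA_CHECKS
  return dynamic_cast<OutputType>(value);
#else
  return static_cast<OutputType>(value);
#endif
}

// Hypothetical hierarchy for illustration.
class Shape {
 public:
  virtual ~Shape() = default;
};
class Circle : public Shape {};

inline Circle* as_circle(Shape* shape) {
  return opt_in_checked_cast<Circle*>(shape);
}

// Usage: a valid downcast returns the same object in either configuration.
inline void demo_valid_cast() {
  Circle circle;
  Shape* shape = &circle;
  assert(as_circle(shape) == &circle);
}
```

The trade-off is that users who do want the extra checks in their own debug builds would have to define the macro explicitly.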




> [C++] Introduce a checked_cast function that performs a dynamic_cast in debug 
> mode
> --
>
> Key: ARROW-2478
> URL: https://issues.apache.org/jira/browse/ARROW-2478
> Project: Apache Arrow
>  Issue Type: Improvement
>Affects Versions: 0.9.0
>Reporter: Phillip Cloud
>Assignee: Phillip Cloud
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> This would use {{static_cast}} in release mode.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2478) [C++] Introduce a checked_cast function that performs a dynamic_cast in debug mode

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2478:
--
Labels: pull-request-available  (was: )

> [C++] Introduce a checked_cast function that performs a dynamic_cast in debug 
> mode
> --
>
> Key: ARROW-2478
> URL: https://issues.apache.org/jira/browse/ARROW-2478
> Project: Apache Arrow
>  Issue Type: Improvement
>Affects Versions: 0.9.0
>Reporter: Phillip Cloud
>Assignee: Phillip Cloud
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> This would use {{static_cast}} in release mode.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2498) [Java] Upgrade to JDK 1.8

2018-04-23 Thread Andy Grove (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-2498:
--
Description: 
I'm trying to use the parquet-arrow module from parquet-mr but I'm running into 
this error because the two projects use different major versions of Java:
{code:java}
  Cause: java.lang.ClassNotFoundException: 
org.apache.arrow.vector.types.pojo.ArrowType$Struct_{code}
The struct is actually named `Struct` not `Struct_`.

This PR is to track work to upgrade to JDK 1.8

I should note that this is after the recent commit in parquet to upgrade to use 
arrow-0.8.0.

 

 

  was:
I'm trying to use the parquet-arrow module from parquet-mr but I'm running into 
this error which I'm pretty sure is because the two projects use different 
major versions of Java:
{code:java}
  Cause: java.lang.ClassNotFoundException: 
org.apache.arrow.vector.types.pojo.ArrowType$Struct_{code}
The struct is actually named `Struct` not `Struct_`.

This PR is to track work to upgrade to JDK 1.8

I should note that this is after the recent commit in parquet to upgrade to use 
arrow-0.8.0.

 

 


> [Java] Upgrade to JDK 1.8
> -
>
> Key: ARROW-2498
> URL: https://issues.apache.org/jira/browse/ARROW-2498
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java - Memory, Java - Vectors
>Affects Versions: 0.11.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> I'm trying to use the parquet-arrow module from parquet-mr but I'm running 
> into this error because the two projects use different major versions of Java:
> {code:java}
>   Cause: java.lang.ClassNotFoundException: 
> org.apache.arrow.vector.types.pojo.ArrowType$Struct_{code}
> The struct is actually named `Struct` not `Struct_`.
> This PR is to track work to upgrade to JDK 1.8
> I should note that this is after the recent commit in parquet to upgrade to 
> use arrow-0.8.0.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2498) [Java] Upgrade to JDK 1.8

2018-04-23 Thread Li Jin (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448516#comment-16448516
 ] 

Li Jin commented on ARROW-2498:
---

[~jnadeau] what do you think about dropping Java 7 support and possibly 
replacing Joda time with Java 8 time in the 1.0 release?

> [Java] Upgrade to JDK 1.8
> -
>
> Key: ARROW-2498
> URL: https://issues.apache.org/jira/browse/ARROW-2498
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java - Memory, Java - Vectors
>Affects Versions: 0.11.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> I'm trying to use the parquet-arrow module from parquet-mr but I'm running 
> into this error which I'm pretty sure is because the two projects use 
> different major versions of Java:
> {code:java}
>   Cause: java.lang.ClassNotFoundException: 
> org.apache.arrow.vector.types.pojo.ArrowType$Struct_{code}
> The struct is actually named `Struct` not `Struct_`.
> This PR is to track work to upgrade to JDK 1.8
> I should note that this is after the recent commit in parquet to upgrade to 
> use arrow-0.8.0.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2498) [Java] Upgrade to JDK 1.8

2018-04-23 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448502#comment-16448502
 ] 

Jacques Nadeau commented on ARROW-2498:
---

Sounds good to me. Let's flip the switch :)

> [Java] Upgrade to JDK 1.8
> -
>
> Key: ARROW-2498
> URL: https://issues.apache.org/jira/browse/ARROW-2498
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java - Memory, Java - Vectors
>Affects Versions: 0.11.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> I'm trying to use the parquet-arrow module from parquet-mr but I'm running 
> into this error which I'm pretty sure is because the two projects use 
> different major versions of Java:
> {code:java}
>   Cause: java.lang.ClassNotFoundException: 
> org.apache.arrow.vector.types.pojo.ArrowType$Struct_{code}
> The struct is actually named `Struct` not `Struct_`.
> This PR is to track work to upgrade to JDK 1.8
> I should note that this is after the recent commit in parquet to upgrade to 
> use arrow-0.8.0.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2498) [Java] Upgrade to JDK 1.8

2018-04-23 Thread Li Jin (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448483#comment-16448483
 ] 

Li Jin commented on ARROW-2498:
---

Andy there is already a Jira tracking this:

https://issues.apache.org/jira/browse/ARROW-2055

I think it already works, but we probably need [~jacq...@dremio.com]'s input on 
whether we can "flip the switch".

> [Java] Upgrade to JDK 1.8
> -
>
> Key: ARROW-2498
> URL: https://issues.apache.org/jira/browse/ARROW-2498
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java - Memory, Java - Vectors
>Affects Versions: 0.11.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> I'm trying to use the parquet-arrow module from parquet-mr but I'm running 
> into this error which I'm pretty sure is because the two projects use 
> different major versions of Java:
> {code:java}
>   Cause: java.lang.ClassNotFoundException: 
> org.apache.arrow.vector.types.pojo.ArrowType$Struct_{code}
> The struct is actually named `Struct` not `Struct_`.
> This PR is to track work to upgrade to JDK 1.8
> I should note that this is after the recent commit in parquet to upgrade to 
> use arrow-0.8.0.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2498) [Java] Upgrade to JDK 1.8

2018-04-23 Thread Andy Grove (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448429#comment-16448429
 ] 

Andy Grove commented on ARROW-2498:
---

PR: https://github.com/apache/arrow/pull/1936

> [Java] Upgrade to JDK 1.8
> -
>
> Key: ARROW-2498
> URL: https://issues.apache.org/jira/browse/ARROW-2498
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java - Memory, Java - Vectors
>Affects Versions: 0.11.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> I'm trying to use the parquet-arrow module from parquet-mr but I'm running 
> into this error which I'm pretty sure is because the two projects use 
> different major versions of Java:
> {code:java}
>   Cause: java.lang.ClassNotFoundException: 
> org.apache.arrow.vector.types.pojo.ArrowType$Struct_{code}
> The struct is actually named `Struct` not `Struct_`.
> This PR is to track work to upgrade to JDK 1.8
> I should note that this is after the recent commit in parquet to upgrade to 
> use arrow-0.8.0.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2498) [Java] Upgrade to JDK 1.8

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2498:
--
Labels: pull-request-available  (was: )

> [Java] Upgrade to JDK 1.8
> -
>
> Key: ARROW-2498
> URL: https://issues.apache.org/jira/browse/ARROW-2498
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java - Memory, Java - Vectors
>Affects Versions: 0.11.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> I'm trying to use the parquet-arrow module from parquet-mr but I'm running 
> into this error which I'm pretty sure is because the two projects use 
> different major versions of Java:
> {code:java}
>   Cause: java.lang.ClassNotFoundException: 
> org.apache.arrow.vector.types.pojo.ArrowType$Struct_{code}
> The struct is actually named `Struct` not `Struct_`.
> This PR is to track work to upgrade to JDK 1.8
> I should note that this is after the recent commit in parquet to upgrade to 
> use arrow-0.8.0.
>  
>  





[jira] [Commented] (ARROW-2498) [Java] Upgrade to JDK 1.8

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448428#comment-16448428
 ] 

ASF GitHub Bot commented on ARROW-2498:
---

agrove-rms opened a new pull request #1936: ARROW-2498: [Java] Use java 1.8 
instead of java 1.7
URL: https://github.com/apache/arrow/pull/1936
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Upgrade to JDK 1.8
> -
>
> Key: ARROW-2498
> URL: https://issues.apache.org/jira/browse/ARROW-2498
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java - Memory, Java - Vectors
>Affects Versions: 0.11.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> I'm trying to use the parquet-arrow module from parquet-mr but I'm running 
> into this error which I'm pretty sure is because the two projects use 
> different major versions of Java:
> {code:java}
>   Cause: java.lang.ClassNotFoundException: 
> org.apache.arrow.vector.types.pojo.ArrowType$Struct_{code}
> The struct is actually named `Struct` not `Struct_`.
> This PR is to track work to upgrade to JDK 1.8
> I should note that this is after the recent commit in parquet to upgrade to 
> use arrow-0.8.0.
>  
>  





[jira] [Updated] (ARROW-2498) [Java] Upgrade to JDK 1.8

2018-04-23 Thread Andy Grove (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-2498:
--
Description: 
I'm trying to use the parquet-arrow module from parquet-mr but I'm running into 
this error which I'm pretty sure is because the two projects use different 
major versions of Java:
{code:java}
  Cause: java.lang.ClassNotFoundException: 
org.apache.arrow.vector.types.pojo.ArrowType$Struct_{code}
The struct is actually named `Struct` not `Struct_`.

This PR is to track work to upgrade to JDK 1.8

I should note that this is after the recent commit in parquet to upgrade to use 
arrow-0.8.0.

 

 

  was:
I'm trying to use the parquet-arrow module from parquet-mr but I'm running into 
this error which I'm pretty sure is because the two projects use different 
major versions of Java:
{code:java}
  Cause: java.lang.ClassNotFoundException: 
org.apache.arrow.vector.types.pojo.ArrowType$Struct_{code}
The struct is actually named `Struct` not `Struct_`.

This PR is to track work to upgrade to JDK 1.8

 

 


> [Java] Upgrade to JDK 1.8
> -
>
> Key: ARROW-2498
> URL: https://issues.apache.org/jira/browse/ARROW-2498
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java - Memory, Java - Vectors
>Affects Versions: 0.11.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
> Fix For: 0.11.0
>
>
> I'm trying to use the parquet-arrow module from parquet-mr but I'm running 
> into this error which I'm pretty sure is because the two projects use 
> different major versions of Java:
> {code:java}
>   Cause: java.lang.ClassNotFoundException: 
> org.apache.arrow.vector.types.pojo.ArrowType$Struct_{code}
> The struct is actually named `Struct` not `Struct_`.
> This PR is to track work to upgrade to JDK 1.8
> I should note that this is after the recent commit in parquet to upgrade to 
> use arrow-0.8.0.
>  
>  





[jira] [Created] (ARROW-2498) [Java] Upgrade to JDK 1.8

2018-04-23 Thread Andy Grove (JIRA)
Andy Grove created ARROW-2498:
-

 Summary: [Java] Upgrade to JDK 1.8
 Key: ARROW-2498
 URL: https://issues.apache.org/jira/browse/ARROW-2498
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java - Memory, Java - Vectors
Affects Versions: 0.11.0
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 0.11.0


I'm trying to use the parquet-arrow module from parquet-mr but I'm running into 
this error which I'm pretty sure is because the two projects use different 
major versions of Java:
{code:java}
  Cause: java.lang.ClassNotFoundException: 
org.apache.arrow.vector.types.pojo.ArrowType$Struct_{code}
The struct is actually named `Struct` not `Struct_`.

This PR is to track work to upgrade to JDK 1.8

 

 





[jira] [Commented] (ARROW-2494) Return status codes from PlasmaClient::Seal

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448362#comment-16448362
 ] 

ASF GitHub Bot commented on ARROW-2494:
---

pitrou closed pull request #1932: ARROW-2494: [C++] Return status codes from 
PlasmaClient::Seal instead of crashing
URL: https://github.com/apache/arrow/pull/1932
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/cpp/src/arrow/status.h b/cpp/src/arrow/status.h
index ed524ae6c..8b82e2ab7 100644
--- a/cpp/src/arrow/status.h
+++ b/cpp/src/arrow/status.h
@@ -80,7 +80,8 @@ enum class StatusCode : char {
   PythonError = 12,
   PlasmaObjectExists = 20,
   PlasmaObjectNonexistent = 21,
-  PlasmaStoreFull = 22
+  PlasmaStoreFull = 22,
+  PlasmaObjectAlreadySealed = 23,
 };
 
 #if defined(__clang__)
@@ -144,6 +145,10 @@ class ARROW_EXPORT Status {
 return Status(StatusCode::PlasmaObjectNonexistent, msg);
   }
 
+  static Status PlasmaObjectAlreadySealed(const std::string& msg) {
+return Status(StatusCode::PlasmaObjectAlreadySealed, msg);
+  }
+
   static Status PlasmaStoreFull(const std::string& msg) {
 return Status(StatusCode::PlasmaStoreFull, msg);
   }
@@ -168,6 +173,10 @@ class ARROW_EXPORT Status {
   bool IsPlasmaObjectNonexistent() const {
 return code() == StatusCode::PlasmaObjectNonexistent;
   }
+  // An already sealed object is tried to be sealed again.
+  bool IsPlasmaObjectAlreadySealed() const {
+return code() == StatusCode::PlasmaObjectAlreadySealed;
+  }
   // An object is too large to fit into the plasma store.
   bool IsPlasmaStoreFull() const { return code() == StatusCode::PlasmaStoreFull; }
 
diff --git a/cpp/src/plasma/client.cc b/cpp/src/plasma/client.cc
index 0d44b1135..015c9731a 100644
--- a/cpp/src/plasma/client.cc
+++ b/cpp/src/plasma/client.cc
@@ -604,10 +604,15 @@ Status PlasmaClient::Seal(const ObjectID& object_id) {
   // Make sure this client has a reference to the object before sending the
   // request to Plasma.
   auto object_entry = objects_in_use_.find(object_id);
-  ARROW_CHECK(object_entry != objects_in_use_.end())
-  << "Plasma client called seal an object without a reference to it";
-  ARROW_CHECK(!object_entry->second->is_sealed)
-  << "Plasma client called seal an already sealed object";
+
+  if (object_entry == objects_in_use_.end()) {
+return Status::PlasmaObjectNonexistent(
+"Seal() called on an object without a reference to it");
+  }
+  if (object_entry->second->is_sealed) {
+return Status::PlasmaObjectAlreadySealed("Seal() called on an already sealed object");
+  }
+
   object_entry->second->is_sealed = true;
   /// Send the seal request to Plasma.
   static unsigned char digest[kDigestSize];
diff --git a/cpp/src/plasma/test/client_tests.cc b/cpp/src/plasma/test/client_tests.cc
index 10e4e4f64..dad7688ba 100644
--- a/cpp/src/plasma/test/client_tests.cc
+++ b/cpp/src/plasma/test/client_tests.cc
@@ -90,12 +90,27 @@ class TestPlasmaStore : public ::testing::Test {
   PlasmaClient client2_;
 };
 
+TEST_F(TestPlasmaStore, SealErrorsTest) {
+  ObjectID object_id = ObjectID::from_random();
+
+  Status result = client_.Seal(object_id);
+  ASSERT_TRUE(result.IsPlasmaObjectNonexistent());
+
+  // Create object.
+  std::vector<uint8_t> data(100, 0);
+  CreateObject(client_, object_id, {42}, data);
+
+  // Trying to seal it again.
+  result = client_.Seal(object_id);
+  ASSERT_TRUE(result.IsPlasmaObjectAlreadySealed());
+}
+
 TEST_F(TestPlasmaStore, DeleteTest) {
   ObjectID object_id = ObjectID::from_random();
 
   // Test for deleting non-existance object.
   Status result = client_.Delete(object_id);
-  ASSERT_EQ(result.IsPlasmaObjectNonexistent(), true);
+  ASSERT_TRUE(result.IsPlasmaObjectNonexistent());
 
   // Test for the object being in local Plasma store.
   // First create object.
@@ -108,7 +123,7 @@ TEST_F(TestPlasmaStore, DeleteTest) {
 
   // Object is in use, can't be delete.
   result = client_.Delete(object_id);
-  ASSERT_EQ(result.IsUnknownError(), true);
+  ASSERT_TRUE(result.IsUnknownError());
 
   // Avoid race condition of Plasma Manager waiting for notification.
   ARROW_CHECK_OK(client_.Release(object_id));
@@ -121,7 +136,7 @@ TEST_F(TestPlasmaStore, ContainsTest) {
   // Test for object non-existence.
   bool has_object;
   ARROW_CHECK_OK(client_.Contains(object_id, &has_object));
-  ASSERT_EQ(has_object, false);
+  ASSERT_FALSE(has_object);
 
   // Test for the object being in local Plasma store.
   // First create object.
@@ -131,7 +146,7 @@ TEST_F(TestPlasmaStore, ContainsTest) {
   std::vector<ObjectBuffer> object_buffers;
   ARROW_CHECK_OK(client_.Get({object_id}, -1, &object_buffers));
   ARROW_CHECK_OK(client_.Contains(object_id, &has_object));
-  

[jira] [Resolved] (ARROW-2494) Return status codes from PlasmaClient::Seal

2018-04-23 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-2494.
---
   Resolution: Fixed
Fix Version/s: 0.10.0

Issue resolved by pull request 1932
[https://github.com/apache/arrow/pull/1932]

> Return status codes from PlasmaClient::Seal
> ---
>
> Key: ARROW-2494
> URL: https://issues.apache.org/jira/browse/ARROW-2494
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: C++
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>






[jira] [Commented] (ARROW-2074) [Python] Allow type inference for struct arrays

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448318#comment-16448318
 ] 

ASF GitHub Bot commented on ARROW-2074:
---

pitrou commented on issue #1935: ARROW-2074: [Python] Infer lists of dicts as 
struct arrays
URL: https://github.com/apache/arrow/pull/1935#issuecomment-383618686
 
 
   Benchmark numbers here:
   * before:
   ```
   [100.00%] ··· Running convert_builtins.InferPyListToArray.time_infer ok
   [100.00%]
   ============ =============
       type         time
   ------------ -------------
      int64       11.0±0.1ms
     float64      10.3±0.07ms
       bool       9.37±0.04ms
     decimal      297±0.9ms
      binary      14.9±0.2ms
      ascii       17.3±0.3ms
     unicode      29.7±0.8ms
    int64 list    96.8±0.6ms
   ============ =============
   ```
   * after:
   ```
   [100.00%] ··· Running convert_builtins.InferPyListToArray.time_infer ok
   [100.00%]
   ============ =============
       type         time
   ------------ -------------
      int64       7.41±0.2ms
     float64      6.68±0.04ms
       bool       5.75±0.01ms
     decimal      292±0.8ms
      binary      11.4±0.2ms
      ascii       14.1±0.3ms
     unicode      26.3±0.7ms
    int64 list    74.8±0.6ms
      struct      70.7±4ms
   ============ =============
   ```




> [Python] Allow type inference for struct arrays
> ---
>
> Key: ARROW-2074
> URL: https://issues.apache.org/jira/browse/ARROW-2074
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Minor
>  Labels: pull-request-available
>
> Support inferring a struct type in a {{pa.array}} call, if a sequence of 
> dicts (or dict of sequences?) is given. Of course, this could mean that the 
> wrong field order may be inferred, though on Python 3.6+ dicts retain 
> ordering until the first deletion.





[jira] [Updated] (ARROW-2497) Use ASSERT_NO_FATAL_FAILURE in C++ unit tests

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2497:
--
Labels: pull-request-available  (was: )

> Use ASSERT_NO_FATAL_FAILURE in C++ unit tests
> --
>
> Key: ARROW-2497
> URL: https://issues.apache.org/jira/browse/ARROW-2497
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Joshua Storck
>Priority: Minor
>  Labels: pull-request-available
>
> A number of unit tests have helper functions that use gtest/arrow ASSERT_ 
> macros. Those ASSERT_ macros simply return out of the current context and do 
> not throw exceptions or abort. Since these helper functions return void, the 
> unit test simply continues when the assertions are triggered. This can lead 
> to additional failures, such as segfaults because the test is executing code 
> that it did not expect to. By adding the gtest ASSERT_NO_FATAL_FAILURE to 
> the calls of those helper functions in the outermost scope of the unit test, 
> the test will correctly terminate.





[jira] [Commented] (ARROW-2497) Use ASSERT_NO_FATAL_FAILURE in C++ unit tests

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448292#comment-16448292
 ] 

ASF GitHub Bot commented on ARROW-2497:
---

joshuastorck commented on issue #458: ARROW-2497: [C++] Adding use of 
ASSERT_NO_FATAL_FAILURE in unit tests when calling helper functions that call 
ASSERT_ macros
URL: https://github.com/apache/parquet-cpp/pull/458#issuecomment-383612464
 
 
   Created a JIRA and updated this PR's title to associate it as a fix.




> Use ASSERT_NO_FATAL_FAILURE in C++ unit tests
> --
>
> Key: ARROW-2497
> URL: https://issues.apache.org/jira/browse/ARROW-2497
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Joshua Storck
>Priority: Minor
>  Labels: pull-request-available
>
> A number of unit tests have helper functions that use gtest/arrow ASSERT_ 
> macros. Those ASSERT_ macros simply return out of the current context and do 
> not throw exceptions or abort. Since these helper functions return void, the 
> unit test simply continues when the assertions are triggered. This can lead 
> to additional failures, such as segfaults because the test is executing code 
> that it did not expect to. By adding the gtest ASSERT_NO_FATAL_FAILURE to 
> the calls of those helper functions in the outermost scope of the unit test, 
> the test will correctly terminate.





[jira] [Created] (ARROW-2497) Use ASSERT_NO_FATAL_FAILURE in C++ unit tests

2018-04-23 Thread Joshua Storck (JIRA)
Joshua Storck created ARROW-2497:


 Summary: Use ASSERT_NO_FATAL_FAILURE in C++ unit tests
 Key: ARROW-2497
 URL: https://issues.apache.org/jira/browse/ARROW-2497
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Joshua Storck


A number of unit tests have helper functions that use gtest/arrow ASSERT_ 
macros. Those ASSERT_ macros simply return out of the current context and do 
not throw exceptions or abort. Since these helper functions return void, the 
unit test simply continues when the assertions are triggered. This can lead to 
additional failures, such as segfaults because the test is executing code that 
it did not expect to. By adding the gtest ASSERT_NO_FATAL_FAILURE to the calls 
of those helper functions in the outermost scope of the unit test, the test 
will correctly terminate.





[jira] [Updated] (ARROW-2074) [Python] Allow type inference for struct arrays

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2074:
--
Labels: pull-request-available  (was: )

> [Python] Allow type inference for struct arrays
> ---
>
> Key: ARROW-2074
> URL: https://issues.apache.org/jira/browse/ARROW-2074
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Minor
>  Labels: pull-request-available
>
> Support inferring a struct type in a {{pa.array}} call, if a sequence of 
> dicts (or dict of sequences?) is given. Of course, this could mean that the 
> wrong field order may be inferred, though on Python 3.6+ dicts retain 
> ordering until the first deletion.





[jira] [Commented] (ARROW-2074) [Python] Allow type inference for struct arrays

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448280#comment-16448280
 ] 

ASF GitHub Bot commented on ARROW-2074:
---

pitrou opened a new pull request #1935: ARROW-2074: [Python] Infer lists of 
dicts as struct arrays
URL: https://github.com/apache/arrow/pull/1935
 
 
   Also refactor the type inference visitor and remove the superfluous separate 
SeqVisitor.




> [Python] Allow type inference for struct arrays
> ---
>
> Key: ARROW-2074
> URL: https://issues.apache.org/jira/browse/ARROW-2074
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Minor
>  Labels: pull-request-available
>
> Support inferring a struct type in a {{pa.array}} call, if a sequence of 
> dicts (or dict of sequences?) is given. Of course, this could mean that the 
> wrong field order may be inferred, though on Python 3.6+ dicts retain 
> ordering until the first deletion.
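As an illustration of the idea in the description above, inferring struct fields from a sequence of dicts can be sketched in plain Python. This is only a minimal sketch under stated assumptions, not the pyarrow implementation: `infer_scalar_type` and `infer_struct_type` are hypothetical helpers, type names are plain strings, nested dicts and lists are not handled, and field order follows the order in which keys are first seen.

```python
def infer_scalar_type(value):
    """Map a Python scalar to a type name (hypothetical, simplified)."""
    if value is None:
        return "null"
    # bool must be tested before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int64"
    if isinstance(value, float):
        return "float64"
    if isinstance(value, bytes):
        return "binary"
    if isinstance(value, str):
        return "string"
    raise TypeError("unsupported value: {!r}".format(value))


def infer_struct_type(rows):
    """Return [(field_name, type_name), ...] for a sequence of dicts."""
    fields = {}
    for row in rows:
        if row is None:  # a null struct contributes no information
            continue
        for key, value in row.items():
            inferred = infer_scalar_type(value)
            previous = fields.get(key, "null")
            if previous == "null":
                # First sighting, or only nulls seen so far for this field.
                fields[key] = inferred
            elif inferred not in ("null", previous):
                raise TypeError(
                    "field {!r}: {} conflicts with {}".format(key, inferred, previous))
    return list(fields.items())
```

Because the field order is taken from the dicts themselves, two inputs with the same keys in a different order would yield differently ordered struct types, which is the caveat the description raises.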





[jira] [Commented] (ARROW-2496) [C++] Add support for Libhdfs++

2018-04-23 Thread James Clampffer (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448260#comment-16448260
 ] 

James Clampffer commented on ARROW-2496:


HDFS-8707 has been merged into hadoop trunk.  I've been holding off on 
resolving it since I want to finish a few subtasks and haven't wanted to spam 
the hdfs-dev list by moving them all out into independent tasks.

> [C++] Add support for Libhdfs++
> ---
>
> Key: ARROW-2496
> URL: https://issues.apache.org/jira/browse/ARROW-2496
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Deepak Majeti
>Assignee: Deepak Majeti
>Priority: Major
>
> Libhdfs++ is an asynchronous pure C++ HDFS client. It is now part of the HDFS 
> project. Details are available here.
> https://issues.apache.org/jira/browse/HDFS-8707
>  
>  





[jira] [Comment Edited] (ARROW-2496) [C++] Add support for Libhdfs++

2018-04-23 Thread Deepak Majeti (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448207#comment-16448207
 ] 

Deepak Majeti edited comment on ARROW-2496 at 4/23/18 2:19 PM:
---

CC [~James C], who is one of the main developers of Libhdfs++


was (Author: mdeepak):
CC [~James C]

> [C++] Add support for Libhdfs++
> ---
>
> Key: ARROW-2496
> URL: https://issues.apache.org/jira/browse/ARROW-2496
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Deepak Majeti
>Assignee: Deepak Majeti
>Priority: Major
>
> Libhdfs++ is an asynchronous pure C++ HDFS client. It is now part of the HDFS 
> project. Details are available here.
> https://issues.apache.org/jira/browse/HDFS-8707
>  
>  





[jira] [Commented] (ARROW-2496) [C++] Add support for Libhdfs++

2018-04-23 Thread Deepak Majeti (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448207#comment-16448207
 ] 

Deepak Majeti commented on ARROW-2496:
--

CC [~James C]

> [C++] Add support for Libhdfs++
> ---
>
> Key: ARROW-2496
> URL: https://issues.apache.org/jira/browse/ARROW-2496
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Deepak Majeti
>Assignee: Deepak Majeti
>Priority: Major
>
> Libhdfs++ is an asynchronous pure C++ HDFS client. It is now part of the HDFS 
> project. Details are available here.
> https://issues.apache.org/jira/browse/HDFS-8707
>  
>  





[jira] [Created] (ARROW-2496) [C++] Add support for Libhdfs++

2018-04-23 Thread Deepak Majeti (JIRA)
Deepak Majeti created ARROW-2496:


 Summary: [C++] Add support for Libhdfs++
 Key: ARROW-2496
 URL: https://issues.apache.org/jira/browse/ARROW-2496
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Deepak Majeti
Assignee: Deepak Majeti


Libhdfs++ is an asynchronous pure C++ HDFS client. It is now part of the HDFS 
project. Details are available here.

https://issues.apache.org/jira/browse/HDFS-8707

 

 





[jira] [Commented] (ARROW-2427) [C++] ReadAt implementations suboptimal

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448088#comment-16448088
 ] 

ASF GitHub Bot commented on ARROW-2427:
---

pitrou closed pull request #1867: ARROW-2427: [C++] Implement ReadAt properly
URL: https://github.com/apache/arrow/pull/1867
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/cpp/src/arrow/io/file.cc b/cpp/src/arrow/io/file.cc
index ba012beb7..e3d6f84f3 100644
--- a/cpp/src/arrow/io/file.cc
+++ b/cpp/src/arrow/io/file.cc
@@ -125,9 +125,8 @@ class OSFile {
   }
 
   Status ReadAt(int64_t position, int64_t nbytes, int64_t* bytes_read, void* out) {
-std::lock_guard<std::mutex> guard(lock_);
-RETURN_NOT_OK(Seek(position));
-return Read(nbytes, bytes_read, out);
+return internal::FileReadAt(fd_, reinterpret_cast<uint8_t*>(out), position, nbytes,
+bytes_read);
   }
 
   Status Seek(int64_t pos) {
@@ -203,6 +202,19 @@ class ReadableFile::ReadableFileImpl : public OSFile {
 return Status::OK();
   }
 
+  Status ReadBufferAt(int64_t position, int64_t nbytes, std::shared_ptr<Buffer>* out) {
+std::shared_ptr<ResizableBuffer> buffer;
+RETURN_NOT_OK(AllocateResizableBuffer(pool_, nbytes, &buffer));
+
+int64_t bytes_read = 0;
+RETURN_NOT_OK(ReadAt(position, nbytes, &bytes_read, buffer->mutable_data()));
+if (bytes_read < nbytes) {
+  RETURN_NOT_OK(buffer->Resize(bytes_read));
+}
+*out = buffer;
+return Status::OK();
+  }
+
  private:
   MemoryPool* pool_;
 };
@@ -247,9 +259,7 @@ Status ReadableFile::ReadAt(int64_t position, int64_t nbytes, int64_t* bytes_rea
 
 Status ReadableFile::ReadAt(int64_t position, int64_t nbytes,
 std::shared_ptr<Buffer>* out) {
-  std::lock_guard<std::mutex> guard(impl_->lock());
-  RETURN_NOT_OK(Seek(position));
-  return impl_->ReadBuffer(nbytes, out);
+  return impl_->ReadBufferAt(position, nbytes, out);
 }
 
 Status ReadableFile::Read(int64_t nbytes, std::shared_ptr<Buffer>* out) {
@@ -459,42 +469,38 @@ Status MemoryMappedFile::Close() {
   return Status::OK();
 }
 
-Status MemoryMappedFile::Read(int64_t nbytes, int64_t* bytes_read, void* out) {
-  nbytes = std::max<int64_t>(
-  0, std::min(nbytes, memory_map_->size() - memory_map_->position()));
+Status MemoryMappedFile::ReadAt(int64_t position, int64_t nbytes, int64_t* bytes_read,
+void* out) {
+  nbytes = std::max<int64_t>(0, std::min(nbytes, memory_map_->size() - position));
   if (nbytes > 0) {
-std::memcpy(out, memory_map_->head(), static_cast<size_t>(nbytes));
+std::memcpy(out, memory_map_->data() + position, static_cast<size_t>(nbytes));
   }
   *bytes_read = nbytes;
-  memory_map_->advance(nbytes);
   return Status::OK();
 }
 
-Status MemoryMappedFile::Read(int64_t nbytes, std::shared_ptr<Buffer>* out) {
-  nbytes = std::max<int64_t>(
-  0, std::min(nbytes, memory_map_->size() - memory_map_->position()));
+Status MemoryMappedFile::ReadAt(int64_t position, int64_t nbytes,
+std::shared_ptr<Buffer>* out) {
+  nbytes = std::max<int64_t>(0, std::min(nbytes, memory_map_->size() - position));
 
   if (nbytes > 0) {
-*out = SliceBuffer(memory_map_, memory_map_->position(), nbytes);
+*out = SliceBuffer(memory_map_, position, nbytes);
   } else {
 *out = std::make_shared<Buffer>(nullptr, 0);
   }
-  memory_map_->advance(nbytes);
   return Status::OK();
 }
 
-Status MemoryMappedFile::ReadAt(int64_t position, int64_t nbytes, int64_t* bytes_read,
-void* out) {
-  std::lock_guard<std::mutex> guard(memory_map_->lock());
-  RETURN_NOT_OK(Seek(position));
-  return Read(nbytes, bytes_read, out);
+Status MemoryMappedFile::Read(int64_t nbytes, int64_t* bytes_read, void* out) {
+  RETURN_NOT_OK(ReadAt(memory_map_->position(), nbytes, bytes_read, out));
+  memory_map_->advance(*bytes_read);
+  return Status::OK();
 }
 
-Status MemoryMappedFile::ReadAt(int64_t position, int64_t nbytes,
-std::shared_ptr<Buffer>* out) {
-  std::lock_guard<std::mutex> guard(memory_map_->lock());
-  RETURN_NOT_OK(Seek(position));
-  return Read(nbytes, out);
+Status MemoryMappedFile::Read(int64_t nbytes, std::shared_ptr<Buffer>* out) {
+  RETURN_NOT_OK(ReadAt(memory_map_->position(), nbytes, out));
+  memory_map_->advance((*out)->size());
+  return Status::OK();
 }
 
 bool MemoryMappedFile::supports_zero_copy() const { return true; }
diff --git a/cpp/src/arrow/io/interfaces.h b/cpp/src/arrow/io/interfaces.h
index 09536a44e..743621c46 100644
--- a/cpp/src/arrow/io/interfaces.h
+++ b/cpp/src/arrow/io/interfaces.h
@@ -128,7 +128,8 @@ class ARROW_EXPORT RandomAccessFile : public InputStream, public Seekable {
   virtual bool supports_zero_copy() const = 0;
 
   /// \brief Read nbytes at position, provide default 

[jira] [Assigned] (ARROW-2427) [C++] ReadAt implementations suboptimal

2018-04-23 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-2427:
-

Assignee: Antoine Pitrou

> [C++] ReadAt implementations suboptimal
> ---
>
> Key: ARROW-2427
> URL: https://issues.apache.org/jira/browse/ARROW-2427
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> The {{ReadAt}} implementations for at least {{OSFile}} and 
> {{MemoryMappedFile}} take the file lock and seek. They could instead read 
> directly from the given offset, allowing concurrent I/O from multiple threads.





[jira] [Resolved] (ARROW-2427) [C++] ReadAt implementations suboptimal

2018-04-23 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-2427.
---
   Resolution: Fixed
Fix Version/s: 0.10.0

Issue resolved by pull request 1867
[https://github.com/apache/arrow/pull/1867]

> [C++] ReadAt implementations suboptimal
> ---
>
> Key: ARROW-2427
> URL: https://issues.apache.org/jira/browse/ARROW-2427
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> The {{ReadAt}} implementations for at least {{OSFile}} and 
> {{MemoryMappedFile}} take the file lock and seek. They could instead read 
> directly from the given offset, allowing concurrent I/O from multiple threads.
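To illustrate the point made in the description, a positional read that leaves the shared file offset untouched can be sketched as follows. This is a minimal Python sketch rather than the C++ change itself: `read_at` and `concurrent_reads` are hypothetical names, and it assumes a POSIX platform where `os.pread` is available. Because no seek is involved, the threads need no lock around the descriptor.

```python
import os
import tempfile
import threading


def read_at(fd, position, nbytes):
    """Read up to nbytes starting at position without moving the file offset."""
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = os.pread(fd, remaining, position)
        if not chunk:  # end of file reached
            break
        chunks.append(chunk)
        position += len(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def concurrent_reads(path, requests):
    """Serve (position, nbytes) requests from threads sharing one descriptor."""
    fd = os.open(path, os.O_RDONLY)
    results = [None] * len(requests)

    def worker(index, position, nbytes):
        # No lock and no seek: each call names its own offset explicitly.
        results[index] = read_at(fd, position, nbytes)

    try:
        threads = [threading.Thread(target=worker, args=(i, pos, n))
                   for i, (pos, n) in enumerate(requests)]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
    finally:
        os.close(fd)
    return results


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"0123456789")
    print(concurrent_reads(f.name, [(0, 3), (7, 10), (4, 2)]))
    os.remove(f.name)
```

A seek-then-read version would have to hold a lock across both calls, since another thread could move the offset in between; that serialization is what the issue proposes to avoid.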





[jira] [Resolved] (ARROW-1388) [Python] Add Table.drop method for removing columns

2018-04-23 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-1388.
---
Resolution: Fixed

> [Python] Add Table.drop method for removing columns
> ---
>
> Key: ARROW-1388
> URL: https://issues.apache.org/jira/browse/ARROW-1388
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: beginner
> Fix For: 0.10.0
>
>
> See ARROW-1374 for a use case. This function should take as an input a list 
> of columns and return a new Table instance without them.
> A well-defined interface for this implementation can be found in Pandas: 
> https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop.html
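The requested behaviour (take a list of columns, return a new table without them) can be sketched roughly as below. This is an illustration only: a plain dict mapping column names to lists stands in for a Table, and `drop_columns` is a hypothetical helper, not the pyarrow method.

```python
def drop_columns(table, columns):
    """Return a new table without `columns`; the input table is left unchanged."""
    to_drop = set(columns)
    missing = to_drop - set(table)
    if missing:
        raise KeyError("columns not found: {}".format(sorted(missing)))
    # Build a new mapping; the remaining column data is shared, not copied.
    return {name: values for name, values in table.items() if name not in to_drop}
```

Raising on unknown column names is a design choice made for this sketch; whether the real method should raise or ignore them is not settled by the description.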





[jira] [Commented] (ARROW-1731) [Python] Provide for selecting a subset of columns to convert in RecordBatch/Table.from_pandas

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448084#comment-16448084
 ] 

ASF GitHub Bot commented on ARROW-1731:
---

pitrou commented on issue #1924: ARROW-1731: [Python] Add columns selector in 
Table.from_array
URL: https://github.com/apache/arrow/pull/1924#issuecomment-383562569
 
 
   @samuelsinayoko, do you have a JIRA id so that the issue can be assigned to 
you?




> [Python] Provide for selecting a subset of columns to convert in 
> RecordBatch/Table.from_pandas
> --
>
> Key: ARROW-1731
> URL: https://issues.apache.org/jira/browse/ARROW-1731
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: beginner, pull-request-available
> Fix For: 0.10.0
>
>
> Currently it's all-or-nothing, and to do the subsetting in pandas incurs a 
> data copy. This would enable columns (by name or index) to be selected out 
> without additional data copying. We should add a {{columns=}} argument to the 
> {{from_pandas}} calls and do the subsetting when we dispatch the 
> individual arrays for conversion to Arrow.
> cc [~cpcloud] [~jreback]
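
A rough sketch of the idea, assuming a dict of lists as a stand-in for a DataFrame (`select_columns` is a hypothetical name, not part of pyarrow or pandas): the subset is applied while the columns are dispatched, so no intermediate frame is built and the column values are passed through by reference.

```python
def select_columns(frame, columns=None):
    """Yield (name, values) for the requested columns only.

    `frame` is a hypothetical stand-in for a DataFrame: a dict mapping
    column names to lists. With columns=None every column is used.
    """
    if columns is None:
        columns = list(frame)
    for name in columns:
        # Look the column up by name; the values object is handed over
        # as-is, it is not copied.
        yield str(name), frame[name]


frame = {0: [1, 2, 3], 1: [1, 3, 3], 2: [2, 4, 5]}
selected = list(select_columns(frame, columns=[0, 1]))
print([name for name, _ in selected])
```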





[jira] [Commented] (ARROW-1731) [Python] Provide for selecting a subset of columns to convert in RecordBatch/Table.from_pandas

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448082#comment-16448082
 ] 

ASF GitHub Bot commented on ARROW-1731:
---

pitrou closed pull request #1924: ARROW-1731: [Python] Add columns selector in 
Table.from_array
URL: https://github.com/apache/arrow/pull/1924
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/python/pyarrow/pandas_compat.py b/python/pyarrow/pandas_compat.py
index 24da7444f..c288c7f03 100644
--- a/python/pyarrow/pandas_compat.py
+++ b/python/pyarrow/pandas_compat.py
@@ -316,7 +316,9 @@ def _index_level_name(index, i, column_names):
         return '__index_level_{:d}__'.format(i)
 
 
-def dataframe_to_arrays(df, schema, preserve_index, nthreads=1):
+def dataframe_to_arrays(df, schema, preserve_index, nthreads=1, columns=None):
+    if columns is None:
+        columns = df.columns
     column_names = []
     index_columns = []
     index_column_names = []
@@ -334,7 +336,7 @@ def dataframe_to_arrays(df, schema, preserve_index, nthreads=1):
             'Duplicate column names found: {}'.format(list(df.columns))
         )
 
-    for name in df.columns:
+    for name in columns:
         col = df[name]
         name = _column_name_to_strings(name)
 
diff --git a/python/pyarrow/table.pxi b/python/pyarrow/table.pxi
index cbf2a69f7..7951e1946 100644
--- a/python/pyarrow/table.pxi
+++ b/python/pyarrow/table.pxi
@@ -726,7 +726,7 @@ cdef class RecordBatch:
 
     @classmethod
     def from_pandas(cls, df, Schema schema=None, bint preserve_index=True,
-                    nthreads=None):
+                    nthreads=None, columns=None):
         """
         Convert pandas.DataFrame to an Arrow RecordBatch
 
@@ -742,13 +742,15 @@ cdef class RecordBatch:
         nthreads : int, default None (may use up to system CPU count threads)
             If greater than 1, convert columns to Arrow in parallel using
             indicated number of threads
+        columns : list, optional
+           List of column to be converted. If None, use all columns.
 
         Returns
         ---
         pyarrow.RecordBatch
         """
         names, arrays, metadata = pdcompat.dataframe_to_arrays(
-            df, schema, preserve_index, nthreads=nthreads
+            df, schema, preserve_index, nthreads=nthreads, columns=columns
         )
         return cls.from_arrays(arrays, names, metadata)
 
@@ -892,7 +894,7 @@ cdef class Table:
 
     @classmethod
     def from_pandas(cls, df, Schema schema=None, bint preserve_index=True,
-                    nthreads=None):
+                    nthreads=None, columns=None):
         """
         Convert pandas.DataFrame to an Arrow Table
 
@@ -908,6 +910,9 @@ cdef class Table:
         nthreads : int, default None (may use up to system CPU count threads)
             If greater than 1, convert columns to Arrow in parallel using
             indicated number of threads
+        columns : list, optional
+           List of column to be converted. If None, use all columns.
+
 
         Returns
         ---
@@ -929,7 +934,8 @@ cdef class Table:
             df,
             schema=schema,
             preserve_index=preserve_index,
-            nthreads=nthreads
+            nthreads=nthreads,
+            columns=columns
         )
         return cls.from_arrays(arrays, names=names, metadata=metadata)
 
@@ -1263,6 +1269,30 @@ cdef class Table:
 
         return pyarrow_wrap_table(c_table)
 
+    def drop(self, columns):
+        """
+        Drop one or more columns and return a new table.
+
+        columns: list of str
+
+        Returns pa.Table
+        """
+        indices = []
+        for col in columns:
+            idx = self.schema.get_field_index(col)
+            if idx == -1:
+                raise KeyError("Column {!r} not found".format(col))
+            indices.append(idx)
+
+        indices.sort()
+        indices.reverse()
+
+        table = self
+        for idx in indices:
+            table = table.remove_column(idx)
+
+        return table
+
 
 def concat_tables(tables):
     """
diff --git a/python/pyarrow/tests/test_convert_pandas.py 
b/python/pyarrow/tests/test_convert_pandas.py
index 6970975cc..ba5b88574 100644
--- a/python/pyarrow/tests/test_convert_pandas.py
+++ b/python/pyarrow/tests/test_convert_pandas.py
@@ -133,6 +133,17 @@ def test_non_string_columns(self):
 table = pa.Table.from_pandas(df)
 assert table.column(0).name == '0'
 
+def test_from_pandas_with_columns(self):
+df = pd.DataFrame({0: [1, 2, 3], 1: [1, 3, 3], 2: [2, 4, 5]})
+
+table = pa.Table.from_pandas(df, columns=[0, 1])
+expected = pa.Table.from_pandas(df[[0, 1]])
+  

[jira] [Resolved] (ARROW-1731) [Python] Provide for selecting a subset of columns to convert in RecordBatch/Table.from_pandas

2018-04-23 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-1731.
---
Resolution: Fixed

Issue resolved by pull request 1924
[https://github.com/apache/arrow/pull/1924]

> [Python] Provide for selecting a subset of columns to convert in 
> RecordBatch/Table.from_pandas
> --
>
> Key: ARROW-1731
> URL: https://issues.apache.org/jira/browse/ARROW-1731
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: beginner, pull-request-available
> Fix For: 0.10.0
>
>
> Currently it's all-or-nothing, and to do the subsetting in pandas incurs a 
> data copy. This would enable columns (by name or index) to be selected out 
> without additional data copying. We should add a {{columns=}} argument to the 
> {{from_pandas}} calls and do the subsetting when we dispatch the 
> individual arrays for conversion to Arrow.
> cc [~cpcloud] [~jreback]





[jira] [Commented] (ARROW-2495) [Plasma] Pretty print plasma objects

2018-04-23 Thread Krisztian Szucs (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448066#comment-16448066
 ] 

Krisztian Szucs commented on ARROW-2495:


Sure, I don't want to introduce a dependency for this ticket, just asking in 
general.

> [Plasma] Pretty print plasma objects
> 
>
> Key: ARROW-2495
> URL: https://issues.apache.org/jira/browse/ARROW-2495
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Krisztian Szucs
>Priority: Minor
>
> The implementation should be based on 
> [arrow::pretty_print|https://github.com/apache/arrow/blob/master/cpp/src/arrow/pretty_print.h].





[jira] [Commented] (ARROW-2489) [Plasma] test_plasma.py crashes

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448061#comment-16448061
 ] 

ASF GitHub Bot commented on ARROW-2489:
---

pitrou commented on issue #1933: ARROW-2489: [Plasma] Fix PlasmaClient ABI 
variation
URL: https://github.com/apache/arrow/pull/1933#issuecomment-383559208
 
 
   I think the AppVeyor failure is unrelated. It also succeeded on my AppVeyor 
account: https://ci.appveyor.com/project/pitrou/arrow/build/1.0.331




> [Plasma] test_plasma.py crashes
> ---
>
> Key: ARROW-2489
> URL: https://issues.apache.org/jira/browse/ARROW-2489
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: GPU, Plasma (C++), Python
>Reporter: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> This is new here:
> {code}$ py.test   --tb=native pyarrow/tests/test_plasma.py 
> = test 
> session starts 
> ==
> platform linux -- Python 3.6.5, pytest-3.3.2, py-1.5.2, pluggy-0.6.0
> rootdir: /home/antoine/arrow/python, inifile: setup.cfg
> plugins: xdist-1.22.0, timeout-1.2.1, repeat-0.4.1, forked-0.2, 
> faulthandler-1.3.1
> collected 23 items
>   
>
> pyarrow/tests/test_plasma.py *** Error in 
> `/home/antoine/miniconda3/envs/pyarrow/bin/python': double free or corruption 
> (!prev): 0x01699520 ***
> [...]
> Current thread 0x7fe7e8570700 (most recent call first):
>   File "/home/antoine/arrow/python/pyarrow/tests/test_plasma.py", line 211 in 
> test_connection_failure_raises_exception
> [...]
> {code}
> Here is the C backtrace under gdb:
> {code}
> #0  0x769d0428 in __GI_raise (sig=sig@entry=6) at 
> ../sysdeps/unix/sysv/linux/raise.c:54
> #1  0x769d202a in __GI_abort () at abort.c:89
> #2  0x76a127ea in __libc_message (do_abort=do_abort@entry=2, 
> fmt=fmt@entry=0x76b2bed8 "*** Error in `%s': %s: 0x%s ***\n")
> at ../sysdeps/posix/libc_fatal.c:175
> #3  0x76a1b37a in malloc_printerr (ar_ptr=, 
> ptr=, str=0x76b2c008 "double free or corruption (!prev)", 
> action=3)
> at malloc.c:5006
> #4  _int_free (av=, p=, have_lock=0) at 
> malloc.c:3867
> #5  0x76a1f53c in __GI___libc_free (mem=) at 
> malloc.c:2968
> #6  0x7fffbdfcc504 in std::_Sp_counted_ptr (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x9defb0)
> at /usr/include/c++/4.9/bits/shared_ptr_base.h:373
> #7  0x7fffbdfc903c in 
> std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x9defb0) 
> at /usr/include/c++/4.9/bits/shared_ptr_base.h:149
> #8  0x7fffbdfc82b9 in 
> std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count 
> (this=0x7fffc1214510, __in_chrg=)
> at /usr/include/c++/4.9/bits/shared_ptr_base.h:666
> #9  0x7fffbdfc8276 in std::__shared_ptr (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7fffc1214508, 
> __in_chrg=)
> at /usr/include/c++/4.9/bits/shared_ptr_base.h:914
> #10 0x7fffbdfc8fc4 in std::shared_ptr::~shared_ptr 
> (this=0x7fffc1214508, __in_chrg=)
> at /usr/include/c++/4.9/bits/shared_ptr.h:93
> #11 0x7fffbdfc8fde in 
> __Pyx_call_destructor (x=...)
> at /home/antoine/arrow/python/build/temp.linux-x86_64-3.6/plasma.cxx:281
> #12 0x7fffbdfbc317 in __pyx_tp_dealloc_7pyarrow_6plasma_PlasmaClient 
> (o=0x7fffc12144f0)
> at /home/antoine/arrow/python/build/temp.linux-x86_64-3.6/plasma.cxx:10383
> #13 0x7fffbdfb8986 in __pyx_pf_7pyarrow_6plasma_2connect (__pyx_self=0x0, 
> __pyx_v_store_socket_name=0x7fffbc922c48, 
> __pyx_v_manager_socket_name=0x77fa0ab0, __pyx_v_release_delay=0, 
> __pyx_v_num_retries=1)
> at /home/antoine/arrow/python/build/temp.linux-x86_64-3.6/plasma.cxx:9147
> #14 0x7fffbdfb7dec in __pyx_pw_7pyarrow_6plasma_3connect (__pyx_self=0x0, 
> __pyx_args=0x7fffbc4d9688, __pyx_kwds=0x0)
> at /home/antoine/arrow/python/build/temp.linux-x86_64-3.6/plasma.cxx:8978
> {code}





[jira] [Commented] (ARROW-2495) [Plasma] Pretty print plasma objects

2018-04-23 Thread Antoine Pitrou (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448059#comment-16448059
 ] 

Antoine Pitrou commented on ARROW-2495:
---

fmtlib looks pretty cool, but I honestly don't know whether it's a good idea to 
depend on it or not. In this particular case, we're just talking about 
outputting a 20-byte bytestring, so we can probably do without it :-)

By the way, I don't think this needs to be based on {{arrow::pretty_print}} 
either, you could just implement a {{<<}} operator.
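
As a language-neutral illustration of how little is needed here, the sketch below formats a 20-byte ID as hex through the object's own string conversion, which is the Python analogue of a {{<<}} operator. The `ObjectID` class is a hypothetical stand-in written for this example; it is not the Plasma class.

```python
class ObjectID:
    """Hypothetical 20-byte identifier with a readable string form."""

    SIZE = 20

    def __init__(self, raw):
        if len(raw) != self.SIZE:
            raise ValueError("expected {} bytes, got {}".format(self.SIZE, len(raw)))
        self._raw = bytes(raw)

    def hex(self):
        # Two lowercase hex digits per byte, so 40 characters in total.
        return self._raw.hex()

    def __repr__(self):
        # Used by print() and by string formatting, much like operator<<.
        return "ObjectID({})".format(self.hex())


oid = ObjectID(bytes(range(20)))
print(oid)
```

No formatting library is involved; the standard `bytes.hex()` call does all the work.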

> [Plasma] Pretty print plasma objects
> 
>
> Key: ARROW-2495
> URL: https://issues.apache.org/jira/browse/ARROW-2495
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Krisztian Szucs
>Priority: Minor
>
> The implementation should be based on 
> [arrow::pretty_print|https://github.com/apache/arrow/blob/master/cpp/src/arrow/pretty_print.h].





[jira] [Commented] (ARROW-2470) [C++] FileGetSize() should not seek

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448050#comment-16448050
 ] 

ASF GitHub Bot commented on ARROW-2470:
---

pitrou opened a new pull request #1934: ARROW-2470: [C++] Avoid seeking in 
GetFileSize
URL: https://github.com/apache/arrow/pull/1934
 
 
   This makes GetFileSize thread-safe and also reduces its cost.




> [C++] FileGetSize() should not seek
> ---
>
> Key: ARROW-2470
> URL: https://issues.apache.org/jira/browse/ARROW-2470
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> {{FileGetSize()}} currently seeks to the end of file and reads the current 
> file position. Instead it should simply call {{fstat}} on the file descriptor 
> (or the Windows equivalent).
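
The difference between the two approaches can be sketched with Python's `os` module (an illustration only; the function names are made up for this example and the actual change is in C++). Seeking moves the shared file offset and has to restore it afterwards, which is racy when another thread uses the same descriptor, whereas `fstat` is a single call that leaves the offset alone.

```python
import os
import tempfile


def file_size_by_seek(fd):
    """Size via seeking: temporarily moves the shared file offset."""
    current = os.lseek(fd, 0, os.SEEK_CUR)
    size = os.lseek(fd, 0, os.SEEK_END)
    # Restore the offset; another thread reading in between would be
    # affected by the temporary move.
    os.lseek(fd, current, os.SEEK_SET)
    return size


def file_size_by_fstat(fd):
    """Size via fstat: one call, the file offset is never touched."""
    return os.fstat(fd).st_size


with tempfile.TemporaryFile() as f:
    f.write(b"x" * 1234)
    f.flush()
    fd = f.fileno()
    offset_before = os.lseek(fd, 0, os.SEEK_CUR)
    size = file_size_by_fstat(fd)
    offset_after = os.lseek(fd, 0, os.SEEK_CUR)
    print(size, offset_before == offset_after)
```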





[jira] [Updated] (ARROW-2470) [C++] FileGetSize() should not seek

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2470:
--
Labels: pull-request-available  (was: )

> [C++] FileGetSize() should not seek
> ---
>
> Key: ARROW-2470
> URL: https://issues.apache.org/jira/browse/ARROW-2470
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> {{FileGetSize()}} currently seeks to the end of file and reads the current 
> file position. Instead it should simply call {{fstat}} on the file descriptor 
> (or the Windows equivalent).





[jira] [Commented] (ARROW-2495) [Plasma] Pretty print plasma objects

2018-04-23 Thread Krisztian Szucs (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448003#comment-16448003
 ] 

Krisztian Szucs commented on ARROW-2495:


BTW [~pitrou] have you considered using a formatting library like 
https://github.com/fmtlib/fmt ?



> [Plasma] Pretty print plasma objects
> 
>
> Key: ARROW-2495
> URL: https://issues.apache.org/jira/browse/ARROW-2495
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Krisztian Szucs
>Priority: Minor
>
> The implementation should be based on 
> [arrow::pretty_print|https://github.com/apache/arrow/blob/master/cpp/src/arrow/pretty_print.h].





[jira] [Created] (ARROW-2495) [Plasma] Pretty print plasma objects

2018-04-23 Thread Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-2495:
--

 Summary: [Plasma] Pretty print plasma objects
 Key: ARROW-2495
 URL: https://issues.apache.org/jira/browse/ARROW-2495
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Krisztian Szucs


The implementation should be based on 
[arrow::pretty_print|https://github.com/apache/arrow/blob/master/cpp/src/arrow/pretty_print.h].





[jira] [Commented] (ARROW-2494) Return status codes from PlasmaClient::Seal

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447992#comment-16447992
 ] 

ASF GitHub Bot commented on ARROW-2494:
---

pitrou commented on a change in pull request #1932: ARROW-2494: [C++] Return 
status codes from PlasmaClient::Seal instead of crashing
URL: https://github.com/apache/arrow/pull/1932#discussion_r183360950
 
 

 ##
 File path: cpp/src/plasma/client.cc
 ##
 @@ -604,10 +604,16 @@ Status PlasmaClient::Seal(const ObjectID& object_id) {
   // Make sure this client has a reference to the object before sending the
   // request to Plasma.
   auto object_entry = objects_in_use_.find(object_id);
-  ARROW_CHECK(object_entry != objects_in_use_.end())
-  << "Plasma client called seal an object without a reference to it";
-  ARROW_CHECK(!object_entry->second->is_sealed)
-  << "Plasma client called seal an already sealed object";
+
+  if (object_entry == objects_in_use_.end()) {
+return Status::PlasmaObjectNonexistent(
+"Plasma client called seal an object without a reference to it");
 
 Review comment:
   I'm not sure, but that sounds reasonable, and can be discussed on a JIRA 
issue indeed.




> Return status codes from PlasmaClient::Seal
> ---
>
> Key: ARROW-2494
> URL: https://issues.apache.org/jira/browse/ARROW-2494
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: C++
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Commented] (ARROW-2494) Return status codes from PlasmaClient::Seal

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447983#comment-16447983
 ] 

ASF GitHub Bot commented on ARROW-2494:
---

kszucs commented on a change in pull request #1932: ARROW-2494: [C++] Return 
status codes from PlasmaClient::Seal instead of crashing
URL: https://github.com/apache/arrow/pull/1932#discussion_r183360587
 
 

 ##
 File path: cpp/src/plasma/client.cc
 ##
 @@ -604,10 +604,16 @@ Status PlasmaClient::Seal(const ObjectID& object_id) {
   // Make sure this client has a reference to the object before sending the
   // request to Plasma.
   auto object_entry = objects_in_use_.find(object_id);
-  ARROW_CHECK(object_entry != objects_in_use_.end())
-  << "Plasma client called seal an object without a reference to it";
-  ARROW_CHECK(!object_entry->second->is_sealed)
-  << "Plasma client called seal an already sealed object";
+
+  if (object_entry == objects_in_use_.end()) {
+return Status::PlasmaObjectNonexistent(
+"Plasma client called seal an object without a reference to it");
 
 Review comment:
   @pitrou I guess pretty-printing of plasma objects should follow the API of 
[arrow::pretty_print](https://github.com/apache/arrow/blob/master/cpp/src/arrow/pretty_print.h).
 If so, should I create a JIRA ticket instead?




> Return status codes from PlasmaClient::Seal
> ---
>
> Key: ARROW-2494
> URL: https://issues.apache.org/jira/browse/ARROW-2494
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: C++
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Commented] (ARROW-2422) Support more filter operators on Hive partitioned Parquet files

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447956#comment-16447956
 ] 

ASF GitHub Bot commented on ARROW-2422:
---

jneuff commented on a change in pull request #1861: ARROW-2422: Support more 
operators for partition filtering
URL: https://github.com/apache/arrow/pull/1861#discussion_r183355089
 
 

 ##
 File path: python/pyarrow/parquet.py
 ##
 @@ -864,12 +864,23 @@ def filter_accepts_partition(part_key, filter, level):
             if p_column != f_column:
                 return True
 
-            f_value_index = self.partitions.get_index(level, p_column,
-                                                      str(f_value))
+            p_value = (self.partitions
+                       .levels[level]
+                       .dictionary[p_value_index]
+                       .as_py())
+
             if op == "=":
-                return f_value_index == p_value_index
+                return p_value == f_value
             elif op == "!=":
-                return f_value_index != p_value_index
+                return p_value != f_value
+            elif op == '<':
+                return p_value < f_value
+            elif op == '>':
+                return p_value > f_value
+            elif op == '<=':
+                return p_value <= f_value
+            elif op == '>=':
+                return p_value >= f_value
             else:
                 return True
 
 Review comment:
   I take that as a yes.




> Support more filter operators on Hive partitioned Parquet files
> ---
>
> Key: ARROW-2422
> URL: https://issues.apache.org/jira/browse/ARROW-2422
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Julius Neuffer
>Priority: Minor
>  Labels: features, pull-request-available
>
> After implementing basic filters ('=', '!=') on Hive partitioned Parquet 
> files (ARROW-2401), I'll extend them ('>', '<', '<=', '>=') with a new PR on 
> Github.
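
The comparison logic can also be written as a lookup table. The following is a sketch, not the code in the pull request: `filter_accepts` and `_OPS` are hypothetical names, and only the six operators named in the issue are mapped. Unknown operators accept every partition, which mirrors the `else: return True` branch of the reviewed diff.

```python
import operator

# Map each supported filter operator string to a comparison function.
_OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}


def filter_accepts(p_value, op, f_value):
    """Return True if a partition value passes the (op, f_value) filter."""
    compare = _OPS.get(op)
    if compare is None:
        # Unknown operator: do not exclude the partition.
        return True
    return compare(p_value, f_value)


partitions = [2015, 2016, 2017, 2018]
print([p for p in partitions if filter_accepts(p, ">=", 2017)])
```

Comparing the decoded values rather than dictionary indices is what makes the ordering operators meaningful, since index order need not match value order.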





[jira] [Commented] (ARROW-2422) Support more filter operators on Hive partitioned Parquet files

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447953#comment-16447953
 ] 

ASF GitHub Bot commented on ARROW-2422:
---

jneuff commented on a change in pull request #1861: ARROW-2422: Support more 
operators for partition filtering
URL: https://github.com/apache/arrow/pull/1861#discussion_r183354699
 
 

 ##
 File path: python/pyarrow/parquet.py
 ##
 @@ -864,12 +864,23 @@ def filter_accepts_partition(part_key, filter, level):
             if p_column != f_column:
                 return True
 
-            f_value_index = self.partitions.get_index(level, p_column,
-                                                      str(f_value))
+            p_value = (self.partitions
+                       .levels[level]
+                       .dictionary[p_value_index]
+                       .as_py())
+
             if op == "=":
 
 Review comment:
   Should we replace `"="` with `"=="`?




> Support more filter operators on Hive partitioned Parquet files
> ---
>
> Key: ARROW-2422
> URL: https://issues.apache.org/jira/browse/ARROW-2422
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Julius Neuffer
>Priority: Minor
>  Labels: features, pull-request-available
>
> After implementing basic filters ('=', '!=') on Hive partitioned Parquet 
> files (ARROW-2401), I'll extend them ('>', '<', '<=', '>=') with a new PR on 
> Github.





[jira] [Commented] (ARROW-2494) Return status codes from PlasmaClient::Seal

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447858#comment-16447858
 ] 

ASF GitHub Bot commented on ARROW-2494:
---

pitrou commented on issue #1932: ARROW-2494: [C++] Return status codes from 
PlasmaClient::Seal instead of crashing
URL: https://github.com/apache/arrow/pull/1932#issuecomment-383517652
 
 
   When `ARROW_CHECK` checks for an internal invariant, we can keep it. But in 
other cases (such as non-existent object ID or IO error), it would be better to 
replace it with an error return, yes.
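
The distinction can be illustrated with a small Python sketch. Everything in it is hypothetical (the `Status` and `Client` classes and the status code strings are stand-ins invented for this example, not Plasma's API): mistakes the caller can make are reported through the return value, while a condition that can only fail if the class itself is buggy stays an assertion.

```python
class Status:
    """Hypothetical stand-in for a status object returned to the caller."""

    def __init__(self, code="OK", message=""):
        self.code = code
        self.message = message

    def ok(self):
        return self.code == "OK"


class Client:
    """Toy client tracking which object IDs are in use (illustrative only)."""

    def __init__(self):
        self._objects_in_use = {}  # object_id -> True once sealed

    def create(self, object_id):
        self._objects_in_use[object_id] = False

    def seal(self, object_id):
        # Caller errors are reported through the return value, not a crash.
        if object_id not in self._objects_in_use:
            return Status("ObjectNonexistent",
                          "seal called on an object without a reference to it")
        if self._objects_in_use[object_id]:
            return Status("AlreadySealed",
                          "seal called on an already sealed object")
        self._objects_in_use[object_id] = True
        # Internal invariant: this can only fail if the class itself is
        # buggy, so an assertion (the analogue of a hard check) is fine.
        assert self._objects_in_use[object_id] is True
        return Status()


client = Client()
print(client.seal(b"missing").code)
client.create(b"obj")
print(client.seal(b"obj").ok(), client.seal(b"obj").code)
```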




> Return status codes from PlasmaClient::Seal
> ---
>
> Key: ARROW-2494
> URL: https://issues.apache.org/jira/browse/ARROW-2494
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: C++
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Commented] (ARROW-2494) Return status codes from PlasmaClient::Seal

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447856#comment-16447856
 ] 

ASF GitHub Bot commented on ARROW-2494:
---

pitrou commented on a change in pull request #1932: ARROW-2494: [C++] Return 
status codes from PlasmaClient::Seal instead of crashing
URL: https://github.com/apache/arrow/pull/1932#discussion_r18281
 
 

 ##
 File path: cpp/src/plasma/client.cc
 ##
 @@ -604,10 +604,16 @@ Status PlasmaClient::Seal(const ObjectID& object_id) {
   // Make sure this client has a reference to the object before sending the
   // request to Plasma.
   auto object_entry = objects_in_use_.find(object_id);
-  ARROW_CHECK(object_entry != objects_in_use_.end())
-  << "Plasma client called seal an object without a reference to it";
-  ARROW_CHECK(!object_entry->second->is_sealed)
-  << "Plasma client called seal an already sealed object";
+
+  if (object_entry == objects_in_use_.end()) {
+return Status::PlasmaObjectNonexistent(
+"Plasma client called seal an object without a reference to it");
 
 Review comment:
   If there's a function to pretty-print object IDs then why not :-)




> Return status codes from PlasmaClient::Seal
> ---
>
> Key: ARROW-2494
> URL: https://issues.apache.org/jira/browse/ARROW-2494
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: C++
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Commented] (ARROW-2494) Return status codes from PlasmaClient::Seal

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447855#comment-16447855
 ] 

ASF GitHub Bot commented on ARROW-2494:
---

pitrou commented on a change in pull request #1932: ARROW-2494: [C++] Return 
status codes from PlasmaClient::Seal instead of crashing
URL: https://github.com/apache/arrow/pull/1932#discussion_r18281
 
 

 ##
 File path: cpp/src/plasma/client.cc
 ##
 @@ -604,10 +604,16 @@ Status PlasmaClient::Seal(const ObjectID& object_id) {
   // Make sure this client has a reference to the object before sending the
   // request to Plasma.
   auto object_entry = objects_in_use_.find(object_id);
-  ARROW_CHECK(object_entry != objects_in_use_.end())
-  << "Plasma client called seal an object without a reference to it";
-  ARROW_CHECK(!object_entry->second->is_sealed)
-  << "Plasma client called seal an already sealed object";
+
+  if (object_entry == objects_in_use_.end()) {
+return Status::PlasmaObjectNonexistent(
+"Plasma client called seal an object without a reference to it");
 
 Review comment:
   If there's a function to pretty-print object IDs then why not :-)




> Return status codes from PlasmaClient::Seal
> ---
>
> Key: ARROW-2494
> URL: https://issues.apache.org/jira/browse/ARROW-2494
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: C++
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Commented] (ARROW-2494) Return status codes from PlasmaClient::Seal

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447853#comment-16447853
 ] 

ASF GitHub Bot commented on ARROW-2494:
---

kszucs commented on issue #1932: ARROW-2494: [C++] Return status codes from 
PlasmaClient::Seal instead of crashing
URL: https://github.com/apache/arrow/pull/1932#issuecomment-383516365
 
 
   @pitrou Generally we should replace all occurrences of `ARROW_CHECK`, right? 
I've created a subtask in JIRA, should I do the same with the remaining ones?




> Return status codes from PlasmaClient::Seal
> ---
>
> Key: ARROW-2494
> URL: https://issues.apache.org/jira/browse/ARROW-2494
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: C++
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Commented] (ARROW-2494) Return status codes from PlasmaClient::Seal

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447852#comment-16447852
 ] 

ASF GitHub Bot commented on ARROW-2494:
---

kszucs commented on issue #1932: ARROW-2494: [C++] Return status codes from 
PlasmaClient::Seal instead of crashing
URL: https://github.com/apache/arrow/pull/1932#issuecomment-383516365
 
 
   @pitrou I've created a subtask in JIRA. Generally we should replace all 
occurrences of `ARROW_CHECK`, right?




> Return status codes from PlasmaClient::Seal
> ---
>
> Key: ARROW-2494
> URL: https://issues.apache.org/jira/browse/ARROW-2494
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: C++
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Commented] (ARROW-2494) Return status codes from PlasmaClient::Seal

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447848#comment-16447848
 ] 

ASF GitHub Bot commented on ARROW-2494:
---

kszucs commented on a change in pull request #1932: ARROW-2494: [C++] Return 
status codes from PlasmaClient::Seal instead of crashing
URL: https://github.com/apache/arrow/pull/1932#discussion_r183331248
 
 

 ##
 File path: cpp/src/plasma/client.cc
 ##
 @@ -604,10 +604,16 @@ Status PlasmaClient::Seal(const ObjectID& object_id) {
   // Make sure this client has a reference to the object before sending the
   // request to Plasma.
   auto object_entry = objects_in_use_.find(object_id);
-  ARROW_CHECK(object_entry != objects_in_use_.end())
-  << "Plasma client called seal an object without a reference to it";
-  ARROW_CHECK(!object_entry->second->is_sealed)
-  << "Plasma client called seal an already sealed object";
+
+  if (object_entry == objects_in_use_.end()) {
+return Status::PlasmaObjectNonexistent(
+"Plasma client called seal an object without a reference to it");
 
 Review comment:
   Me neither :) Shouldn't we incorporate the object ID in the message?




> Return status codes from PlasmaClient::Seal
> ---
>
> Key: ARROW-2494
> URL: https://issues.apache.org/jira/browse/ARROW-2494
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: C++
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Commented] (ARROW-2494) Return status codes from PlasmaClient::Seal

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447827#comment-16447827
 ] 

ASF GitHub Bot commented on ARROW-2494:
---

pitrou commented on a change in pull request #1932: ARROW-2494: [C++] Return 
status codes from PlasmaClient::Seal instead of crashing
URL: https://github.com/apache/arrow/pull/1932#discussion_r183327690
 
 

 ##
 File path: cpp/src/plasma/test/client_tests.cc
 ##
 @@ -90,6 +90,21 @@ class TestPlasmaStore : public ::testing::Test {
   PlasmaClient client2_;
 };
 
+TEST_F(TestPlasmaStore, SealErrorsTest) {
+  ObjectID object_id = ObjectID::from_random();
+
+  Status result = client_.Seal(object_id);
+  ASSERT_EQ(result.IsPlasmaObjectNonexistent(), true);
 
 Review comment:
   You can use ASSERT_TRUE().




> Return status codes from PlasmaClient::Seal
> ---
>
> Key: ARROW-2494
> URL: https://issues.apache.org/jira/browse/ARROW-2494
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: C++
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Commented] (ARROW-2494) Return status codes from PlasmaClient::Seal

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447826#comment-16447826
 ] 

ASF GitHub Bot commented on ARROW-2494:
---

pitrou commented on a change in pull request #1932: ARROW-2494: [C++] Return 
status codes from PlasmaClient::Seal instead of crashing
URL: https://github.com/apache/arrow/pull/1932#discussion_r183327354
 
 

 ##
 File path: cpp/src/plasma/client.cc
 ##
 @@ -604,10 +604,16 @@ Status PlasmaClient::Seal(const ObjectID& object_id) {
   // Make sure this client has a reference to the object before sending the
   // request to Plasma.
   auto object_entry = objects_in_use_.find(object_id);
-  ARROW_CHECK(object_entry != objects_in_use_.end())
-      << "Plasma client called seal an object without a reference to it";
-  ARROW_CHECK(!object_entry->second->is_sealed)
-      << "Plasma client called seal an already sealed object";
+
+  if (object_entry == objects_in_use_.end()) {
+    return Status::PlasmaObjectNonexistent(
+        "Plasma client called seal an object without a reference to it");
 
 Review comment:
   I'm not a native English speaker, but while we're at it, perhaps we could 
fix the grammar in this error message and the one below?
   Something like "Seal() called on an object without a reference to it".




> Return status codes from PlasmaClient::Seal
> ---
>
> Key: ARROW-2494
> URL: https://issues.apache.org/jira/browse/ARROW-2494
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: C++
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Commented] (ARROW-2494) Return status codes from PlasmaClient::Seal

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447828#comment-16447828
 ] 

ASF GitHub Bot commented on ARROW-2494:
---

pitrou commented on a change in pull request #1932: ARROW-2494: [C++] Return 
status codes from PlasmaClient::Seal instead of crashing
URL: https://github.com/apache/arrow/pull/1932#discussion_r183327738
 
 

 ##
 File path: cpp/src/plasma/test/client_tests.cc
 ##
 @@ -90,6 +90,21 @@ class TestPlasmaStore : public ::testing::Test {
   PlasmaClient client2_;
 };
 
+TEST_F(TestPlasmaStore, SealErrorsTest) {
+  ObjectID object_id = ObjectID::from_random();
+
+  Status result = client_.Seal(object_id);
+  ASSERT_EQ(result.IsPlasmaObjectNonexistent(), true);
+
+  // Create object.
+  std::vector<uint8_t> data(100, 0);
+  CreateObject(client_, object_id, {42}, data);
+
+  // Trying to seal it again.
+  result = client_.Seal(object_id);
+  ASSERT_EQ(result.IsPlasmaObjectAlreadySealed(), true);
 
 Review comment:
   Same here.




> Return status codes from PlasmaClient::Seal
> ---
>
> Key: ARROW-2494
> URL: https://issues.apache.org/jira/browse/ARROW-2494
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: C++
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Commented] (ARROW-2489) [Plasma] test_plasma.py crashes

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447821#comment-16447821
 ] 

ASF GitHub Bot commented on ARROW-2489:
---

pitrou opened a new pull request #1933: ARROW-2489: [Plasma] Fix PlasmaClient 
ABI variation
URL: https://github.com/apache/arrow/pull/1933
 
 
   When compiled with GPU support, the PlasmaClient ABI would differ, leading 
to a crash in the Python bindings to Plasma.
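
   As an illustration of why an ABI variation of this kind can crash, here is 
a minimal sketch using Python's `ctypes`. It assumes a hypothetical client 
struct that gains an extra member in a GPU build; the struct and field names 
are invented for the example and are not the real PlasmaClient members.

```python
import ctypes


# Hypothetical layouts, for illustration only: the field names are invented
# and do not mirror the real PlasmaClient members.
class ClientWithoutGpu(ctypes.Structure):
    _fields_ = [
        ("store_conn", ctypes.c_int),
        ("release_delay", ctypes.c_int64),
    ]


class ClientWithGpu(ctypes.Structure):
    _fields_ = [
        ("gpu_context", ctypes.c_void_p),  # present only in the "GPU" build
        ("store_conn", ctypes.c_int),
        ("release_delay", ctypes.c_int64),
    ]


# The same member sits at a different offset in the two builds.
offset_without = ClientWithoutGpu.store_conn.offset
offset_with = ClientWithGpu.store_conn.offset
```

   Code compiled against one layout and handed an object built with the other 
would read `store_conn` from the wrong offset, which is the general kind of 
mismatch that can end in memory corruption.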




> [Plasma] test_plasma.py crashes
> ---
>
> Key: ARROW-2489
> URL: https://issues.apache.org/jira/browse/ARROW-2489
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: GPU, Plasma (C++), Python
>Reporter: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> This is new here:
> {code}$ py.test   --tb=native pyarrow/tests/test_plasma.py 
> = test 
> session starts 
> ==
> platform linux -- Python 3.6.5, pytest-3.3.2, py-1.5.2, pluggy-0.6.0
> rootdir: /home/antoine/arrow/python, inifile: setup.cfg
> plugins: xdist-1.22.0, timeout-1.2.1, repeat-0.4.1, forked-0.2, 
> faulthandler-1.3.1
> collected 23 items
>   
>
> pyarrow/tests/test_plasma.py *** Error in 
> `/home/antoine/miniconda3/envs/pyarrow/bin/python': double free or corruption 
> (!prev): 0x01699520 ***
> [...]
> Current thread 0x7fe7e8570700 (most recent call first):
>   File "/home/antoine/arrow/python/pyarrow/tests/test_plasma.py", line 211 in 
> test_connection_failure_raises_exception
> [...]
> {code}
> Here is the C backtrace under gdb:
> {code}
> #0  0x769d0428 in __GI_raise (sig=sig@entry=6) at 
> ../sysdeps/unix/sysv/linux/raise.c:54
> #1  0x769d202a in __GI_abort () at abort.c:89
> #2  0x76a127ea in __libc_message (do_abort=do_abort@entry=2, 
> fmt=fmt@entry=0x76b2bed8 "*** Error in `%s': %s: 0x%s ***\n")
> at ../sysdeps/posix/libc_fatal.c:175
> #3  0x76a1b37a in malloc_printerr (ar_ptr=, 
> ptr=, str=0x76b2c008 "double free or corruption (!prev)", 
> action=3)
> at malloc.c:5006
> #4  _int_free (av=, p=, have_lock=0) at 
> malloc.c:3867
> #5  0x76a1f53c in __GI___libc_free (mem=) at 
> malloc.c:2968
> #6  0x7fffbdfcc504 in std::_Sp_counted_ptr (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x9defb0)
> at /usr/include/c++/4.9/bits/shared_ptr_base.h:373
> #7  0x7fffbdfc903c in 
> std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x9defb0) 
> at /usr/include/c++/4.9/bits/shared_ptr_base.h:149
> #8  0x7fffbdfc82b9 in 
> std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count 
> (this=0x7fffc1214510, __in_chrg=)
> at /usr/include/c++/4.9/bits/shared_ptr_base.h:666
> #9  0x7fffbdfc8276 in std::__shared_ptr (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7fffc1214508, 
> __in_chrg=)
> at /usr/include/c++/4.9/bits/shared_ptr_base.h:914
> #10 0x7fffbdfc8fc4 in std::shared_ptr::~shared_ptr 
> (this=0x7fffc1214508, __in_chrg=)
> at /usr/include/c++/4.9/bits/shared_ptr.h:93
> #11 0x7fffbdfc8fde in 
> __Pyx_call_destructor (x=...)
> at /home/antoine/arrow/python/build/temp.linux-x86_64-3.6/plasma.cxx:281
> #12 0x7fffbdfbc317 in __pyx_tp_dealloc_7pyarrow_6plasma_PlasmaClient 
> (o=0x7fffc12144f0)
> at /home/antoine/arrow/python/build/temp.linux-x86_64-3.6/plasma.cxx:10383
> #13 0x7fffbdfb8986 in __pyx_pf_7pyarrow_6plasma_2connect (__pyx_self=0x0, 
> __pyx_v_store_socket_name=0x7fffbc922c48, 
> __pyx_v_manager_socket_name=0x77fa0ab0, __pyx_v_release_delay=0, 
> __pyx_v_num_retries=1)
> at /home/antoine/arrow/python/build/temp.linux-x86_64-3.6/plasma.cxx:9147
> #14 0x7fffbdfb7dec in __pyx_pw_7pyarrow_6plasma_3connect (__pyx_self=0x0, 
> __pyx_args=0x7fffbc4d9688, __pyx_kwds=0x0)
> at /home/antoine/arrow/python/build/temp.linux-x86_64-3.6/plasma.cxx:8978
> {code}





[jira] [Updated] (ARROW-2489) [Plasma] test_plasma.py crashes

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2489:
--
Labels: pull-request-available  (was: )

> [Plasma] test_plasma.py crashes
> ---
>
> Key: ARROW-2489
> URL: https://issues.apache.org/jira/browse/ARROW-2489
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: GPU, Plasma (C++), Python
>Reporter: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>


[jira] [Updated] (ARROW-2494) Return status codes from PlasmaClient::Seal

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2494:
--
Labels: pull-request-available  (was: )

> Return status codes from PlasmaClient::Seal
> ---
>
> Key: ARROW-2494
> URL: https://issues.apache.org/jira/browse/ARROW-2494
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: C++
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Commented] (ARROW-2494) Return status codes from PlasmaClient::Seal

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447815#comment-16447815
 ] 

ASF GitHub Bot commented on ARROW-2494:
---

kszucs opened a new pull request #1932: ARROW-2494: [C++] Return status codes 
from PlasmaClient::Seal instead of crashing
URL: https://github.com/apache/arrow/pull/1932
 
 
   




> Return status codes from PlasmaClient::Seal
> ---
>
> Key: ARROW-2494
> URL: https://issues.apache.org/jira/browse/ARROW-2494
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: C++
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Assigned] (ARROW-2494) Return status codes from PlasmaClient::Seal

2018-04-23 Thread Krisztian Szucs (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs reassigned ARROW-2494:
--

Assignee: Krisztian Szucs

> Return status codes from PlasmaClient::Seal
> ---
>
> Key: ARROW-2494
> URL: https://issues.apache.org/jira/browse/ARROW-2494
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: C++
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>






[jira] [Created] (ARROW-2494) Return status codes from PlasmaClient::Seal

2018-04-23 Thread Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-2494:
--

 Summary: Return status codes from PlasmaClient::Seal
 Key: ARROW-2494
 URL: https://issues.apache.org/jira/browse/ARROW-2494
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: C++
Reporter: Krisztian Szucs








[jira] [Commented] (ARROW-564) [Python] Add methods to return vanilla NumPy arrays (plus boolean mask array if there are nulls)

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447725#comment-16447725
 ] 

ASF GitHub Bot commented on ARROW-564:
--

pitrou commented on a change in pull request #1931: ARROW-564 [Python] Add 
support for return zero copy NumPy arrays
URL: https://github.com/apache/arrow/pull/1931#discussion_r183307671
 
 

 ##
 File path: python/pyarrow/tests/test_array.py
 ##
 @@ -83,6 +83,33 @@ def test_long_array_format():
     assert result == expected
 
 
+def test_to_numpy_zero_copy():
+    import gc
+
+    arr = pa.array(range(10))
+
+    for i in range(10):
 
 Review comment:
   Why do you loop 10 times?




> [Python] Add methods to return vanilla NumPy arrays (plus boolean mask array 
> if there are nulls)
> 
>
> Key: ARROW-564
> URL: https://issues.apache.org/jira/browse/ARROW-564
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: beginner, pull-request-available
> Fix For: 1.0.0
>
>
> At the moment, for {{pyarrow.Array}} instances, we have a method called 
> {{to_pandas}}. While this method returns NumPy Arrays, it returns them in the 
> form that Pandas would use them in its {{Series}}. The difference here is 
> visible for example in the case of integers with null values. For Pandas, we 
> convert it into a float array and set all entries to NaN where we have null 
> entries in the Arrow array. For vanilla NumPy arrays, we would return a tuple 
> of a valid bytemap (not bitmap!) and a values array. The values array in this 
> case should simply be a view on the underlying Arrow buffer.
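
The two return forms described above can be sketched with plain NumPy. This 
is a minimal illustration, assuming an int64 array `[1, 2, null, 4]` whose 
values and validity bitmap are written out by hand; the variable names are 
hypothetical and no pyarrow API is used.

```python
import numpy as np

# Hand-written stand-ins for an Arrow int64 array [1, 2, null, 4]
# (assumptions for illustration, not pyarrow output).
values = np.array([1, 2, 0, 4], dtype=np.int64)  # the null slot holds an arbitrary value
validity_bitmap = np.array([0b00001011], dtype=np.uint8)  # least-significant bit first, 1 = valid

# Expand the bitmap (one bit per entry) into a bytemap (one byte per entry).
valid_bytemap = np.unpackbits(validity_bitmap, bitorder="little")[: len(values)]

# Pandas-style result: a float array with NaN where the entry is null.
pandas_style = np.where(valid_bytemap.astype(bool), values.astype(np.float64), np.nan)

# "Vanilla" result: a (bytemap, values) tuple with the values left untouched.
vanilla = (valid_bytemap, values)
```

The pandas-style form changes the dtype and needs a copy, while the tuple form 
keeps the integer values as they are, which is what makes a view on the 
underlying buffer possible.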





[jira] [Commented] (ARROW-564) [Python] Add methods to return vanilla NumPy arrays (plus boolean mask array if there are nulls)

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447729#comment-16447729
 ] 

ASF GitHub Bot commented on ARROW-564:
--

pitrou commented on a change in pull request #1931: ARROW-564 [Python] Add 
support for return zero copy NumPy arrays
URL: https://github.com/apache/arrow/pull/1931#discussion_r183307323
 
 

 ##
 File path: python/pyarrow/tests/test_array.py
 ##
 @@ -83,6 +83,33 @@ def test_long_array_format():
     assert result == expected
 
 
+def test_to_numpy_zero_copy():
+    import gc
+
+    arr = pa.array(range(10))
+
+    for i in range(10):
+        np_arr = arr.to_numpy()
+        assert sys.getrefcount(np_arr) == 2
 
 Review comment:
   I'm not sure what this line is meant to test?




> [Python] Add methods to return vanilla NumPy arrays (plus boolean mask array 
> if there are nulls)
> 
>
> Key: ARROW-564
> URL: https://issues.apache.org/jira/browse/ARROW-564
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: beginner, pull-request-available
> Fix For: 1.0.0
>
>


[jira] [Commented] (ARROW-564) [Python] Add methods to return vanilla NumPy arrays (plus boolean mask array if there are nulls)

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447728#comment-16447728
 ] 

ASF GitHub Bot commented on ARROW-564:
--

pitrou commented on a change in pull request #1931: ARROW-564 [Python] Add 
support for return zero copy NumPy arrays
URL: https://github.com/apache/arrow/pull/1931#discussion_r183308083
 
 

 ##
 File path: python/pyarrow/tests/test_array.py
 ##
 @@ -83,6 +83,33 @@ def test_long_array_format():
     assert result == expected
 
 
+def test_to_numpy_zero_copy():
+    import gc
+
+    arr = pa.array(range(10))
+
+    for i in range(10):
+        np_arr = arr.to_numpy()
+        assert sys.getrefcount(np_arr) == 2
+        np_arr = None  # noqa
+
+    assert sys.getrefcount(arr) == 2
+
+    for i in range(10):
+        arr = pa.array(range(10))
+        np_arr = arr.to_numpy()
+        arr = None
+        gc.collect()
+
+        # Ensure base is still valid
 
 Review comment:
   I'm not sure that's the right way of looking at it. Just check that 
`np_arr.base` is not None...
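
   A minimal sketch of that check, assuming a plain NumPy array as a stand-in 
for the object that owns the memory (the Arrow array in the test under 
review); `make_view` is a hypothetical helper:

```python
import gc

import numpy as np


def make_view():
    # `owner` is a stand-in for the object that owns the memory.
    owner = np.arange(10)
    return owner.view()  # the local name `owner` disappears on return


np_arr = make_view()
gc.collect()

# The view keeps its owner alive through `.base`, so the data stays readable.
base_is_set = np_arr.base is not None
total = int(np_arr.sum())
```

   Checking `.base` and then reading the data avoids depending on an exact 
reference count.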




> [Python] Add methods to return vanilla NumPy arrays (plus boolean mask array 
> if there are nulls)
> 
>
> Key: ARROW-564
> URL: https://issues.apache.org/jira/browse/ARROW-564
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: beginner, pull-request-available
> Fix For: 1.0.0
>
>


[jira] [Commented] (ARROW-564) [Python] Add methods to return vanilla NumPy arrays (plus boolean mask array if there are nulls)

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447731#comment-16447731
 ] 

ASF GitHub Bot commented on ARROW-564:
--

pitrou commented on a change in pull request #1931: ARROW-564 [Python] Add 
support for return zero copy NumPy arrays
URL: https://github.com/apache/arrow/pull/1931#discussion_r183309840
 
 

 ##
 File path: python/pyarrow/tests/test_array.py
 ##
 @@ -83,6 +83,33 @@ def test_long_array_format():
 assert result == expected
 
 
+def test_to_numpy_zero_copy():
 
 Review comment:
   This function isn't actually testing the zero-copy part. You should mutate 
the resulting NumPy array and check that the original Arrow array is mutated 
too (of course, the fact that we're able to get a mutable NumPy array from an 
Arrow array could be seen as a bug).
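
   The mutation check could look roughly like the sketch below. It is only an 
illustration: a `bytearray` stands in for the memory owned by the Arrow array, 
and `np.frombuffer` stands in for the zero-copy conversion, so no pyarrow call 
is exercised.

```python
import numpy as np

# A bytearray stands in for the memory owned by the Arrow array (assumption).
backing = bytearray(np.arange(10, dtype=np.int64).tobytes())

zero_copy = np.frombuffer(backing, dtype=np.int64)  # a view on `backing`
copied = np.array(zero_copy)  # an independent copy

zero_copy[0] = 42  # writes through to `backing`
copied[1] = 99  # does not touch `backing`

# Re-read the backing memory to see which write reached it.
first_two = np.frombuffer(backing, dtype=np.int64)[:2].tolist()
```

   Only the write made through the view shows up in the backing memory, which 
is the property a zero-copy test needs to establish.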




> [Python] Add methods to return vanilla NumPy arrays (plus boolean mask array 
> if there are nulls)
> 
>
> Key: ARROW-564
> URL: https://issues.apache.org/jira/browse/ARROW-564
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: beginner, pull-request-available
> Fix For: 1.0.0
>
>


[jira] [Commented] (ARROW-564) [Python] Add methods to return vanilla NumPy arrays (plus boolean mask array if there are nulls)

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447727#comment-16447727
 ] 

ASF GitHub Bot commented on ARROW-564:
--

pitrou commented on a change in pull request #1931: ARROW-564 [Python] Add 
support for return zero copy NumPy arrays
URL: https://github.com/apache/arrow/pull/1931#discussion_r183308880
 
 

 ##
 File path: python/pyarrow/tests/test_array.py
 ##
 @@ -577,6 +604,31 @@ def test_simple_type_construction():
     str(result)
 
 
+@pytest.mark.parametrize(
+    'narr',
+    [
+        np.arange(10, dtype=np.int64),
+        np.arange(10, dtype=np.int32),
+        np.arange(10, dtype=np.int16),
+        np.arange(10, dtype=np.int8),
+        np.arange(10, dtype=np.uint64),
+        np.arange(10, dtype=np.uint32),
+        np.arange(10, dtype=np.uint16),
+        np.arange(10, dtype=np.uint8),
+        np.arange(10, dtype=np.float64),
+        np.arange(10, dtype=np.float32),
+        np.arange(10, dtype=np.float16),
+    ]
+)
+def test_to_numpy_roundtrip(narr):
+    arr = pa.array(narr)
+    assert narr.dtype == arr.to_numpy().dtype
+    assert np.array_equal(narr, arr.to_numpy())
 
 Review comment:
   Use `np.testing.assert_array_equal`.
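
   A short sketch of the difference, using plain NumPy arrays as stand-ins for 
the round-tripped data (the variable names are illustrative):

```python
import numpy as np

expected = np.arange(10, dtype=np.int32)
same_values_other_dtype = np.arange(10, dtype=np.int64)
wrong = expected.copy()
wrong[3] = -1

# Passes: values are compared but dtypes are not, so a separate dtype
# assertion is still needed.
np.testing.assert_array_equal(expected, same_values_other_dtype)

# On a mismatch it raises AssertionError with a report of the differences,
# whereas np.array_equal() only returns False.
try:
    np.testing.assert_array_equal(expected, wrong)
    failure_message = None
except AssertionError as exc:
    failure_message = str(exc)
```

   The failure report is the main gain in a test suite; the explicit dtype 
check in the test above stays useful either way.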




> [Python] Add methods to return vanilla NumPy arrays (plus boolean mask array 
> if there are nulls)
> 
>
> Key: ARROW-564
> URL: https://issues.apache.org/jira/browse/ARROW-564
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: beginner, pull-request-available
> Fix For: 1.0.0
>
>


[jira] [Commented] (ARROW-564) [Python] Add methods to return vanilla NumPy arrays (plus boolean mask array if there are nulls)

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447730#comment-16447730
 ] 

ASF GitHub Bot commented on ARROW-564:
--

pitrou commented on a change in pull request #1931: ARROW-564 [Python] Add 
support for return zero copy NumPy arrays
URL: https://github.com/apache/arrow/pull/1931#discussion_r183308153
 
 

 ##
 File path: python/pyarrow/tests/test_array.py
 ##
 @@ -83,6 +83,33 @@ def test_long_array_format():
     assert result == expected
 
 
+def test_to_numpy_zero_copy():
+    import gc
+
+    arr = pa.array(range(10))
+
+    for i in range(10):
+        np_arr = arr.to_numpy()
+        assert sys.getrefcount(np_arr) == 2
+        np_arr = None  # noqa
+
+    assert sys.getrefcount(arr) == 2
+
+    for i in range(10):
+        arr = pa.array(range(10))
+        np_arr = arr.to_numpy()
+        arr = None
+        gc.collect()
+
+        # Ensure base is still valid
+
+        # Because of py.test's assert inspection magic, if you put getrefcount
+        # on the line being examined, it will be 1 higher than you expect
+        base_refcount = sys.getrefcount(np_arr.base)
+        assert base_refcount == 2
+        np_arr.sum()
 
 Review comment:
   You should check the result value.




> [Python] Add methods to return vanilla NumPy arrays (plus boolean mask array 
> if there are nulls)
> 
>
> Key: ARROW-564
> URL: https://issues.apache.org/jira/browse/ARROW-564
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: beginner, pull-request-available
> Fix For: 1.0.0
>
>


[jira] [Commented] (ARROW-564) [Python] Add methods to return vanilla NumPy arrays (plus boolean mask array if there are nulls)

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447726#comment-16447726
 ] 

ASF GitHub Bot commented on ARROW-564:
--

pitrou commented on a change in pull request #1931: ARROW-564 [Python] Add 
support for return zero copy NumPy arrays
URL: https://github.com/apache/arrow/pull/1931#discussion_r183307516
 
 

 ##
 File path: python/pyarrow/tests/test_array.py
 ##
 @@ -83,6 +83,33 @@ def test_long_array_format():
     assert result == expected
 
 
+def test_to_numpy_zero_copy():
+    import gc
+
+    arr = pa.array(range(10))
+
+    for i in range(10):
+        np_arr = arr.to_numpy()
+        assert sys.getrefcount(np_arr) == 2
+        np_arr = None  # noqa
+
+    assert sys.getrefcount(arr) == 2
 
 Review comment:
   Instead of hardcoding this, you should check the original value hasn't 
changed:
   ```
   old_refcount = sys.getrefcount(arr)
   # ... do something
   assert sys.getrefcount(arr) == old_refcount
   ```
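
   A runnable variant of the same pattern, as a sketch only: `refcount_delta` 
is a hypothetical helper, and a `bytearray` stands in for the Arrow array so 
that nothing beyond the standard library is needed.

```python
import sys


def refcount_delta(obj, action):
    """Return how much `action(obj)` changed the reference count of `obj`."""
    old_refcount = sys.getrefcount(obj)
    action(obj)
    return sys.getrefcount(obj) - old_refcount


payload = bytearray(b"0123456789")  # stand-in for the Arrow array
leaked = []  # keeps references on purpose

# Creating and releasing a temporary view leaves the count unchanged.
clean_delta = refcount_delta(payload, lambda o: memoryview(o).release())

# Storing the object somewhere leaves one extra reference behind.
leaky_delta = refcount_delta(payload, leaked.append)
```

   Comparing against the value measured beforehand keeps the test independent 
of how many references the interpreter or the test runner happens to hold.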




> [Python] Add methods to return vanilla NumPy arrays (plus boolean mask array 
> if there are nulls)
> 
>
> Key: ARROW-564
> URL: https://issues.apache.org/jira/browse/ARROW-564
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: beginner, pull-request-available
> Fix For: 1.0.0
>
>
> At the moment, for {{pyarrow.Array}} instances, we have a method called 
> {{to_pandas}}. While this method returns NumPy Arrays, it returns them in the 
> form that Pandas would use them in its {{Series}}. The difference here is 
> visible for example in the case of integers with null values. For Pandas, we 
> convert it into a float array and set all entries to NaN where we have null 
> entries in the Arrow array. For vanilla NumPy arrays, we would return a tuple 
> of a valid bytemap (not bitmap!) and a values array. The values array in this 
> case should simply be a view on the underlying Arrow buffer.





[jira] [Commented] (ARROW-2491) [Python] Array.from_buffers does not work for ListArray

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447708#comment-16447708
 ] 

ASF GitHub Bot commented on ARROW-2491:
---

pitrou commented on a change in pull request #1927: ARROW-2491: [Python] raise 
NotImplementedError on from_buffers with nested types
URL: https://github.com/apache/arrow/pull/1927#discussion_r183306910
 
 

 ##
 File path: python/pyarrow/types.pxi
 ##
 @@ -75,6 +75,11 @@ cdef bytes _datatype_to_pep3118(CDataType* type):
     return char
 
 
+def _is_primitive(Type type):
 
 Review comment:
   Can't this be a DataType method or property instead?




> [Python] Array.from_buffers does not work for ListArray
> ---
>
> Key: ARROW-2491
> URL: https://issues.apache.org/jira/browse/ARROW-2491
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Affects Versions: 0.9.0
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> When you get the buffers from a ListArray and feed it back into 
> {{pyarrow.Array.from_buffers}} the code fails with a DCHECK: 
> {code}
> ./src/arrow/array.cc:247 Check failed: (data->buffers.size()) == (2)))
> {code}
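
For context, a primitive Arrow array carries two buffers (validity and data), while a list array carries validity and offsets buffers plus a child array, so the check for exactly two buffers cannot hold for it. The sketch below shows the kind of guard the pull request title describes, with the primitive check exposed as a property on the type. All names here (`DataType`, `is_primitive`, `from_buffers`) belong to a hypothetical stand-in, not to pyarrow's actual classes.

```python
class DataType:
    """Hypothetical stand-in for an Arrow data type (not pyarrow.DataType)."""

    _PRIMITIVE_NAMES = frozenset(
        {"bool", "int8", "int16", "int32", "int64", "float32", "float64"}
    )

    def __init__(self, name, children=()):
        self.name = name
        self.children = tuple(children)

    @property
    def is_primitive(self):
        # A primitive type has a known fixed-width name and no child types.
        return self.name in self._PRIMITIVE_NAMES and not self.children


def from_buffers(dtype, length, buffers):
    """Assemble an array description from raw buffers, for primitive types only."""
    if not dtype.is_primitive:
        raise NotImplementedError(
            "from_buffers is only sketched for primitive types, got " + dtype.name
        )
    if len(buffers) != 2:
        raise ValueError("expected [validity, data], got %d buffers" % len(buffers))
    validity, data = buffers
    return {"type": dtype.name, "length": length, "validity": validity, "data": data}
```

Raising a Python exception up front gives the caller a clear error instead of a failed internal check.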





[jira] [Commented] (ARROW-2427) [C++] ReadAt implementations suboptimal

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447698#comment-16447698
 ] 

ASF GitHub Bot commented on ARROW-2427:
---

pitrou commented on issue #1867: ARROW-2427: [C++] Implement ReadAt properly
URL: https://github.com/apache/arrow/pull/1867#issuecomment-383488993
 
 
   Ok, rebased.




> [C++] ReadAt implementations suboptimal
> ---
>
> Key: ARROW-2427
> URL: https://issues.apache.org/jira/browse/ARROW-2427
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> The {{ReadAt}} implementations for at least {{OSFile}} and 
> {{MemoryMappedFile}} take the file lock and seek. They could instead read 
> directly from the given offset, allowing concurrent I/O from multiple threads.
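
The two strategies can be contrasted with a short Python sketch. It is an analogy rather than the Arrow C++ code: the class names are hypothetical, and `os.pread` is assumed to be available, which holds on POSIX systems but not on Windows.

```python
import os
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor


class LockedReader:
    """Seek-then-read: the shared file position must be protected by a lock."""

    def __init__(self, path):
        self._file = open(path, "rb")
        self._lock = threading.Lock()

    def read_at(self, offset, nbytes):
        with self._lock:  # readers are serialized here
            self._file.seek(offset)
            return self._file.read(nbytes)

    def close(self):
        self._file.close()


class PositionalReader:
    """Positional read: no shared file position is touched, so no lock is needed."""

    def __init__(self, path):
        self._fd = os.open(path, os.O_RDONLY)

    def read_at(self, offset, nbytes):
        return os.pread(self._fd, nbytes, offset)

    def close(self):
        os.close(self._fd)


def check_reader(reader_cls):
    """Read several ranges from worker threads and compare with the known payload."""
    payload = bytes(range(256)) * 4
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    try:
        reader = reader_cls(path)
        offsets = [0, 100, 512, 1000]
        with ThreadPoolExecutor(max_workers=4) as pool:
            chunks = list(pool.map(lambda off: reader.read_at(off, 16), offsets))
        reader.close()
        return all(c == payload[o:o + 16] for o, c in zip(offsets, chunks))
    finally:
        os.unlink(path)
```

Both readers return the same bytes; the difference is that the positional variant lets the threads issue their reads concurrently.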





[jira] [Commented] (ARROW-2451) Handle more dtypes efficiently in custom numpy array serializer.

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447675#comment-16447675
 ] 

ASF GitHub Bot commented on ARROW-2451:
---

mitar commented on a change in pull request #1887: ARROW-2451: [Python] Handle 
non-object arrays more efficiently in custom serializer.
URL: https://github.com/apache/arrow/pull/1887#discussion_r183298796
 
 

 ##
 File path: python/pyarrow/serialization.py
 ##
 @@ -37,11 +37,22 @@
 # python_to_arrow.cc)
 
 def _serialize_numpy_array_list(obj):
-    return obj.tolist(), obj.dtype.str
+    if obj.dtype.str != '|O':
 
 Review comment:
   I think you could check here for `obj.dtype.hasobject`: 
https://docs.scipy.org/doc/numpy/reference/generated/numpy.dtype.hasobject.html




> Handle more dtypes efficiently in custom numpy array serializer.
> 
>
> Key: ARROW-2451
> URL: https://issues.apache.org/jira/browse/ARROW-2451
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Robert Nishihara
>Assignee: Robert Nishihara
>Priority: Major
>  Labels: pull-request-available
> Fix For: JS-0.4.0
>
>
> Right now certain dtypes like bool or fixed length strings are serialized as 
> lists, which is inefficient. We can handle these more efficiently by casting 
> them to uint8 and saving the original dtype as additional data.
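
A minimal sketch of this idea with NumPy, under stated assumptions: the function names and the `(dtype string, shape)` metadata layout are hypothetical and are not the serializer in `pyarrow/serialization.py`. It uses `dtype.hasobject` to detect object data, which also catches structured dtypes with object fields whose `dtype.str` is not `'|O'`.

```python
import numpy as np


def serialize_array(obj):
    """Return (payload, metadata) for a NumPy array.

    Arrays containing Python objects fall back to a list; everything else is
    reinterpreted as a flat uint8 array without converting elements one by one.
    """
    meta = (obj.dtype.str, obj.shape)
    if obj.dtype.hasobject:
        return obj.tolist(), meta
    flat = np.ascontiguousarray(obj).reshape(-1)
    return flat.view(np.uint8), meta


def deserialize_array(payload, meta):
    """Rebuild an array from a uint8 payload produced by serialize_array."""
    dtype_str, shape = meta
    return payload.view(np.dtype(dtype_str)).reshape(shape)
```

Note that `dtype.str` does not preserve field names of structured dtypes, so this sketch only round-trips simple dtypes such as bool or fixed-length strings.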




