Re: [DISCUSS][Java] How to solve the problem of OutOfMemoryException when there is sufficient memory?

2019-05-29 Thread Micah Kornfield
(Adding Java to thread title) For more context, I pushed back on the changes in https://github.com/apache/arrow/pull/4358 because they don't seem typical in memory management systems (i.e. they expose internal implementation details of the allocator). I think https://github.com/apache/arrow/pull/

Re: [DISCUSS] PR Backlog reduction

2019-05-29 Thread Micah Kornfield
That sounds great Wes, thank you and your team for taking it on. Can you clarify whether you would prefer this approach to the one I proposed above (i.e. should I delete the spreadsheet) or are they complementary? Thanks, Micah On Wed, May 29, 2019 at 12:07 PM Wes McKinney wrote: > On the call toda

[jira] [Created] (ARROW-5445) [Website] Remove language that encourages pinning a version

2019-05-29 Thread Neal Richardson (JIRA)
Neal Richardson created ARROW-5445: -- Summary: [Website] Remove language that encourages pinning a version Key: ARROW-5445 URL: https://issues.apache.org/jira/browse/ARROW-5445 Project: Apache Arrow

[jira] [Created] (ARROW-5444) [Release][Website] After 0.14 release, update what is an "official" release

2019-05-29 Thread Neal Richardson (JIRA)
Neal Richardson created ARROW-5444: -- Summary: [Release][Website] After 0.14 release, update what is an "official" release Key: ARROW-5444 URL: https://issues.apache.org/jira/browse/ARROW-5444 Project

Re: ARROW-4714: Providing JNI interface to Read ORC file via Arrow C++

2019-05-29 Thread Yurui Zhou
Hey guys: Currently all the comments have been resolved and all the builds and tests pass. Are there any other general comments regarding these changes? Yurui On 21 May 2019, 10:36 AM +0800, Yurui Zhou , wrote: > Hi Micah: > > Thanks for the response. According to our benchmark, the cpp-orc

[jira] [Created] (ARROW-5443) [Gandiva][Crossbow] Turn parquet encryption off

2019-05-29 Thread Praveen Kumar Desabandu (JIRA)
Praveen Kumar Desabandu created ARROW-5443: -- Summary: [Gandiva][Crossbow] Turn parquet encryption off Key: ARROW-5443 URL: https://issues.apache.org/jira/browse/ARROW-5443 Project: Apache Arro

Re: [DISCUSS] PR Backlog reduction

2019-05-29 Thread Wes McKinney
On the call today we discussed possibly repurposing the Spark PR dashboard application for our use * https://github.com/databricks/spark-pr-dashboard * https://spark-prs.appspot.com/ This is a project that my team could take on this year sometime On Wed, May 29, 2019 at 4:12 AM Fan Liya wrote:

Re: Arrow sync call tomorrow (May 29) at 12:00 US/Eastern, 16:00 UTC

2019-05-29 Thread Neal Richardson
Attendees:
* Bryan Cutler
* François Saint-Jacques
* John Muehlhausen
* Neal Richardson
* Praveen Kumar
* Wes McKinney
John: Raised question of custom metadata in file footer (see https://lists.apache.org/thread.html/c3b3d1456b7062a435f6795c0308ccb7c8fe55c818cfed2cf55f76c5@%3Cdev.arrow.apache.org%

Re: Column/Partition Pruning

2019-05-29 Thread Russell Jurney
I've got things working like this:
    # Test ticker
    ticker = 'AAPL'
    stocks_close_ds = ParquetDataset(
        'data/v4.parquet',
        filters=[('Ticker','=',ticker)]
    )
    table = stocks_close_ds.read()
    stocks_close_df = table.to_pandas()
    stocks_close_df.head()  # prints the filtered pandas.DataFrame
I'll
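For readers following the thread, here is a rough sketch of what partition pruning with a filter such as ('Ticker', '=', 'AAPL') amounts to. It is not pyarrow's implementation: it uses only the standard library, assumes the dataset is laid out in hive-style key=value directories, and the file paths and the helper names partition_values and prune are hypothetical.

```python
from pathlib import PurePosixPath


def partition_values(path):
    """Collect hive-style 'key=value' directory segments from a file path."""
    values = {}
    # Skip the last part (the file name); only directories carry partition keys.
    for segment in PurePosixPath(path).parts[:-1]:
        if "=" in segment:
            key, _, value = segment.partition("=")
            values[key] = value
    return values


def prune(paths, filters):
    """Keep only paths whose partition values satisfy every (column, op, value) filter."""
    ops = {"=": lambda a, b: a == b, "!=": lambda a, b: a != b}
    kept = []
    for path in paths:
        values = partition_values(path)
        if all(
            col in values and ops[op](values[col], str(val))
            for col, op, val in filters
        ):
            kept.append(path)
    return kept


# Hypothetical directory layout, assumed for illustration only.
paths = [
    "data/v4.parquet/Ticker=AAPL/part-0.parquet",
    "data/v4.parquet/Ticker=MSFT/part-0.parquet",
]
selected = prune(paths, [("Ticker", "=", "AAPL")])
```

Under these assumptions only the files below the Ticker=AAPL directory would be opened, so the filter is decided from the path alone and no row data is read for the other tickers.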

[jira] [Created] (ARROW-5442) [Website] Clarify what makes a release artifact "official"

2019-05-29 Thread Neal Richardson (JIRA)
Neal Richardson created ARROW-5442: -- Summary: [Website] Clarify what makes a release artifact "official" Key: ARROW-5442 URL: https://issues.apache.org/jira/browse/ARROW-5442 Project: Apache Arrow

Re: [DISCUSS] Parquet C++/Rust: Rename Parquet::LogicalType to Parquet::ConvertedType

2019-05-29 Thread Francois Saint-Jacques
+1 on renaming to avoid confusion at the cost of breaking some API. On Wed, May 29, 2019 at 10:09 AM Wes McKinney wrote: > > I'm in favor of making the change -- it's slightly disruptive for > library-users, but the fix is no more complicated than a > search-and-replace. When the C++ project

Re: [DISCUSS] Parquet C++/Rust: Rename Parquet::LogicalType to Parquet::ConvertedType

2019-05-29 Thread Chao Sun
I'm +1 on the change for the Rust side as well. It probably won't be as disruptive as the C++ side. On Wed, May 29, 2019 at 7:09 AM Wes McKinney wrote: > I'm in favor of making the change -- it's slightly disruptive for > library-users, but the fix is no more complicated than a > search-and-repl

[jira] [Created] (ARROW-5441) [C++] Implement FindArrowFlight.cmake

2019-05-29 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-5441: - Summary: [C++] Implement FindArrowFlight.cmake Key: ARROW-5441 URL: https://issues.apache.org/jira/browse/ARROW-5441 Project: Apache Arrow Issue Type: Impr

[jira] [Created] (ARROW-5440) Rust Parquet requiring libstd-xxx.so dependency on centos

2019-05-29 Thread Tenzin Rigden (JIRA)
Tenzin Rigden created ARROW-5440: Summary: Rust Parquet requiring libstd-xxx.so dependency on centos Key: ARROW-5440 URL: https://issues.apache.org/jira/browse/ARROW-5440 Project: Apache Arrow

Re: [DISCUSS] Parquet C++/Rust: Rename Parquet::LogicalType to Parquet::ConvertedType

2019-05-29 Thread Wes McKinney
I'm in favor of making the change -- it's slightly disruptive for library-users, but the fix is no more complicated than a search-and-replace. When the C++ project was started, the LogicalType union didn't exist and "LogicalType" seemed like a more appropriate name for ConvertedType. On Wed, May 2

Propose custom_metadata for Footer

2019-05-29 Thread John Muehlhausen
Original write of File: Schema: custom_metadata: {"value":1} Message Message Footer Schema: custom_metadata: {"value":1} Process appends messages (new data in bold): Schema: custom_metadata: {"value":1} Message Message *Message* *Footer* * Schema: custom_metadata: {"value":2}* Re-writing t

[jira] [Created] (ARROW-5439) [Java] Utilize stream EOS in File format

2019-05-29 Thread John Muehlhausen (JIRA)
John Muehlhausen created ARROW-5439: --- Summary: [Java] Utilize stream EOS in File format Key: ARROW-5439 URL: https://issues.apache.org/jira/browse/ARROW-5439 Project: Apache Arrow Issue Typ

[jira] [Created] (ARROW-5438) [JS] Utilize stream EOS in File format

2019-05-29 Thread John Muehlhausen (JIRA)
John Muehlhausen created ARROW-5438: --- Summary: [JS] Utilize stream EOS in File format Key: ARROW-5438 URL: https://issues.apache.org/jira/browse/ARROW-5438 Project: Apache Arrow Issue Type:

[jira] [Created] (ARROW-5437) [Python] Misssing pandas pytest marker from parquet tests

2019-05-29 Thread Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-5437: -- Summary: [Python] Misssing pandas pytest marker from parquet tests Key: ARROW-5437 URL: https://issues.apache.org/jira/browse/ARROW-5437 Project: Apache Arrow

Re: [DISCUSS] Parquet C++/Rust: Rename Parquet::LogicalType to Parquet::ConvertedType

2019-05-29 Thread Wes McKinney
You all probably want to join d...@parquet.apache.org and have the discussion there. From a governance perspective that's where we need to talk about making breaking changes to the Parquet C++ library. LogicalType was introduced into the Parquet format in October 2017 to be a more flexible and futu

[jira] [Created] (ARROW-5436) [Python] expose filters argument in parquet.read_table

2019-05-29 Thread Joris Van den Bossche (JIRA)
Joris Van den Bossche created ARROW-5436: Summary: [Python] expose filters argument in parquet.read_table Key: ARROW-5436 URL: https://issues.apache.org/jira/browse/ARROW-5436 Project: Apache A

Re: [DISCUSS] Parquet C++/Rust: Rename Parquet::LogicalType to Parquet::ConvertedType

2019-05-29 Thread Joris Van den Bossche
Yes, the LogicalType is newer than ConvertedType in the parquet format, and was until recently not implemented in parquet-cpp. The problem is that originally, the parquet thrift::ConvertedType was implemented in parquet-cpp as LogicalType. Now, support is added in parquet-cpp for this newer parquet

[jira] [Created] (ARROW-5435) IntervalYearVector#getObject should return Period with both year and month

2019-05-29 Thread Ji Liu (JIRA)
Ji Liu created ARROW-5435: - Summary: IntervalYearVector#getObject should return Period with both year and month Key: ARROW-5435 URL: https://issues.apache.org/jira/browse/ARROW-5435 Project: Apache Arrow

Re: [DISCUSS] PR Backlog reduction

2019-05-29 Thread Fan Liya
Sounds like a great idea. I am interested in Java PRs. Best, Liya Fan On Wed, May 29, 2019 at 1:28 PM Micah Kornfield wrote: > Sorry for the delay. I created > > https://docs.google.com/spreadsheets/d/146lDg11c5ohgVkrOglrb42a1JB0Gm1qBRbnoDlvB8QY/edit#gid=0 > as > simple way to distribute old P

Re: [DISCUSS] Parquet C++/Rust: Rename Parquet::LogicalType to Parquet::ConvertedType

2019-05-29 Thread Antoine Pitrou
Le 29/05/2019 à 10:47, Deepak Majeti a écrit : > "ConvertedType" term is used by the parquet specification below. This type > is used to map client data types to Parquet types. > https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L48 But apparently there's also "

Re: [DISCUSS] Parquet C++/Rust: Rename Parquet::LogicalType to Parquet::ConvertedType

2019-05-29 Thread Joris Van den Bossche
Op wo 29 mei 2019 om 10:00 schreef Antoine Pitrou : > > Why "converted"? Is there a conversion? > > "Converted" is the terminology used in the parquet format: https://github.com/apache/parquet-format/blob/b5d34faf47b59b1220a1bbe0fc438be71fed6d90/src/main/thrift/parquet.thrift#L43-L48 > > Le 29/

Re: [DISCUSS] Parquet C++/Rust: Rename Parquet::LogicalType to Parquet::ConvertedType

2019-05-29 Thread Deepak Majeti
"ConvertedType" term is used by the parquet specification below. This type is used to map client data types to Parquet types. https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L48 On Wed, May 29, 2019 at 1:30 PM Antoine Pitrou wrote: > > Why "converted"? Is the

Re: [DISCUSS] Parquet C++/Rust: Rename Parquet::LogicalType to Parquet::ConvertedType

2019-05-29 Thread Antoine Pitrou
Why "converted"? Is there a conversion? Le 29/05/2019 à 08:46, Deepak Majeti a écrit : > Hi Everyone, > > In the early days of parquet-cpp development, the developers mapped the > thrift::ConvertedType to parquet::LogicalType. > This now leads to confusion with the recent introduction of > th