[jira] [Commented] (PARQUET-1482) [C++] Unable to read data from parquet file generated with parquetjs
[ https://issues.apache.org/jira/browse/PARQUET-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814870#comment-16814870 ]

Rylan Dmello commented on PARQUET-1482:
---

Hi [~terag], sorry, I did take a look at this but didn't have time to resolve it over the last few weeks. I just opened a new Jira issue to add basic DataPageV2 support to the low-level API: https://issues.apache.org/jira/browse/PARQUET-1560 . I can post updates on that issue instead of this one, since this one is already resolved.

I couldn't easily reproduce the issue when using the low-level API to read the 'feeds1kMicros.parquet' file generated by parquetjs. Either this has already been fixed in arrow/master, or I need to dig deeper to understand the problem.

Do you happen to have an example Parquet file that isn't readable with the low-level API? If so, feel free to attach it to the new Jira issue I linked.

> [C++] Unable to read data from parquet file generated with parquetjs
> ---
>
> Key: PARQUET-1482
> URL: https://issues.apache.org/jira/browse/PARQUET-1482
> Project: Parquet
> Issue Type: Bug
> Components: parquet-cpp
> Reporter: Hatem Helal
> Assignee: Rylan Dmello
> Priority: Major
> Labels: pull-request-available
> Fix For: cpp-1.6.0
>
> Attachments: feeds1kMicros.parquet
>
> Time Spent: 3.5h
> Remaining Estimate: 0h
>
> See the attached file. When I debug:
> {{% ./parquet-reader feeds1kMicros.parquet}}
> I see that {{scanner->HasNext()}} always returns false.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Created] (PARQUET-1560) [C++] Unable to read parquetjs-created file using low-level parquet-cpp API
Rylan Dmello created PARQUET-1560:
---

Summary: [C++] Unable to read parquetjs-created file using low-level parquet-cpp API
Key: PARQUET-1560
URL: https://issues.apache.org/jira/browse/PARQUET-1560
Project: Parquet
Issue Type: Bug
Reporter: Rylan Dmello
Assignee: Rylan Dmello
Attachments: feeds1kMicros.parquet

Follow-up to PARQUET-1482: basic support for reading Parquet files with Data page V2 pages was added as part of PARQUET-1482. However, it was only added to the higher-level Arrow API, not to the lower-level Parquet API. We could port this fix to the lower-level API so that more users can read Parquet files with Data page V2 pages.
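As a hedged illustration of what "basic support for Data page V2 pages" means at the page level, the toy Java model below counts values across the pages of one column chunk. Only the page-type names and their order come from the `PageType` enum in the parquet-format Thrift definition; `PageHeader` and `ToyColumnReader` are hypothetical names, not parquet-cpp or parquet-mr API. The thread does not confirm the root cause; this merely shows one plausible way a reader that skips unrecognized page types would see an all-V2 column as empty, which would be consistent with a `HasNext()`-style check returning false.

```java
import java.util.List;

// Page types in the same order as the parquet-format Thrift PageType enum
// (DATA_PAGE = 0, INDEX_PAGE = 1, DICTIONARY_PAGE = 2, DATA_PAGE_V2 = 3).
enum PageType { DATA_PAGE, INDEX_PAGE, DICTIONARY_PAGE, DATA_PAGE_V2 }

// Hypothetical, heavily simplified page header: just the type and value count.
final class PageHeader {
    final PageType type;
    final int numValues;

    PageHeader(PageType type, int numValues) {
        this.type = type;
        this.numValues = numValues;
    }
}

final class ToyColumnReader {
    // A reader that only recognizes V1 data pages and skips every other type.
    // A column chunk written with V2 data pages only yields no values at all.
    static long countValuesV1Only(List<PageHeader> pages) {
        long total = 0;
        for (PageHeader page : pages) {
            if (page.type == PageType.DATA_PAGE) {
                total += page.numValues;
            }
            // Every other page type, including DATA_PAGE_V2, is skipped.
        }
        return total;
    }

    // With basic V2 support, DATA_PAGE_V2 is treated as a data page too.
    static long countValues(List<PageHeader> pages) {
        long total = 0;
        for (PageHeader page : pages) {
            switch (page.type) {
                case DATA_PAGE:
                case DATA_PAGE_V2:
                    total += page.numValues;
                    break;
                default:
                    // Dictionary pages hold dictionary entries, not per-row values;
                    // index pages are not data pages either.
                    break;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<PageHeader> v2Column = List.of(
            new PageHeader(PageType.DICTIONARY_PAGE, 10),
            new PageHeader(PageType.DATA_PAGE_V2, 500),
            new PageHeader(PageType.DATA_PAGE_V2, 500));
        System.out.println(countValuesV1Only(v2Column)); // 0
        System.out.println(countValues(v2Column));       // 1000
    }
}
```

The real readers work on decoded Thrift page headers and decompressed page buffers; this model keeps only the page-type dispatch, which is the part the V1/V2 distinction affects.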
Re: Current Parquet Version
Fokko,

Thank you! I'm not very experienced with GitHub yet and had looked in the wrong place.

Best,
Brian

On 4/9/19, 10:38 PM, "Driesprong, Fokko" wrote:

    Hi Brian,

    You could take a look at the GitHub repository of the Apache Parquet Format itself:
    https://github.com/apache/parquet-format

    Cheers, Fokko

    On Mon, Apr 8, 2019 at 20:19, Brian Bowman wrote:

    > What is the most current Apache Parquet file format version? Where is this
    > designated on the official Apache (or GitHub) site?
    >
    > Thanks,
    >
    > Brian
[jira] [Created] (PARQUET-1559) Add way to manually commit already written data to disk
Victor created PARQUET-1559:
---

Summary: Add way to manually commit already written data to disk
Key: PARQUET-1559
URL: https://issues.apache.org/jira/browse/PARQUET-1559
Project: Parquet
Issue Type: Improvement
Components: parquet-mr
Affects Versions: 1.10.1
Reporter: Victor

I'm not sure this is compatible with the way Parquet works, but I have the following need:
* I'm using parquet-avro to write to a Parquet file during a long-running process.
* From time to time, I would like to be able to access the data that has already been written.

So I expected to be able to manually flush the file, to ensure the data is on disk, and then copy the file for preliminary analysis.

If this contradicts the way Parquet works (for example, because the file metadata is stored in the footer at the end of the file), what would the alternative be? Closing the file and opening a new one to continue writing? Could parquet-mr support this directly? In that case it would write multiple files.
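The "close the file and open a new one" alternative raised in the issue can be sketched as a rolling writer. This is a toy, in-memory model under stated assumptions: `ToyRollingWriter` is hypothetical (not a parquet-mr or parquet-avro class), and a part's footer is reduced to a single row-count line. It only illustrates why a part becomes readable once it is closed: as in a Parquet file, the footer is written last.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical toy model of the "close and reopen" workaround; NOT parquet-mr API.
 * Assumption: like a Parquet file, a part is only readable once its footer is
 * written, and the footer is only written when the part is closed (rolled).
 */
class ToyRollingWriter implements AutoCloseable {
    private final int maxRecordsPerPart;
    private final List<String> closedParts = new ArrayList<>();
    private StringBuilder currentPart = new StringBuilder();
    private int recordsInCurrentPart = 0;

    ToyRollingWriter(int maxRecordsPerPart) {
        if (maxRecordsPerPart <= 0) {
            throw new IllegalArgumentException("maxRecordsPerPart must be positive");
        }
        this.maxRecordsPerPart = maxRecordsPerPart;
    }

    void write(String record) {
        currentPart.append(record).append('\n');
        recordsInCurrentPart++;
        if (recordsInCurrentPart == maxRecordsPerPart) {
            roll(); // finalize this part and start a new one
        }
    }

    /** Writes the footer of the current part so it can be read or copied. */
    void roll() {
        if (recordsInCurrentPart == 0) {
            return; // nothing to finalize
        }
        currentPart.append("FOOTER rows=").append(recordsInCurrentPart);
        closedParts.add(currentPart.toString());
        currentPart = new StringBuilder();
        recordsInCurrentPart = 0;
    }

    /** Only parts that already have a footer are safe to copy for analysis. */
    List<String> readableParts() {
        return new ArrayList<>(closedParts);
    }

    @Override
    public void close() {
        roll(); // the last, possibly partial, part gets its footer on close
    }

    public static void main(String[] args) {
        try (ToyRollingWriter writer = new ToyRollingWriter(2)) {
            for (int i = 1; i <= 5; i++) {
                writer.write("record-" + i);
            }
            // Records 1-4 sit in two finalized parts; record 5 is still pending.
            System.out.println(writer.readableParts().size()); // 2
        }
    }
}
```

With a real writer, each finalized part would be a separate file, which matches the point in the issue that this approach would produce multiple files.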
[jira] [Updated] (PARQUET-1558) Use try-with-resource in Apache Avro tests
[ https://issues.apache.org/jira/browse/PARQUET-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated PARQUET-1558:

Labels: pull-request-available (was: )

> Use try-with-resource in Apache Avro tests
> ---
>
> Key: PARQUET-1558
> URL: https://issues.apache.org/jira/browse/PARQUET-1558
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-mr
> Reporter: Fokko Driesprong
> Assignee: Fokko Driesprong
> Priority: Major
> Labels: pull-request-available
[jira] [Commented] (PARQUET-1558) Use try-with-resource in Apache Avro tests
[ https://issues.apache.org/jira/browse/PARQUET-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814429#comment-16814429 ]

ASF GitHub Bot commented on PARQUET-1558:
---

Fokko commented on pull request #634: PARQUET-1558: Use try-with-resource in Apache Avro tests
URL: https://github.com/apache/parquet-mr/pull/634

We can use the try-with-resources pattern to implicitly close resources, such as the readers and writers provided by Avro and Parquet. This makes the code more readable, since we no longer need `out.close()` calls everywhere.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
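The change described in the pull request can be illustrated with a small, self-contained sketch. `TrackedResource` is a hypothetical stand-in for the Avro and Parquet readers and writers (it is not a class from either library); it only logs when it is opened, written to, and closed, so the difference between an explicit `out.close()` and try-with-resources is visible.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an Avro/Parquet reader or writer: it just logs
// its lifecycle so we can see when close() is called.
class TrackedResource implements AutoCloseable {
    private final String name;
    private final List<String> log;

    TrackedResource(String name, List<String> log) {
        this.name = name;
        this.log = log;
        log.add("open " + name);
    }

    void write(String value) {
        log.add(name + " <- " + value);
    }

    @Override
    public void close() {
        log.add("close " + name);
    }
}

class TryWithResourcesDemo {
    // Before: the code has to remember to call out.close() itself.
    static List<String> explicitClose() {
        List<String> log = new ArrayList<>();
        TrackedResource out = new TrackedResource("out", log);
        try {
            out.write("record");
        } finally {
            out.close();
        }
        return log;
    }

    // After: resources declared in the try header are closed automatically,
    // in reverse order of declaration.
    static List<String> tryWithResources() {
        List<String> log = new ArrayList<>();
        try (TrackedResource in = new TrackedResource("in", log);
             TrackedResource out = new TrackedResource("out", log)) {
            out.write("record");
        }
        return log;
    }

    // The resource is still closed when the body throws, before the catch runs.
    static List<String> closesOnException() {
        List<String> log = new ArrayList<>();
        try (TrackedResource out = new TrackedResource("out", log)) {
            out.write("record");
            throw new IllegalStateException("simulated failure");
        } catch (IllegalStateException e) {
            log.add("caught " + e.getMessage());
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(tryWithResources());
        // [open in, open out, out <- record, close out, close in]
    }
}
```

Because resources declared in the `try (...)` header are closed in reverse declaration order, and before any `catch` block runs, the explicit `close()` calls in the tests become unnecessary.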
[jira] [Created] (PARQUET-1558) Use try-with-resource in Apache Avro tests
Fokko Driesprong created PARQUET-1558:
---

Summary: Use try-with-resource in Apache Avro tests
Key: PARQUET-1558
URL: https://issues.apache.org/jira/browse/PARQUET-1558
Project: Parquet
Issue Type: Improvement
Components: parquet-mr
Reporter: Fokko Driesprong
Assignee: Fokko Driesprong
[jira] [Created] (PARQUET-1557) Replace deprecated Apache Avro methods
Fokko Driesprong created PARQUET-1557:
---

Summary: Replace deprecated Apache Avro methods
Key: PARQUET-1557
URL: https://issues.apache.org/jira/browse/PARQUET-1557
Project: Parquet
Issue Type: Improvement
Components: parquet-mr
Reporter: Fokko Driesprong
Assignee: Fokko Driesprong
[jira] [Commented] (PARQUET-1143) Update Java for format 2.4.0 changes
[ https://issues.apache.org/jira/browse/PARQUET-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814281#comment-16814281 ]

Yuming Wang commented on PARQUET-1143:
---

[~rdblue] Should we update the *Fix Version/s*?

> Update Java for format 2.4.0 changes
> ---
>
> Key: PARQUET-1143
> URL: https://issues.apache.org/jira/browse/PARQUET-1143
> Project: Parquet
> Issue Type: Task
> Components: parquet-mr
> Affects Versions: 1.9.0, 1.8.2
> Reporter: Ryan Blue
> Assignee: Ryan Blue
> Priority: Major