[jira] [Commented] (PARQUET-1482) [C++] Unable to read data from parquet file generated with parquetjs

2019-04-10 Thread Rylan Dmello (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814870#comment-16814870
 ] 

Rylan Dmello commented on PARQUET-1482:
---

Hi [~terag], sorry, I did take a look at this but haven't had the time 
to resolve it over the last few weeks.

I just opened a new Jira issue to add basic DataPageV2 support to the low-level 
API: https://issues.apache.org/jira/browse/PARQUET-1560 . I can add updates to 
that issue instead of this one, since this is already resolved.

I couldn't easily reproduce the issue when using the low-level API to read the 
'feeds1kMicros.parquet' file generated by parquetjs. Either this has already 
been fixed in arrow/master, or I might need to dig deeper to understand the 
problem. Do you possibly have an example parquet file which isn't readable with 
the low-level API? If so, feel free to attach it to the new Jira issue I linked.

> [C++] Unable to read data from parquet file generated with parquetjs
> 
>
> Key: PARQUET-1482
> URL: https://issues.apache.org/jira/browse/PARQUET-1482
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-cpp
>Reporter: Hatem Helal
>Assignee: Rylan Dmello
>Priority: Major
>  Labels: pull-request-available
> Fix For: cpp-1.6.0
>
> Attachments: feeds1kMicros.parquet
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> See the attached file. When I debug:
> {{% ./parquet-reader feeds1kMicros.parquet}}
> I see that {{scanner->HasNext()}} always returns false.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (PARQUET-1560) [C++] Unable to read parquetjs-created file using low-level parquet-cpp API

2019-04-10 Thread Rylan Dmello (JIRA)
Rylan Dmello created PARQUET-1560:
-

 Summary: [C++] Unable to read parquetjs-created file using 
low-level parquet-cpp API
 Key: PARQUET-1560
 URL: https://issues.apache.org/jira/browse/PARQUET-1560
 Project: Parquet
  Issue Type: Bug
Reporter: Rylan Dmello
Assignee: Rylan Dmello
 Attachments: feeds1kMicros.parquet

Follow-up to PARQUET-1482: basic support for reading Parquet files with Data 
page V2 pages was added as a part of PARQUET-1482.

 

However, this was only added to the higher-level Arrow API, and not to the 
lower-level Parquet API. We could port this fix to the lower-level API so that 
more users can read Parquet files with Data page V2 pages.





Re: Current Parquet Version

2019-04-10 Thread Brian Bowman
Fokko,

Thank you!  I'm not very experienced with GitHub yet and had looked in the 
wrong place.

Best,

Brian 

On 4/9/19, 10:38 PM, "Driesprong, Fokko" wrote:

Hi Brian,

You could take a look at the GitHub repository of the Apache Parquet format itself:
https://github.com/apache/parquet-format

Cheers, Fokko

On Mon, Apr 8, 2019 at 20:19, Brian Bowman wrote:

> What is the most current Apache Parquet file format version?  Where is this
> designated on the official Apache (or GitHub) site?
>
> Thanks,
>
>
> Brian
>




[jira] [Created] (PARQUET-1559) Add way to manually commit already written data to disk

2019-04-10 Thread Victor (JIRA)
Victor created PARQUET-1559:
---

 Summary: Add way to manually commit already written data to disk
 Key: PARQUET-1559
 URL: https://issues.apache.org/jira/browse/PARQUET-1559
 Project: Parquet
  Issue Type: Improvement
  Components: parquet-mr
Affects Versions: 1.10.1
Reporter: Victor


I'm not exactly sure this is compliant with the way parquet works, but I have 
the following need:
 * I'm using parquet-avro to write to a parquet file during a long running 
process
 * From time to time, I would like to access the data that has already been written

So I was expecting to be able to manually flush the file to ensure the data is 
on disk, and then copy the file for preliminary analysis.

If it's contradictory to the way parquet works (for example there is something 
about metadata being at the footer of the file), what would then be the 
alternative?

Closing the file and opening a new one to continue writing?

Could this be supported directly by parquet-mr maybe? It would then write 
multiple files in that case.
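
The "close the file and open a new one" idea can be sketched roughly as follows. This is only an illustration of the rolling pattern, not parquet-mr code: `RollingWriterSketch`, its `part-N.txt` naming, and the use of plain text files in place of Parquet files are all assumptions made for the example. The point it shows is that a file only counts as complete (and safe to copy) once it has been closed, which is also when a Parquet writer would write its footer.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: plain text files stand in for Parquet files.
public class RollingWriterSketch implements AutoCloseable {
    private final Path directory;
    private final int recordsPerFile;
    private final List<Path> completed = new ArrayList<>();
    private BufferedWriter current;
    private Path currentPath;
    private int recordsInCurrent;

    public RollingWriterSketch(Path directory, int recordsPerFile) {
        if (recordsPerFile <= 0) {
            throw new IllegalArgumentException("recordsPerFile must be positive");
        }
        this.directory = directory;
        this.recordsPerFile = recordsPerFile;
    }

    // Appends one record, lazily opening a new file when none is open.
    public void write(String record) throws IOException {
        if (current == null) {
            currentPath = directory.resolve("part-" + completed.size() + ".txt");
            current = Files.newBufferedWriter(currentPath, StandardCharsets.UTF_8);
            recordsInCurrent = 0;
        }
        current.write(record);
        current.newLine();
        recordsInCurrent++;
        if (recordsInCurrent >= recordsPerFile) {
            roll();
        }
    }

    // Closes the current file so that it is complete on disk; the next
    // write() starts a new file. Can also be called manually as a "commit".
    public void roll() throws IOException {
        if (current != null) {
            current.close();
            completed.add(currentPath);
            current = null;
        }
    }

    // Files that have been closed and can be copied for analysis.
    public List<Path> completedFiles() {
        return new ArrayList<>(completed);
    }

    @Override
    public void close() throws IOException {
        roll();
    }
}
```

With this shape, a long-running process would call `roll()` whenever it wants a consistent snapshot and then read only the paths returned by `completedFiles()`.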





[jira] [Updated] (PARQUET-1558) Use try-with-resource in Apache Avro tests

2019-04-10 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated PARQUET-1558:

Labels: pull-request-available  (was: )

> Use try-with-resource in Apache Avro tests
> --
>
> Key: PARQUET-1558
> URL: https://issues.apache.org/jira/browse/PARQUET-1558
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Reporter: Fokko Driesprong
>Assignee: Fokko Driesprong
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Commented] (PARQUET-1558) Use try-with-resource in Apache Avro tests

2019-04-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814429#comment-16814429
 ] 

ASF GitHub Bot commented on PARQUET-1558:
-

Fokko commented on pull request #634: PARQUET-1558: Use try-with-resource in 
Apache Avro tests
URL: https://github.com/apache/parquet-mr/pull/634
 
 
   We can use the try-with-resources pattern to implicitly close resources 
such as readers and writers provided by Avro and Parquet. This makes the code 
more readable since we don't need `out.close()` everywhere.
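
As a rough illustration of the pattern (not the code from the pull request), the sketch below uses a hypothetical `TrackedResource` class as a stand-in for an Avro or Parquet reader or writer. Any class that implements `AutoCloseable` can be declared in the `try (...)` header, and Java then calls `close()` automatically, in reverse order of declaration, even if the body throws.

```java
import java.util.ArrayList;
import java.util.List;

public class TryWithResourcesSketch {
    // Hypothetical stand-in for a reader or writer; it logs open and close events.
    static final class TrackedResource implements AutoCloseable {
        private final String name;
        private final List<String> log;

        TrackedResource(String name, List<String> log) {
            this.name = name;
            this.log = log;
            log.add("open " + name);
        }

        void write(String record) {
            log.add(name + " wrote " + record);
        }

        @Override
        public void close() {
            log.add("close " + name);
        }
    }

    // Opens two resources, uses one, and relies on try-with-resources
    // to close both without any explicit close() call.
    static List<String> run() {
        List<String> log = new ArrayList<>();
        try (TrackedResource out = new TrackedResource("out", log);
             TrackedResource in = new TrackedResource("in", log)) {
            out.write("record-1");
        }
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```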
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org








[jira] [Created] (PARQUET-1558) Use try-with-resource in Apache Avro tests

2019-04-10 Thread Fokko Driesprong (JIRA)
Fokko Driesprong created PARQUET-1558:
-

 Summary: Use try-with-resource in Apache Avro tests
 Key: PARQUET-1558
 URL: https://issues.apache.org/jira/browse/PARQUET-1558
 Project: Parquet
  Issue Type: Improvement
  Components: parquet-mr
Reporter: Fokko Driesprong
Assignee: Fokko Driesprong








[jira] [Created] (PARQUET-1557) Replace deprecated Apache Avro methods

2019-04-10 Thread Fokko Driesprong (JIRA)
Fokko Driesprong created PARQUET-1557:
-

 Summary: Replace deprecated Apache Avro methods
 Key: PARQUET-1557
 URL: https://issues.apache.org/jira/browse/PARQUET-1557
 Project: Parquet
  Issue Type: Improvement
  Components: parquet-mr
Reporter: Fokko Driesprong
Assignee: Fokko Driesprong








[jira] [Commented] (PARQUET-1143) Update Java for format 2.4.0 changes

2019-04-10 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814281#comment-16814281
 ] 

Yuming Wang commented on PARQUET-1143:
--

[~rdblue] Should we update the *Fix Version/s*?

> Update Java for format 2.4.0 changes
> 
>
> Key: PARQUET-1143
> URL: https://issues.apache.org/jira/browse/PARQUET-1143
> Project: Parquet
>  Issue Type: Task
>  Components: parquet-mr
>Affects Versions: 1.9.0, 1.8.2
>Reporter: Ryan Blue
>Assignee: Ryan Blue
>Priority: Major
>



