[jira] [Commented] (BEAM-9288) Conscrypt shaded dependency

2020-02-10 Thread Igor Dvorzhak (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17034066#comment-17034066
 ] 

Igor Dvorzhak commented on BEAM-9288:
-

Would it be better to exclude Conscrypt from the shaded GCS IO jar and rely on a 
system-wide Conscrypt installation, if any?
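
For illustration, here is a minimal sketch of what relying on an optionally 
installed, unshaded Conscrypt could look like. This is not Beam code; the provider 
name "Conscrypt" and the fallback to the default JSSE provider are assumptions.
{code:java}
import java.security.Provider;
import java.security.Security;
import javax.net.ssl.SSLContext;

// Hypothetical sketch, not Beam's implementation: use Conscrypt only if it is
// already registered system-wide, otherwise fall back to the default provider.
public class OptionalConscrypt {

  public static SSLContext newSslContext() throws Exception {
    // "Conscrypt" is assumed to be the name the provider registers under.
    Provider conscrypt = Security.getProvider("Conscrypt");
    return conscrypt != null
        ? SSLContext.getInstance("TLS", conscrypt) // system-wide Conscrypt, no shaded copy needed
        : SSLContext.getInstance("TLS");           // JDK default provider
  }

  public static void main(String[] args) throws Exception {
    System.out.println("Using provider: " + newSslContext().getProvider().getName());
  }
}
{code}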

> Conscrypt shaded dependency
> ---
>
> Key: BEAM-9288
> URL: https://issues.apache.org/jira/browse/BEAM-9288
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Esun Kim
>Assignee: sunjincheng
>Priority: Major
>
> Conscrypt is not designed to be shaded properly, mainly because of its .so files. 
> I happened to see BEAM-9030 (*1) creating a new vendored gRPC that shades 
> Conscrypt in it (*2). I think this could cause a problem when a new Conscrypt is 
> brought in by a new gcsio that depends on gRPC-alts (*4) in its dependency chain 
> (*5). In this case, there may be a conflict when locating the proper .so files 
> for Conscrypt.
> *1: https://issues.apache.org/jira/browse/BEAM-9030
> *2:  
> [https://github.com/apache/beam/blob/e24d1e51cbabe27cb3cc381fd95b334db639c45d/buildSrc/src/main/groovy/org/apache/beam/gradle/GrpcVendoring_1_26_0.groovy#L78]
> *3: https://issues.apache.org/jira/browse/BEAM-6136
> *4: [https://mvnrepository.com/artifact/io.grpc/grpc-alts/1.27.0]
> *5: https://issues.apache.org/jira/browse/BEAM-8889
>  
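
For context, a rough illustration of why relocating packages does not by itself 
isolate a JNI-backed library: the native library is resolved by file name, which 
shading does not change, so a vendored copy and an unshaded copy can contend for 
the same .so. This is a hypothetical sketch, not Conscrypt's or Beam's loading 
code, and the library name below is a placeholder.
{code:java}
// Hypothetical sketch: whether this class lives at org.conscrypt.* or at a
// relocated package such as org.apache.beam.vendor.grpc..., the JNI lookup
// below still resolves the same native file name on the library path.
public class ShadedJniConflictSketch {
  public static void main(String[] args) {
    try {
      // "example_conscrypt_jni" is a made-up name; the real library name differs.
      System.loadLibrary("example_conscrypt_jni");
    } catch (UnsatisfiedLinkError e) {
      // A vendored and an unshaded copy of the same provider would both try to
      // load/extract a library with the same name, which is the conflict above.
      System.out.println("Native library not found: " + e.getMessage());
    }
  }
}
{code}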



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (BEAM-6736) Upgrade gcsio dependency to 1.9.15

2019-02-25 Thread Igor Dvorzhak (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Dvorzhak closed BEAM-6736.
---
Resolution: Fixed

> Upgrade gcsio dependency to 1.9.15
> --
>
> Key: BEAM-6736
> URL: https://issues.apache.org/jira/browse/BEAM-6736
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp, sdk-java-core
>Affects Versions: 2.10.0, 2.11.0
>Reporter: Igor Dvorzhak
>Priority: Major
> Fix For: 2.11.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> GCS IO 1.9.12-1.9.14 could send a large number of GCS list requests (if there 
> are 1000+ files in the folder) in the 
> GoogleCloudStorageFileSystem#getFileInfo method.
> This issue is mitigated in GCS IO 1.9.15:
>  [https://github.com/GoogleCloudPlatform/bigdata-interop/releases/tag/v1.9.15]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-6697) ParquetIO Performance test is failing on (GCS filesystem)

2019-02-25 Thread Igor Dvorzhak (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16777392#comment-16777392
 ] 

Igor Dvorzhak commented on BEAM-6697:
-

GCS IO 1.9.16 with the fix was just released:
https://github.com/GoogleCloudPlatform/bigdata-interop/releases/tag/v1.9.16

> ParquetIO Performance test is failing on (GCS filesystem)
> -
>
> Key: BEAM-6697
> URL: https://issues.apache.org/jira/browse/BEAM-6697
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-parquet, test-failures
>Reporter: Lukasz Gajowy
>Priority: Blocker
> Fix For: 2.11.0
>
>
> Relevant failure logs: 
> {code:java}
> Caused by: java.lang.RuntimeException: 
> org.apache.beam.sdk.io.parquet.ParquetIO$ReadFiles$BeamParquetInputFile@2de8303e
>  is not a Parquet file (too small length: -1)
>   at 
> org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:514)
>   at 
> org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:689)
>   at 
> org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:595)
>   at 
> org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:152)
>   at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
>   at 
> org.apache.beam.sdk.io.parquet.ParquetIO$ReadFiles$ReadFn.processElement(ParquetIO.java:221){code}
>  
> Full logs can be found here: 
> [https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests/job/beam_PerformanceTests_ParquetIOIT/|https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests/job/beam_PerformanceTests_ParquetIOIT/1096/console]
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (BEAM-6697) ParquetIO Performance test is failing on (GCS filesystem)

2019-02-25 Thread Igor Dvorzhak (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16777332#comment-16777332
 ] 

Igor Dvorzhak edited comment on BEAM-6697 at 2/25/19 10:15 PM:
---

This happens because the `GoogleCloudStorageReadChannel` constructor doesn't 
initialize GCS object metadata (which includes the object size); it's initialized 
lazily during the first read.
 Metadata is initialized eagerly only if the `GoogleCloudStorageReadChannel` is 
created via the `GoogleCloudStorage.open()` method.

It's fixed 
[here|https://github.com/GoogleCloudPlatform/bigdata-interop/commit/8f6443bfd6ee821c5667dd2811cf3fe03167b755]
 and will be released in GCS connector 1.9.16 in a couple of hours.
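
To make the failure mode concrete, here is a simplified sketch of the lazy vs. 
eager size initialization described above. The class and method names are 
hypothetical and this is not the gcsio implementation; it only illustrates why a 
consumer that asks for the channel size before the first read can see an unusable 
length.
{code:java}
// Hypothetical sketch of the behavior described above, not gcsio code.
class LazyReadChannel {
  private long size = -1; // unknown until the first read fetches object metadata

  long size() {
    return size; // asking for the length before any read returns -1
  }

  int read(java.nio.ByteBuffer dst) {
    if (size < 0) {
      size = fetchMetadataSize(); // lazy: the metadata request happens here
    }
    // ... actual read elided ...
    return 0;
  }

  private static long fetchMetadataSize() {
    return 1024; // placeholder for a GCS object metadata call
  }
}

class EagerReadChannel {
  private final long size;

  EagerReadChannel() {
    this.size = fetchMetadataSize(); // eager: size known as soon as the channel is opened
  }

  long size() {
    return size;
  }

  private static long fetchMetadataSize() {
    return 1024; // placeholder for a GCS object metadata call
  }
}
{code}
Parquet asks for the file length up front when reading the footer, so the lazy 
variant surfaces as the "is not a Parquet file (too small length: -1)" error in 
the logs above.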


was (Author: medb):
This happens is because `GoogleCloudStorageReadChannel` constructor doesn't 
initialize GCS metadata (includes object size) - it's initialized lazily during 
first read.
Metadata initialized eagerly only if `GoogleCloudStorageReadChannel` created 
via `GoogleCloudStorage.open()` method.

It's fixed 
[here|https://github.com/GoogleCloudPlatform/bigdata-interop/commit/8f6443bfd6ee821c5667dd2811cf3fe03167b755]
 and will be release in GCS connector 1.9.16 in couple hours.

> ParquetIO Performance test is failing on (GCS filesystem)
> -
>
> Key: BEAM-6697
> URL: https://issues.apache.org/jira/browse/BEAM-6697
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-parquet, test-failures
>Reporter: Lukasz Gajowy
>Priority: Blocker
> Fix For: 2.11.0
>
>
> Relevant failure logs: 
> {code:java}
> Caused by: java.lang.RuntimeException: 
> org.apache.beam.sdk.io.parquet.ParquetIO$ReadFiles$BeamParquetInputFile@2de8303e
>  is not a Parquet file (too small length: -1)
>   at 
> org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:514)
>   at 
> org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:689)
>   at 
> org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:595)
>   at 
> org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:152)
>   at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
>   at 
> org.apache.beam.sdk.io.parquet.ParquetIO$ReadFiles$ReadFn.processElement(ParquetIO.java:221){code}
>  
> Full logs can be found here: 
> [https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests/job/beam_PerformanceTests_ParquetIOIT/|https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests/job/beam_PerformanceTests_ParquetIOIT/1096/console]
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-6736) Upgrade gcsio dependency to 1.9.15

2019-02-22 Thread Igor Dvorzhak (JIRA)
Igor Dvorzhak created BEAM-6736:
---

 Summary: Upgrade gcsio dependency to 1.9.15
 Key: BEAM-6736
 URL: https://issues.apache.org/jira/browse/BEAM-6736
 Project: Beam
  Issue Type: Bug
  Components: io-java-gcp, sdk-java-core
Affects Versions: 2.10.0
Reporter: Igor Dvorzhak


GCS IO 1.9.12-1.9.14 could send a large number of GCS list requests (if there are 
1000+ files in the folder) in the GoogleCloudStorage#getFileInfo method.

This issue is mitigated in GCS IO 1.9.15:
https://github.com/GoogleCloudPlatform/bigdata-interop/releases/tag/v1.9.15
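
For illustration, a rough sketch of the request pattern described above, with 
hypothetical names (this is not gcsio code): resolving file info by listing a 
prefix pages through every object under it, so a folder with 1000+ objects turns 
a single lookup into many list requests, whereas a direct metadata lookup needs 
one request regardless of folder size.
{code:java}
// Hypothetical sketch of the request pattern described above, not gcsio code.
class FileInfoLookupSketch {

  static final int PAGE_SIZE = 1000; // assumed page size for a GCS objects.list call

  // Listing-based lookup: pages through the whole prefix, one request per page.
  static int listRequests(int objectsUnderPrefix) {
    int requests = 0;
    for (int fetched = 0; fetched < objectsUnderPrefix; fetched += PAGE_SIZE) {
      requests++; // one objects.list call per page of results
    }
    return Math.max(requests, 1);
  }

  // Metadata-based lookup: a single objects.get call, independent of folder size.
  static int metadataRequests() {
    return 1;
  }

  public static void main(String[] args) {
    System.out.println("list-based lookup for 5000 objects: " + listRequests(5000) + " requests");
    System.out.println("metadata lookup: " + metadataRequests() + " request");
  }
}
{code}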



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-6736) Upgrade gcsio dependency to 1.9.15

2019-02-22 Thread Igor Dvorzhak (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Dvorzhak updated BEAM-6736:

Description: 
GCS IO 1.9.12-1.9.14 could send a large number of GCS list requests (if there are 
1000+ files in the folder) in the GoogleCloudStorageFileSystem#getFileInfo 
method.

This issue is mitigated in GCS IO 1.9.15:
 [https://github.com/GoogleCloudPlatform/bigdata-interop/releases/tag/v1.9.15]

  was:
GCS IO 1.9.12-1.9.14 could send large number of GCS list requests (if there are 
a 1000+ of files in the folder) in GoogleCloudStorage#getFileInfo method.

This issue is mitigated in GCS IO 1.9.15:
https://github.com/GoogleCloudPlatform/bigdata-interop/releases/tag/v1.9.15


> Upgrade gcsio dependency to 1.9.15
> --
>
> Key: BEAM-6736
> URL: https://issues.apache.org/jira/browse/BEAM-6736
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp, sdk-java-core
>Affects Versions: 2.10.0
>Reporter: Igor Dvorzhak
>Priority: Major
>
> GCS IO 1.9.12-1.9.14 could send a large number of GCS list requests (if there 
> are 1000+ files in the folder) in the 
> GoogleCloudStorageFileSystem#getFileInfo method.
> This issue is mitigated in GCS IO 1.9.15:
>  [https://github.com/GoogleCloudPlatform/bigdata-interop/releases/tag/v1.9.15]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)