[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=407268&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-407268 ]

ASF GitHub Bot logged work on BEAM-8561:

Author: ASF GitHub Bot
Created on: 20/Mar/20 21:27
Start Date: 20/Mar/20 21:27
Worklog Time Spent: 10m
Work Description: iemejia commented on issue #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-601918008

We left it like this because we had a regression: if we want to generate and compile the class, we need to have thrift installed on all Beam workers (and as a requirement for everyone building Beam), which I think is clearly overkill just for a test.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
---

Worklog Id: (was: 407268)
Time Spent: 17.5h (was: 17h 20m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
> Issue Type: New Feature
> Components: io-java-files
> Reporter: Chris Larsen
> Assignee: Chris Larsen
> Priority: Major
> Fix For: 2.20.0
>
> Time Spent: 17.5h
> Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing
> to/from Thrift files with a native connector.
> Functionality would include:
> # read() - Reading from one or more Thrift files.
> # write() - Writing to one or more Thrift files.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=406304&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-406304 ]

ASF GitHub Bot logged work on BEAM-8561:

Author: ASF GitHub Bot
Created on: 19/Mar/20 15:03
Start Date: 19/Mar/20 15:03
Worklog Time Spent: 10m
Work Description: chrlarsen commented on issue #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-601229916

We ended up including the generated `TestThriftStruct` class instead of trying to generate at compile time.

Issue Time Tracking
---

Worklog Id: (was: 406304)
Time Spent: 17h 20m (was: 17h 10m)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=405241&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405241 ]

ASF GitHub Bot logged work on BEAM-8561:

Author: ASF GitHub Bot
Created on: 18/Mar/20 06:29
Start Date: 18/Mar/20 06:29
Worklog Time Spent: 10m
Work Description: pabloem commented on issue #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-600448077

I'm sorry about the confusion. What happened with this after all?

Issue Time Tracking
---

Worklog Id: (was: 405241)
Time Spent: 17h 10m (was: 17h)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=389070&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389070 ]

ASF GitHub Bot logged work on BEAM-8561:

Author: ASF GitHub Bot
Created on: 18/Feb/20 20:41
Start Date: 18/Feb/20 20:41
Worklog Time Spent: 10m
Work Description: iemejia commented on issue #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-587811830

Thanks, ping me when you open the PR so we can get it merged quickly.

Issue Time Tracking
---

Worklog Id: (was: 389070)
Time Spent: 17h (was: 16h 50m)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=389069&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389069 ]

ASF GitHub Bot logged work on BEAM-8561:

Author: ASF GitHub Bot
Created on: 18/Feb/20 20:40
Start Date: 18/Feb/20 20:40
Worklog Time Spent: 10m
Work Description: chrlarsen commented on issue #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-587810419

@iemejia yes I'll re-add the generated class.

Issue Time Tracking
---

Worklog Id: (was: 389069)
Time Spent: 16h 50m (was: 16h 40m)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=389068&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389068 ]

ASF GitHub Bot logged work on BEAM-8561:

Author: ASF GitHub Bot
Created on: 18/Feb/20 20:38
Start Date: 18/Feb/20 20:38
Worklog Time Spent: 10m
Work Description: iemejia commented on issue #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-587807299

Arrghh, it seems we broke master because the workers don't have thrift installed. Can we just go back to the generated class and comment out the compiler step so we can unblock everyone? Can you help me with this, @chrlarsen or @pabloem? (I am not at my computer.)
For info: https://lists.apache.org/thread.html/r8b2c65bfccc5a68796811804903d4ca08827359de2399a94b6d18197%40%3Cdev.beam.apache.org%3E

Issue Time Tracking
---

Worklog Id: (was: 389068)
Time Spent: 16h 40m (was: 16.5h)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=388713&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-388713 ]

ASF GitHub Bot logged work on BEAM-8561:

Author: ASF GitHub Bot
Created on: 18/Feb/20 08:41
Start Date: 18/Feb/20 08:41
Worklog Time Spent: 10m
Work Description: iemejia commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290

Issue Time Tracking
---

Worklog Id: (was: 388713)
Time Spent: 16.5h (was: 16h 20m)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=388712&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-388712 ]

ASF GitHub Bot logged work on BEAM-8561:

Author: ASF GitHub Bot
Created on: 18/Feb/20 08:41
Start Date: 18/Feb/20 08:41
Worklog Time Spent: 10m
Work Description: iemejia commented on issue #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-587343778

Merged manually to squash the commits. Thanks @chrlarsen !

Issue Time Tracking
---

Worklog Id: (was: 388712)
Time Spent: 16h 20m (was: 16h 10m)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=388689&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-388689 ]

ASF GitHub Bot logged work on BEAM-8561:

Author: ASF GitHub Bot
Created on: 18/Feb/20 07:05
Start Date: 18/Feb/20 07:05
Worklog Time Spent: 10m
Work Description: pabloem commented on issue #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-587312699

aw this is great. Thanks! LGTM. I'll let Ismael / Steve add their comments, and merge if nothing else.

Issue Time Tracking
---

Worklog Id: (was: 388689)
Time Spent: 16h 10m (was: 16h)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=387066&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387066 ]

ASF GitHub Bot logged work on BEAM-8561:

Author: ASF GitHub Bot
Created on: 14/Feb/20 02:42
Start Date: 14/Feb/20 02:42
Worklog Time Spent: 10m
Work Description: chrlarsen commented on issue #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-586070601

@pabloem @steveniemitz @iemejia comments have been addressed and updated. Also I will add `inferBeamSchema` to the future plans :)

Issue Time Tracking
---

Worklog Id: (was: 387066)
Time Spent: 16h (was: 15h 50m)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=386979&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386979 ]

ASF GitHub Bot logged work on BEAM-8561:

Author: ASF GitHub Bot
Created on: 14/Feb/20 00:11
Start Date: 14/Feb/20 00:11
Worklog Time Spent: 10m
Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r379189609

##
File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
##
@@ -0,0 +1,289 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.thrift.TBase;
+import org.apache.thrift.TException;
+import org.apache.thrift.protocol.TProtocol;
+import org.apache.thrift.protocol.TProtocolFactory;
+import org.apache.thrift.transport.TIOStreamTransport;
+import org.apache.thrift.transport.TTransportException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing files containing Thrift encoded data.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#} with the desired file pattern to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection examples = pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }
+ *
+ * For more advanced use cases, like reading each file in a {@link PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection files = pipeline
+ * .apply(FileIO.match().filepattern(options.getInputFilepattern())
+ * .apply(FileIO.readMatches());
+ *
+ * PCollection examples = files.apply(ThriftIO.readFiles(ExampleType.class).withProtocol(thriftProto);
+ * }
+ *
+ * Writing Thrift Files
+ *
+ * {@link ThriftIO.Sink} allows for a {@link PCollection} of {@link TBase} to be written to
+ * Thrift files. It can be used with the general-purpose {@link FileIO} transforms with
+ * FileIO.write/writeDynamic specifically.
+ *
+ * For example:
+ *
+ * {@code
+ * pipeline
+ * .apply(...) // PCollection
+ * .apply(FileIO
+ * .write()
+ * .via(ThriftIO.sink(thriftProto))
+ * .to("destination/path");
+ * }
+ *
+ * This IO API is considered experimental and may break or receive backwards-incompatible changes
+ * in future versions of the Apache Beam SDK.
+ */
+@Experimental(Experimental.Kind.SOURCE_SINK)
+public class ThriftIO {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ThriftIO.class);
+
+  /** Disable construction of utility class. */
+  private ThriftIO() {}
+
+  /**
+   * Reads each file in a {@link PCollection} of {@link org.apache.beam.sdk.io.FileIO.ReadableFile},
+   * which allows more flexible usage.
+   */
+  public static ReadFiles readFiles(Class recordClass) {
+    return new AutoValue_ThriftIO_ReadFiles.Builder().setRecordClass(recordClass).build();
+  }
+
+  //
+
+  /** Creates a {@link Sink} for use with {@link FileIO#write} and {@link FileIO#writeDynamic}. */
+  public static > Sink
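
The Javadoc in the hunk above describes `ThriftIO.Sink` being handed to `FileIO.write()`. As a rough illustration of the open/write/flush life cycle that a file sink of this kind goes through, here is a dependency-free sketch. It is not the PR's implementation: the class name `LengthPrefixedSinkSketch` is hypothetical, the three method names are assumed to mirror the shape of Beam's `FileIO.Sink`, and length-prefixed UTF-8 strings stand in for Thrift-encoded records so that the example needs neither Beam nor Thrift on the classpath.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical stand-in for a file sink. It only mimics the
 * open/write/flush life cycle; it does not implement any Beam interface.
 */
final class LengthPrefixedSinkSketch {

  private DataOutputStream out;

  /** Called once per output file, before any record is written. */
  public void open(WritableByteChannel channel) {
    out = new DataOutputStream(Channels.newOutputStream(channel));
  }

  /** Called once per element; writes a 4-byte length followed by the payload. */
  public void write(String record) throws IOException {
    byte[] bytes = record.getBytes(StandardCharsets.UTF_8);
    out.writeInt(bytes.length);
    out.write(bytes);
  }

  /** Called once at the end; pushes any buffered bytes to the channel. */
  public void flush() throws IOException {
    out.flush();
  }

  public static void main(String[] args) throws IOException {
    // An in-memory buffer plays the role of the destination file.
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    LengthPrefixedSinkSketch sink = new LengthPrefixedSinkSketch();
    sink.open(Channels.newChannel(buffer));
    sink.write("a");
    sink.write("bc");
    sink.flush();
    System.out.println(buffer.size() + " bytes written");
  }
}
```

In the actual connector the per-record encoding would presumably go through a Thrift protocol created from the `TProtocolFactory` passed to `ThriftIO.sink(...)`; the diff is cut off before that code, so this part is a guess.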
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=386331&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386331 ]

ASF GitHub Bot logged work on BEAM-8561:

Author: ASF GitHub Bot
Created on: 13/Feb/20 00:54
Start Date: 13/Feb/20 00:54
Worklog Time Spent: 10m
Work Description: steveniemitz commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r378597769

##
File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
##
@@ -0,0 +1,289 @@
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=386327&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386327 ]

ASF GitHub Bot logged work on BEAM-8561:

Author: ASF GitHub Bot
Created on: 13/Feb/20 00:44
Start Date: 13/Feb/20 00:44
Worklog Time Spent: 10m
Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r378594519

##
File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
##
@@ -0,0 +1,289 @@
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=386321=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386321 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 13/Feb/20 00:16 Start Date: 13/Feb/20 00:16 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r378586352 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/package-info.java ## @@ -0,0 +1,20 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** Transforms for reading and writing to Thrift files. */ +package org.apache.beam.sdk.io.thrift; Review comment: Done updated. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking
---
Worklog Id: (was: 386321)
Time Spent: 15h 20m (was: 15h 10m)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=386314=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386314 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 12/Feb/20 23:58 Start Date: 12/Feb/20 23:58 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r378581158 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftCoder.java ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.thrift; + +import java.io.IOException; +import java.io.InputStream; +import java.io.ObjectInputStream; +import java.io.ObjectOutputStream; +import java.io.OutputStream; +import org.apache.beam.sdk.coders.CoderException; +import org.apache.beam.sdk.coders.CustomCoder; + +public class ThriftCoder extends CustomCoder { + + public static ThriftCoder of() { +return new ThriftCoder<>(); + } + + /** + * Encodes the given value of type {@code T} onto the given output stream. + * + * @param value {@link org.apache.thrift.TBase} to encode. 
+ * @param outStream stream to output encoded value to.
+ * @throws IOException if writing to the {@code OutputStream} fails for some reason
+ * @throws CoderException if the value could not be encoded for some reason
+ */
+ @Override
+ public void encode(T value, OutputStream outStream) throws CoderException, IOException {
+   ObjectOutputStream oos = new ObjectOutputStream(outStream);
+   oos.writeObject(value);
+   oos.flush();
+ }

Review comment:
Sounds good, the next commit will include Thrift native serialization.

Issue Time Tracking
---
Worklog Id: (was: 386314)
Time Spent: 15h 10m (was: 15h)
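For context, the `encode` method under review relies on plain Java serialization through `ObjectOutputStream`. Below is a minimal, standard-library-only sketch of that approach with a matching `decode`. `JavaSerializationCoderSketch` and its helper methods are hypothetical names used for illustration; the class does not extend Beam's `CustomCoder`, and it is not the Thrift-native replacement mentioned in the reply, which would need libthrift.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Illustrative only: mirrors the encode shape in the diff, without Beam's CustomCoder. */
final class JavaSerializationCoderSketch {

  /** Writes the value with Java serialization; every call emits a fresh stream header. */
  static <T extends Serializable> void encode(T value, OutputStream outStream)
      throws IOException {
    ObjectOutputStream oos = new ObjectOutputStream(outStream);
    oos.writeObject(value);
    oos.flush();
  }

  /** Reads back a single value written by encode. */
  @SuppressWarnings("unchecked")
  static <T extends Serializable> T decode(InputStream inStream) throws IOException {
    try {
      ObjectInputStream ois = new ObjectInputStream(inStream);
      return (T) ois.readObject();
    } catch (ClassNotFoundException e) {
      throw new IOException("class of encoded value not found", e);
    }
  }

  /** Encodes into memory and decodes again. */
  static <T extends Serializable> T roundTrip(T value) {
    try {
      ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      encode(value, buffer);
      return decode(new ByteArrayInputStream(buffer.toByteArray()));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Number of bytes produced for one value, including the serialization stream header. */
  static int encodedSize(Serializable value) {
    try {
      ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      encode(value, buffer);
      return buffer.size();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Because a new `ObjectOutputStream` is created per element, each encoded value carries the stream header in addition to its payload. That per-element overhead is one plausible motivation for moving to a format-specific encoding, though the reviewer's actual reasoning is not quoted in this message.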
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=386089=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386089 ]

ASF GitHub Bot logged work on BEAM-8561:

Author: ASF GitHub Bot
Created on: 12/Feb/20 17:34
Start Date: 12/Feb/20 17:34
Worklog Time Spent: 10m
Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r378404834

## File path: sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/TestThriftStruct.java ##
@@ -0,0 +1,1232 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one

Review comment:
Sounds good, I'll work on this.

Issue Time Tracking
---
Worklog Id: (was: 386089)
Time Spent: 15h (was: 14h 50m)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=386003=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386003 ]

ASF GitHub Bot logged work on BEAM-8561:

Author: ASF GitHub Bot
Created on: 12/Feb/20 15:49
Start Date: 12/Feb/20 15:49
Worklog Time Spent: 10m
Work Description: iemejia commented on issue #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-585267600

One extra thing: we just merged a plugin to auto-label PRs for different components/extensions/IOs. Can you please also add the label + path for thrift in this file: https://github.com/apache/beam/blob/master/.github/autolabeler.yml

Issue Time Tracking
---
Worklog Id: (was: 386003)
Time Spent: 14h 50m (was: 14h 40m)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385999=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385999 ]

ASF GitHub Bot logged work on BEAM-8561:

Author: ASF GitHub Bot
Created on: 12/Feb/20 15:43
Start Date: 12/Feb/20 15:43
Worklog Time Spent: 10m
Work Description: iemejia commented on issue #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-585267600

One extra thing: we just merged a plugin to auto-label PRs for different components/extensions/IOs. Can you please add the label + path for thrift in this file: https://github.com/apache/beam/blob/master/.github/autolabeler.yml

Issue Time Tracking
---
Worklog Id: (was: 385999)
Time Spent: 14.5h (was: 14h 20m)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=386000=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386000 ]

ASF GitHub Bot logged work on BEAM-8561:

Author: ASF GitHub Bot
Created on: 12/Feb/20 15:43
Start Date: 12/Feb/20 15:43
Worklog Time Spent: 10m
Work Description: iemejia commented on issue #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-585267600

One extra thing: we just merged a plugin to auto-label PRs for different components/extensions/IOs. Can you please add the label + path for thrift in this file: https://github.com/apache/beam/blob/master/.github/autolabeler.yml

Issue Time Tracking
---
Worklog Id: (was: 386000)
Time Spent: 14h 40m (was: 14.5h)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385991=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385991 ]

ASF GitHub Bot logged work on BEAM-8561:

Author: ASF GitHub Bot
Created on: 12/Feb/20 15:39
Start Date: 12/Feb/20 15:39
Worklog Time Spent: 10m
Work Description: steveniemitz commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r378331613

## File path: sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/TestThriftStruct.java ##
@@ -0,0 +1,1232 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one

Review comment:
I don't think that Thrift guarantees that generated code is forwards or backwards compatible, meaning that if we upgrade libthrift here, the code might no longer compile and would need to be regenerated. I'm +1 for having this generated at compile time.

Issue Time Tracking
---
Worklog Id: (was: 385991)
Time Spent: 14h 20m (was: 14h 10m)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385490=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385490 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 22:00 Start Date: 11/Feb/20 22:00 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377925605 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ## @@ -0,0 +1,289 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.thrift; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import com.google.auto.value.AutoValue; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.io.InputStream; +import java.io.OutputStream; +import java.nio.channels.Channels; +import java.nio.channels.WritableByteChannel; +import javax.annotation.Nullable; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.io.FileIO; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.display.DisplayData; +import org.apache.beam.sdk.values.PCollection; +import org.apache.thrift.TBase; +import org.apache.thrift.TException; +import org.apache.thrift.protocol.TProtocol; +import org.apache.thrift.protocol.TProtocolFactory; +import org.apache.thrift.transport.TIOStreamTransport; +import org.apache.thrift.transport.TTransportException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link PTransform}s for reading and writing files containing Thrift encoded data. + * + * Reading Thrift Files + * + * For simple reading, use {@link ThriftIO#} with the desired file pattern to read from. + * + * For example: + * + * {@code + * PCollection examples = pipeline.apply(ThriftIO.read().from("/foo/bar/*")); + * ... + * } + * + * For more advanced use cases, like reading each file in a {@link PCollection} of {@link + * FileIO.ReadableFile}, use the {@link ReadFiles} transform. 
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection<FileIO.ReadableFile> files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern()))
+ *   .apply(FileIO.readMatches());
+ *
+ * PCollection<ExampleType> examples = files.apply(ThriftIO.readFiles(ExampleType.class).withProtocol(thriftProto));
+ * }
+ *
+ * Writing Thrift Files
+ *
+ * {@link ThriftIO.Sink} allows for a {@link PCollection} of {@link TBase} to be written to
+ * Thrift files. It can be used with the general-purpose {@link FileIO} transforms with
+ * FileIO.write/writeDynamic specifically.
+ *
+ * For example:
+ *
+ * {@code
+ * pipeline
+ *   .apply(...) // PCollection<ExampleType>
+ *   .apply(FileIO
+ *     .<ExampleType>write()
+ *     .via(ThriftIO.sink(thriftProto))
+ *     .to("destination/path"));
+ * }
+ *
+ * This IO API is considered experimental and may break or receive backwards-incompatible changes
+ * in future versions of the Apache Beam SDK.
+ */
+@Experimental(Experimental.Kind.SOURCE_SINK)
+public class ThriftIO {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ThriftIO.class);
+
+  /** Disable construction of utility class. */
+  private ThriftIO() {}
+
+  /**
+   * Reads each file in a {@link PCollection} of {@link org.apache.beam.sdk.io.FileIO.ReadableFile},
+   * which allows more flexible usage.
+   */
+  public static <T> ReadFiles<T> readFiles(Class<T> recordClass) {
+    return new AutoValue_ThriftIO_ReadFiles.Builder<T>().setRecordClass(recordClass).build();
+  }
+
+  //
+
+  /** Creates a {@link Sink} for use with {@link FileIO#write} and {@link FileIO#writeDynamic}. */
+  public static <T extends TBase<?, ?>> Sink<T>
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385488=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385488 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:59 Start Date: 11/Feb/20 21:59 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377925182 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ## @@ -0,0 +1,289 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.thrift; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import com.google.auto.value.AutoValue; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.io.InputStream; +import java.io.OutputStream; +import java.nio.channels.Channels; +import java.nio.channels.WritableByteChannel; +import javax.annotation.Nullable; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.io.FileIO; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.display.DisplayData; +import org.apache.beam.sdk.values.PCollection; +import org.apache.thrift.TBase; +import org.apache.thrift.TException; +import org.apache.thrift.protocol.TProtocol; +import org.apache.thrift.protocol.TProtocolFactory; +import org.apache.thrift.transport.TIOStreamTransport; +import org.apache.thrift.transport.TTransportException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link PTransform}s for reading and writing files containing Thrift encoded data. + * + * Reading Thrift Files + * + * For simple reading, use {@link ThriftIO#} with the desired file pattern to read from. + * + * For example: + * + * {@code + * PCollection examples = pipeline.apply(ThriftIO.read().from("/foo/bar/*")); + * ... + * } + * + * For more advanced use cases, like reading each file in a {@link PCollection} of {@link + * FileIO.ReadableFile}, use the {@link ReadFiles} transform. 
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection<FileIO.ReadableFile> files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern()))
+ *   .apply(FileIO.readMatches());
+ *
+ * PCollection<ExampleType> examples = files.apply(ThriftIO.readFiles(ExampleType.class).withProtocol(thriftProto));
+ * }
+ *
+ * Writing Thrift Files
+ *
+ * {@link ThriftIO.Sink} allows for a {@link PCollection} of {@link TBase} to be written to
+ * Thrift files. It can be used with the general-purpose {@link FileIO} transforms with
+ * FileIO.write/writeDynamic specifically.
+ *
+ * For example:
+ *
+ * {@code
+ * pipeline
+ *   .apply(...) // PCollection<ExampleType>
+ *   .apply(FileIO
+ *     .<ExampleType>write()
+ *     .via(ThriftIO.sink(thriftProto))
+ *     .to("destination/path"));
+ * }
+ *
+ * This IO API is considered experimental and may break or receive backwards-incompatible changes
+ * in future versions of the Apache Beam SDK.
+ */
+@Experimental(Experimental.Kind.SOURCE_SINK)
+public class ThriftIO {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ThriftIO.class);
+
+  /** Disable construction of utility class. */
+  private ThriftIO() {}
+
+  /**
+   * Reads each file in a {@link PCollection} of {@link org.apache.beam.sdk.io.FileIO.ReadableFile},
+   * which allows more flexible usage.
+   */
+  public static <T> ReadFiles<T> readFiles(Class<T> recordClass) {
+    return new AutoValue_ThriftIO_ReadFiles.Builder<T>().setRecordClass(recordClass).build();
+  }
+
+  //
+
+  /** Creates a {@link Sink} for use with {@link FileIO#write} and {@link FileIO#writeDynamic}. */
+  public static <T extends TBase<?, ?>> Sink<T>
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385489=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385489 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:59 Start Date: 11/Feb/20 21:59 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377925387 ## File path: sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/ThriftIOTest.java ## @@ -0,0 +1,233 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.thrift; + +import java.io.Serializable; +import java.nio.ByteBuffer; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import org.apache.beam.repackaged.core.org.apache.commons.lang3.RandomStringUtils; +import org.apache.beam.repackaged.core.org.apache.commons.lang3.RandomUtils; +import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.io.FileIO; +import org.apache.beam.sdk.testing.PAssert; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.Resources; +import org.apache.thrift.protocol.TBinaryProtocol; +import org.apache.thrift.protocol.TCompactProtocol; +import org.apache.thrift.protocol.TJSONProtocol; +import org.apache.thrift.protocol.TProtocolFactory; +import org.apache.thrift.protocol.TSimpleJSONProtocol; +import org.junit.Before; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.TemporaryFolder; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Tests for {@link ThriftIO}. 
*/ +@RunWith(JUnit4.class) +public class ThriftIOTest implements Serializable { + + private static final String RESOURCE_DIR = "ThriftIOTest/"; + + private static final String THRIFT_DIR = Resources.getResource(RESOURCE_DIR).getPath(); + private static final String ALL_THRIFT_STRING = + Resources.getResource(RESOURCE_DIR).getPath() + "*"; + private static final TestThriftStruct TEST_THRIFT_STRUCT = new TestThriftStruct(); + private static List testThriftStructs; + private final TProtocolFactory tBinaryProtoFactory = new TBinaryProtocol.Factory(); + private final TProtocolFactory tJsonProtocolFactory = new TJSONProtocol.Factory(); + private final TProtocolFactory tSimpleJsonProtocolFactory = new TSimpleJSONProtocol.Factory(); + private final TProtocolFactory tCompactProtocolFactory = new TCompactProtocol.Factory(); + @Rule public transient TestPipeline mainPipeline = TestPipeline.create(); + @Rule public transient TestPipeline readPipeline = TestPipeline.create(); + @Rule public transient TestPipeline writePipeline = TestPipeline.create(); + @Rule public transient TemporaryFolder temporaryFolder = new TemporaryFolder(); + + @Before + public void setUp() throws Exception { +byte[] bytes = new byte[10]; +ByteBuffer buffer = ByteBuffer.wrap(bytes); + +TEST_THRIFT_STRUCT.testByte = 100; +TEST_THRIFT_STRUCT.testShort = 200; +TEST_THRIFT_STRUCT.testInt = 2500; +TEST_THRIFT_STRUCT.testLong = 79303L; +TEST_THRIFT_STRUCT.testDouble = 25.007; +TEST_THRIFT_STRUCT.testBool = true; +TEST_THRIFT_STRUCT.stringIntMap = new HashMap<>(); +TEST_THRIFT_STRUCT.stringIntMap.put("first", (short) 1); +TEST_THRIFT_STRUCT.stringIntMap.put("second", (short) 2); +TEST_THRIFT_STRUCT.testBinary = buffer; + +testThriftStructs = ImmutableList.copyOf(generateTestObjects(1000L)); + } + + /** Tests {@link ThriftIO#readFiles(Class)} with {@link TBinaryProtocol}. 
*/ + @Test + public void testReadFilesBinaryProtocol() { + +PCollection testThriftDoc = +mainPipeline +.apply(Create.of(THRIFT_DIR + "data").withCoder(StringUtf8Coder.of())) +
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385484=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385484 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:53 Start Date: 11/Feb/20 21:53 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377922006 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftCoder.java ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.thrift; + +import java.io.IOException; +import java.io.InputStream; +import java.io.ObjectInputStream; +import java.io.ObjectOutputStream; +import java.io.OutputStream; +import org.apache.beam.sdk.coders.CoderException; +import org.apache.beam.sdk.coders.CustomCoder; + +public class ThriftCoder extends CustomCoder { Review comment: Done updated This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking
---
Worklog Id: (was: 385484)
Time Spent: 13h 40m (was: 13.5h)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385482=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385482 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:51 Start Date: 11/Feb/20 21:51 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377921210 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ## @@ -0,0 +1,289 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.thrift; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import com.google.auto.value.AutoValue; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.io.InputStream; +import java.io.OutputStream; +import java.nio.channels.Channels; +import java.nio.channels.WritableByteChannel; +import javax.annotation.Nullable; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.io.FileIO; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.display.DisplayData; +import org.apache.beam.sdk.values.PCollection; +import org.apache.thrift.TBase; +import org.apache.thrift.TException; +import org.apache.thrift.protocol.TProtocol; +import org.apache.thrift.protocol.TProtocolFactory; +import org.apache.thrift.transport.TIOStreamTransport; +import org.apache.thrift.transport.TTransportException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link PTransform}s for reading and writing files containing Thrift encoded data. + * + * Reading Thrift Files + * + * For simple reading, use {@link ThriftIO#} with the desired file pattern to read from. + * + * For example: + * + * {@code + * PCollection examples = pipeline.apply(ThriftIO.read().from("/foo/bar/*")); Review comment: Sounds good, I'll remove references to `read()` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 385482) Time Spent: 13.5h (was: 13h 20m)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385478=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385478 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:45 Start Date: 11/Feb/20 21:45 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377918220 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftCoder.java ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.thrift; + +import java.io.IOException; +import java.io.InputStream; +import java.io.ObjectInputStream; +import java.io.ObjectOutputStream; +import java.io.OutputStream; +import org.apache.beam.sdk.coders.CoderException; +import org.apache.beam.sdk.coders.CustomCoder; + +public class ThriftCoder extends CustomCoder { + + public static ThriftCoder of() { Review comment: Done, removed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 385478) Time Spent: 13h 20m (was: 13h 10m)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385476=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385476 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:41 Start Date: 11/Feb/20 21:41 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377916026 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ## @@ -0,0 +1,289 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.thrift; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import com.google.auto.value.AutoValue; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.io.InputStream; +import java.io.OutputStream; +import java.nio.channels.Channels; +import java.nio.channels.WritableByteChannel; +import javax.annotation.Nullable; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.io.FileIO; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.display.DisplayData; +import org.apache.beam.sdk.values.PCollection; +import org.apache.thrift.TBase; +import org.apache.thrift.TException; +import org.apache.thrift.protocol.TProtocol; +import org.apache.thrift.protocol.TProtocolFactory; +import org.apache.thrift.transport.TIOStreamTransport; +import org.apache.thrift.transport.TTransportException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link PTransform}s for reading and writing files containing Thrift encoded data. + * + * Reading Thrift Files + * + * For simple reading, use {@link ThriftIO#} with the desired file pattern to read from. + * + * For example: + * + * {@code + * PCollection examples = pipeline.apply(ThriftIO.read().from("/foo/bar/*")); + * ... + * } + * + * For more advanced use cases, like reading each file in a {@link PCollection} of {@link + * FileIO.ReadableFile}, use the {@link ReadFiles} transform. 
+ * + * For example: + * + * {@code + * PCollection<FileIO.ReadableFile> files = pipeline + * .apply(FileIO.match().filepattern(options.getInputFilepattern())) + * .apply(FileIO.readMatches()); + * + * PCollection<ExampleType> examples = files.apply(ThriftIO.readFiles(ExampleType.class).withProtocol(thriftProto)); + * } + * + * Writing Thrift Files + * + * {@link ThriftIO.Sink} allows for a {@link PCollection} of {@link TBase} to be written to + * Thrift files. It can be used with the general-purpose {@link FileIO} transforms with + * FileIO.write/writeDynamic specifically. + * + * For example: + * + * {@code + * pipeline + * .apply(...) // PCollection + * .apply(FileIO + * .write() + * .via(ThriftIO.sink(thriftProto)) + * .to("destination/path")); + * } + * + * This IO API is considered experimental and may break or receive backwards-incompatible changes + * in future versions of the Apache Beam SDK. + */ +@Experimental(Experimental.Kind.SOURCE_SINK) Review comment: Done, updated. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. Issue Time Tracking --- Worklog Id: (was: 385476) Time Spent: 13h 10m (was: 13h)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385475=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385475 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:38 Start Date: 11/Feb/20 21:38 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377914782 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ## @@ -0,0 +1,289 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.thrift; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import com.google.auto.value.AutoValue; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.io.InputStream; +import java.io.OutputStream; +import java.nio.channels.Channels; +import java.nio.channels.WritableByteChannel; +import javax.annotation.Nullable; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.io.FileIO; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.display.DisplayData; +import org.apache.beam.sdk.values.PCollection; +import org.apache.thrift.TBase; +import org.apache.thrift.TException; +import org.apache.thrift.protocol.TProtocol; +import org.apache.thrift.protocol.TProtocolFactory; +import org.apache.thrift.transport.TIOStreamTransport; +import org.apache.thrift.transport.TTransportException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link PTransform}s for reading and writing files containing Thrift encoded data. + * + * Reading Thrift Files + * + * For simple reading, use {@link ThriftIO#} with the desired file pattern to read from. + * + * For example: + * + * {@code + * PCollection examples = pipeline.apply(ThriftIO.read().from("/foo/bar/*")); Review comment: +1 to not having a `read()`: less 'useless' code to maintain. Other file-based IOs only have it for historical reasons, and we decided to deprecate the `readAll` transforms too, to make the FileIO.match + read composition more explicit, since it covers more cases. This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. Issue Time Tracking --- Worklog Id: (was: 385475) Time Spent: 13h (was: 12h 50m)
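The point about keeping "match" and "read" as separate, explicit steps can be sketched with a plain-Java analogy. This is not Beam code: `matchFiles` and `readAll` are hypothetical helpers built on `java.nio.file`, standing in for the roles that `FileIO.match()` and a per-file read transform play in a pipeline. The file names are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class MatchThenReadSketch {

  /** Step 1 (hypothetical stand-in for the "match" stage): resolve a glob to concrete files. */
  static List<Path> matchFiles(Path dir, String glob) throws IOException {
    List<Path> matched = new ArrayList<>();
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
      for (Path p : stream) {
        matched.add(p);
      }
    }
    Collections.sort(matched); // directory iteration order is unspecified, so sort for determinism
    return matched;
  }

  /** Step 2 (hypothetical stand-in for the "read" stage): consume files that were already matched. */
  static List<byte[]> readAll(List<Path> files) throws IOException {
    List<byte[]> contents = new ArrayList<>();
    for (Path f : files) {
      contents.add(Files.readAllBytes(f));
    }
    return contents;
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("match-then-read");
    Files.write(dir.resolve("a.thrift.bin"), new byte[] {1, 2});
    Files.write(dir.resolve("b.thrift.bin"), new byte[] {3});
    Files.write(dir.resolve("notes.txt"), new byte[] {9});

    // Because matching is its own step, the caller can filter or inspect the
    // matched files before any reading happens.
    List<Path> matched = matchFiles(dir, "*.bin");
    List<byte[]> contents = readAll(matched);
    System.out.println(matched.size() + " files matched, " + contents.size() + " read");
  }
}
```

Splitting the two steps means the read step accepts any list of files, however it was produced, which is the flexibility the comment attributes to the explicit composition.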
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385474=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385474 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:33 Start Date: 11/Feb/20 21:33 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377912388 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ## @@ -0,0 +1,289 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.thrift; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import com.google.auto.value.AutoValue; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.io.InputStream; +import java.io.OutputStream; +import java.nio.channels.Channels; +import java.nio.channels.WritableByteChannel; +import javax.annotation.Nullable; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.io.FileIO; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.display.DisplayData; +import org.apache.beam.sdk.values.PCollection; +import org.apache.thrift.TBase; +import org.apache.thrift.TException; +import org.apache.thrift.protocol.TProtocol; +import org.apache.thrift.protocol.TProtocolFactory; +import org.apache.thrift.transport.TIOStreamTransport; +import org.apache.thrift.transport.TTransportException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link PTransform}s for reading and writing files containing Thrift encoded data. + * + * Reading Thrift Files + * + * For simple reading, use {@link ThriftIO#} with the desired file pattern to read from. + * + * For example: + * + * {@code + * PCollection examples = pipeline.apply(ThriftIO.read().from("/foo/bar/*")); Review comment: Correct, `read()` is not implemented, and I will remove references to it unless we think it should be implemented. I think `readFiles()` will cover everything but the simple use case. What are your thoughts? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 385474) Time Spent: 12h 50m (was: 12h 40m)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385473=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385473 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:29 Start Date: 11/Feb/20 21:29 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377910032 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ## @@ -0,0 +1,289 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.thrift; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import com.google.auto.value.AutoValue; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.io.InputStream; +import java.io.OutputStream; +import java.nio.channels.Channels; +import java.nio.channels.WritableByteChannel; +import javax.annotation.Nullable; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.io.FileIO; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.display.DisplayData; +import org.apache.beam.sdk.values.PCollection; +import org.apache.thrift.TBase; +import org.apache.thrift.TException; +import org.apache.thrift.protocol.TProtocol; +import org.apache.thrift.protocol.TProtocolFactory; +import org.apache.thrift.transport.TIOStreamTransport; +import org.apache.thrift.transport.TTransportException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link PTransform}s for reading and writing files containing Thrift encoded data. + * + * Reading Thrift Files + * + * For simple reading, use {@link ThriftIO#} with the desired file pattern to read from. + * + * For example: + * + * {@code + * PCollection examples = pipeline.apply(ThriftIO.read().from("/foo/bar/*")); Review comment: This reference will be removed. `readFiles()` will take in the class as it is needed for the pipeline to deserialize the data into. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 385473) Time Spent: 12h 40m (was: 12.5h)
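The remark that `readFiles()` takes the class because it is needed to deserialize the data can be illustrated with a small stdlib-only sketch. Everything here is an assumption made for illustration: `SelfReading` is a hypothetical stand-in for a Thrift-generated type that populates itself from a stream, `Point` is a made-up record, and `decode` is not ThriftIO's actual implementation, which may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

final class RecordClassSketch {

  /** Hypothetical stand-in for a generated type that can populate itself from a stream. */
  interface SelfReading {
    void readFrom(DataInputStream in) throws IOException;
  }

  /** Made-up record type with a public no-arg constructor. */
  public static final class Point implements SelfReading {
    int x;
    int y;

    public Point() {}

    @Override
    public void readFrom(DataInputStream in) throws IOException {
      x = in.readInt();
      y = in.readInt();
    }
  }

  /**
   * Generic type parameters are erased at runtime, so T alone cannot be instantiated.
   * The Class token carries the concrete type needed to create an empty record to fill.
   */
  static <T extends SelfReading> T decode(Class<T> recordClass, InputStream in)
      throws IOException, ReflectiveOperationException {
    T record = recordClass.getDeclaredConstructor().newInstance();
    record.readFrom(new DataInputStream(in));
    return record;
  }

  public static void main(String[] args) throws Exception {
    byte[] encoded = {0, 0, 0, 7, 0, 0, 0, 9}; // two big-endian ints: 7 and 9
    Point p = decode(Point.class, new ByteArrayInputStream(encoded));
    System.out.println(p.x + "," + p.y); // prints 7,9
  }
}
```

Without the `Class<T>` argument, `decode` would have no way to know which concrete type to construct, which is the same constraint the comment describes for the pipeline.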
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385469=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385469 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:27 Start Date: 11/Feb/20 21:27 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377909310 ## File path: sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/ThriftIOTest.java ## @@ -0,0 +1,233 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.thrift; + +import java.io.Serializable; +import java.nio.ByteBuffer; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import org.apache.beam.repackaged.core.org.apache.commons.lang3.RandomStringUtils; +import org.apache.beam.repackaged.core.org.apache.commons.lang3.RandomUtils; +import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.io.FileIO; +import org.apache.beam.sdk.testing.PAssert; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.Resources; +import org.apache.thrift.protocol.TBinaryProtocol; +import org.apache.thrift.protocol.TCompactProtocol; +import org.apache.thrift.protocol.TJSONProtocol; +import org.apache.thrift.protocol.TProtocolFactory; +import org.apache.thrift.protocol.TSimpleJSONProtocol; +import org.junit.Before; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.TemporaryFolder; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Tests for {@link ThriftIO}. 
*/ +@RunWith(JUnit4.class) +public class ThriftIOTest implements Serializable { + + private static final String RESOURCE_DIR = "ThriftIOTest/"; + + private static final String THRIFT_DIR = Resources.getResource(RESOURCE_DIR).getPath(); + private static final String ALL_THRIFT_STRING = + Resources.getResource(RESOURCE_DIR).getPath() + "*"; + private static final TestThriftStruct TEST_THRIFT_STRUCT = new TestThriftStruct(); + private static List testThriftStructs; + private final TProtocolFactory tBinaryProtoFactory = new TBinaryProtocol.Factory(); + private final TProtocolFactory tJsonProtocolFactory = new TJSONProtocol.Factory(); + private final TProtocolFactory tSimpleJsonProtocolFactory = new TSimpleJSONProtocol.Factory(); + private final TProtocolFactory tCompactProtocolFactory = new TCompactProtocol.Factory(); + @Rule public transient TestPipeline mainPipeline = TestPipeline.create(); + @Rule public transient TestPipeline readPipeline = TestPipeline.create(); + @Rule public transient TestPipeline writePipeline = TestPipeline.create(); + @Rule public transient TemporaryFolder temporaryFolder = new TemporaryFolder(); + + @Before + public void setUp() throws Exception { +byte[] bytes = new byte[10]; +ByteBuffer buffer = ByteBuffer.wrap(bytes); + +TEST_THRIFT_STRUCT.testByte = 100; +TEST_THRIFT_STRUCT.testShort = 200; +TEST_THRIFT_STRUCT.testInt = 2500; +TEST_THRIFT_STRUCT.testLong = 79303L; +TEST_THRIFT_STRUCT.testDouble = 25.007; +TEST_THRIFT_STRUCT.testBool = true; +TEST_THRIFT_STRUCT.stringIntMap = new HashMap<>(); +TEST_THRIFT_STRUCT.stringIntMap.put("first", (short) 1); +TEST_THRIFT_STRUCT.stringIntMap.put("second", (short) 2); +TEST_THRIFT_STRUCT.testBinary = buffer; + +testThriftStructs = ImmutableList.copyOf(generateTestObjects(1000L)); + } + + /** Tests {@link ThriftIO#readFiles(Class)} with {@link TBinaryProtocol}. */ + @Test Review comment: `read` was in the old implementation and I will remove the references to it. 
I think that `readFiles()` will cover most use cases for this IO.
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385462=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385462 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:20 Start Date: 11/Feb/20 21:20 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377901030 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ## @@ -0,0 +1,289 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.thrift; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import com.google.auto.value.AutoValue; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.io.InputStream; +import java.io.OutputStream; +import java.nio.channels.Channels; +import java.nio.channels.WritableByteChannel; +import javax.annotation.Nullable; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.io.FileIO; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.display.DisplayData; +import org.apache.beam.sdk.values.PCollection; +import org.apache.thrift.TBase; +import org.apache.thrift.TException; +import org.apache.thrift.protocol.TProtocol; +import org.apache.thrift.protocol.TProtocolFactory; +import org.apache.thrift.transport.TIOStreamTransport; +import org.apache.thrift.transport.TTransportException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link PTransform}s for reading and writing files containing Thrift encoded data. + * + * Reading Thrift Files + * + * For simple reading, use {@link ThriftIO#} with the desired file pattern to read from. + * + * For example: + * + * {@code + * PCollection examples = pipeline.apply(ThriftIO.read().from("/foo/bar/*")); + * ... + * } + * + * For more advanced use cases, like reading each file in a {@link PCollection} of {@link + * FileIO.ReadableFile}, use the {@link ReadFiles} transform. 
+ * + * For example: + * + * {@code + * PCollection files = pipeline + * .apply(FileIO.match().filepattern(options.getInputFilepattern()) + * .apply(FileIO.readMatches()); + * + * PCollection examples = files.apply(ThriftIO.readFiles(ExampleType.class).withProtocol(thriftProto); + * } + * + * Writing Thrift Files + * + * {@link ThriftIO.Sink} allows for a {@link PCollection} of {@link TBase} to be written to + * Thrift files. It can be used with the general-purpose {@link FileIO} transforms with + * FileIO.write/writeDynamic specifically. + * + * For example: + * + * {@code + * pipeline + * .apply(...) // PCollection + * .apply(FileIO + * .write() + * .via(ThriftIO.sink(thriftProto)) + * .to("destination/path"); + * } + * + * This IO API is considered experimental and may break or receive backwards-incompatible changes + * in future versions of the Apache Beam SDK. + */ +@Experimental(Experimental.Kind.SOURCE_SINK) +public class ThriftIO { + + private static final Logger LOG = LoggerFactory.getLogger(ThriftIO.class); + + /** Disable construction of utility class. */ + private ThriftIO() {} + + /** + * Reads each file in a {@link PCollection} of {@link org.apache.beam.sdk.io.FileIO.ReadableFile}, + * which allows more flexible usage. + */ + public static ReadFiles readFiles(Class recordClass) { +return new AutoValue_ThriftIO_ReadFiles.Builder().setRecordClass(recordClass).build(); + } + + // + + /** Creates a {@link Sink} for use with {@link FileIO#write} and {@link FileIO#writeDynamic}. */ + public static > Sink
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385464=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385464 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:20 Start Date: 11/Feb/20 21:20 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377901859 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ## @@ -0,0 +1,289 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.thrift; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import com.google.auto.value.AutoValue; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.io.InputStream; +import java.io.OutputStream; +import java.nio.channels.Channels; +import java.nio.channels.WritableByteChannel; +import javax.annotation.Nullable; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.io.FileIO; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.display.DisplayData; +import org.apache.beam.sdk.values.PCollection; +import org.apache.thrift.TBase; +import org.apache.thrift.TException; +import org.apache.thrift.protocol.TProtocol; +import org.apache.thrift.protocol.TProtocolFactory; +import org.apache.thrift.transport.TIOStreamTransport; +import org.apache.thrift.transport.TTransportException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link PTransform}s for reading and writing files containing Thrift encoded data. + * + * Reading Thrift Files + * + * For simple reading, use {@link ThriftIO#} with the desired file pattern to read from. + * + * For example: + * + * {@code + * PCollection examples = pipeline.apply(ThriftIO.read().from("/foo/bar/*")); + * ... + * } + * + * For more advanced use cases, like reading each file in a {@link PCollection} of {@link + * FileIO.ReadableFile}, use the {@link ReadFiles} transform. 
+ * + * For example: + * + * {@code + * PCollection files = pipeline + * .apply(FileIO.match().filepattern(options.getInputFilepattern()) + * .apply(FileIO.readMatches()); + * + * PCollection examples = files.apply(ThriftIO.readFiles(ExampleType.class).withProtocol(thriftProto); + * } + * + * Writing Thrift Files + * + * {@link ThriftIO.Sink} allows for a {@link PCollection} of {@link TBase} to be written to + * Thrift files. It can be used with the general-purpose {@link FileIO} transforms with + * FileIO.write/writeDynamic specifically. + * + * For example: + * + * {@code + * pipeline + * .apply(...) // PCollection + * .apply(FileIO + * .write() + * .via(ThriftIO.sink(thriftProto)) + * .to("destination/path"); + * } + * + * This IO API is considered experimental and may break or receive backwards-incompatible changes + * in future versions of the Apache Beam SDK. + */ +@Experimental(Experimental.Kind.SOURCE_SINK) +public class ThriftIO { + + private static final Logger LOG = LoggerFactory.getLogger(ThriftIO.class); + + /** Disable construction of utility class. */ + private ThriftIO() {} + + /** + * Reads each file in a {@link PCollection} of {@link org.apache.beam.sdk.io.FileIO.ReadableFile}, + * which allows more flexible usage. + */ + public static ReadFiles readFiles(Class recordClass) { +return new AutoValue_ThriftIO_ReadFiles.Builder().setRecordClass(recordClass).build(); + } + + // + + /** Creates a {@link Sink} for use with {@link FileIO#write} and {@link FileIO#writeDynamic}. */ + public static > Sink
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385465=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385465 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:20 Start Date: 11/Feb/20 21:20 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377895151 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ## @@ -0,0 +1,289 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.thrift; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import com.google.auto.value.AutoValue; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.io.InputStream; +import java.io.OutputStream; +import java.nio.channels.Channels; +import java.nio.channels.WritableByteChannel; +import javax.annotation.Nullable; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.io.FileIO; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.display.DisplayData; +import org.apache.beam.sdk.values.PCollection; +import org.apache.thrift.TBase; +import org.apache.thrift.TException; +import org.apache.thrift.protocol.TProtocol; +import org.apache.thrift.protocol.TProtocolFactory; +import org.apache.thrift.transport.TIOStreamTransport; +import org.apache.thrift.transport.TTransportException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link PTransform}s for reading and writing files containing Thrift encoded data. + * + * Reading Thrift Files + * + * For simple reading, use {@link ThriftIO#} with the desired file pattern to read from. + * + * For example: + * + * {@code + * PCollection examples = pipeline.apply(ThriftIO.read().from("/foo/bar/*")); + * ... + * } + * + * For more advanced use cases, like reading each file in a {@link PCollection} of {@link + * FileIO.ReadableFile}, use the {@link ReadFiles} transform. 
+ * + * For example: + * + * {@code + * PCollection files = pipeline + * .apply(FileIO.match().filepattern(options.getInputFilepattern()) + * .apply(FileIO.readMatches()); + * + * PCollection examples = files.apply(ThriftIO.readFiles(ExampleType.class).withProtocol(thriftProto); + * } + * + * Writing Thrift Files + * + * {@link ThriftIO.Sink} allows for a {@link PCollection} of {@link TBase} to be written to + * Thrift files. It can be used with the general-purpose {@link FileIO} transforms with + * FileIO.write/writeDynamic specifically. + * + * For example: + * + * {@code + * pipeline + * .apply(...) // PCollection + * .apply(FileIO + * .write() + * .via(ThriftIO.sink(thriftProto)) + * .to("destination/path"); + * } + * + * This IO API is considered experimental and may break or receive backwards-incompatible changes + * in future versions of the Apache Beam SDK. + */ +@Experimental(Experimental.Kind.SOURCE_SINK) +public class ThriftIO { + + private static final Logger LOG = LoggerFactory.getLogger(ThriftIO.class); + + /** Disable construction of utility class. */ + private ThriftIO() {} + + /** + * Reads each file in a {@link PCollection} of {@link org.apache.beam.sdk.io.FileIO.ReadableFile}, + * which allows more flexible usage. + */ + public static ReadFiles readFiles(Class recordClass) { +return new AutoValue_ThriftIO_ReadFiles.Builder().setRecordClass(recordClass).build(); + } + + // + + /** Creates a {@link Sink} for use with {@link FileIO#write} and {@link FileIO#writeDynamic}. */ + public static > Sink
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385459=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385459 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:20 Start Date: 11/Feb/20 21:20 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377886065 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/package-info.java ## @@ -0,0 +1,20 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** Transforms for reading and writing to Thrift files. */ +package org.apache.beam.sdk.io.thrift; Review comment: Add `@Experimental(Kind.SOURCE_SINK)` at the package level too This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 385459) Time Spent: 11h 40m (was: 11.5h)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385467=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385467 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:20 Start Date: 11/Feb/20 21:20 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377903969 ## File path: sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/ThriftIOTest.java ## @@ -0,0 +1,233 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.thrift; + +import java.io.Serializable; +import java.nio.ByteBuffer; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import org.apache.beam.repackaged.core.org.apache.commons.lang3.RandomStringUtils; +import org.apache.beam.repackaged.core.org.apache.commons.lang3.RandomUtils; +import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.io.FileIO; +import org.apache.beam.sdk.testing.PAssert; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.Resources; +import org.apache.thrift.protocol.TBinaryProtocol; +import org.apache.thrift.protocol.TCompactProtocol; +import org.apache.thrift.protocol.TJSONProtocol; +import org.apache.thrift.protocol.TProtocolFactory; +import org.apache.thrift.protocol.TSimpleJSONProtocol; +import org.junit.Before; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.TemporaryFolder; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Tests for {@link ThriftIO}. 
*/ +@RunWith(JUnit4.class) +public class ThriftIOTest implements Serializable { + + private static final String RESOURCE_DIR = "ThriftIOTest/"; + + private static final String THRIFT_DIR = Resources.getResource(RESOURCE_DIR).getPath(); + private static final String ALL_THRIFT_STRING = + Resources.getResource(RESOURCE_DIR).getPath() + "*"; + private static final TestThriftStruct TEST_THRIFT_STRUCT = new TestThriftStruct(); + private static List testThriftStructs; + private final TProtocolFactory tBinaryProtoFactory = new TBinaryProtocol.Factory(); + private final TProtocolFactory tJsonProtocolFactory = new TJSONProtocol.Factory(); + private final TProtocolFactory tSimpleJsonProtocolFactory = new TSimpleJSONProtocol.Factory(); + private final TProtocolFactory tCompactProtocolFactory = new TCompactProtocol.Factory(); + @Rule public transient TestPipeline mainPipeline = TestPipeline.create(); + @Rule public transient TestPipeline readPipeline = TestPipeline.create(); + @Rule public transient TestPipeline writePipeline = TestPipeline.create(); + @Rule public transient TemporaryFolder temporaryFolder = new TemporaryFolder(); + + @Before + public void setUp() throws Exception { +byte[] bytes = new byte[10]; +ByteBuffer buffer = ByteBuffer.wrap(bytes); + +TEST_THRIFT_STRUCT.testByte = 100; +TEST_THRIFT_STRUCT.testShort = 200; +TEST_THRIFT_STRUCT.testInt = 2500; +TEST_THRIFT_STRUCT.testLong = 79303L; +TEST_THRIFT_STRUCT.testDouble = 25.007; +TEST_THRIFT_STRUCT.testBool = true; +TEST_THRIFT_STRUCT.stringIntMap = new HashMap<>(); +TEST_THRIFT_STRUCT.stringIntMap.put("first", (short) 1); +TEST_THRIFT_STRUCT.stringIntMap.put("second", (short) 2); +TEST_THRIFT_STRUCT.testBinary = buffer; + +testThriftStructs = ImmutableList.copyOf(generateTestObjects(1000L)); + } + + /** Tests {@link ThriftIO#readFiles(Class)} with {@link TBinaryProtocol}. 
*/ + @Test + public void testReadFilesBinaryProtocol() { + +PCollection testThriftDoc = +mainPipeline +.apply(Create.of(THRIFT_DIR + "data").withCoder(StringUtf8Coder.of())) +
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385455=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385455 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:20 Start Date: 11/Feb/20 21:20 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377905535 ## File path: sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/TestThriftStruct.java ## @@ -0,0 +1,1232 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one Review comment: Hm in that case, I think it's fine to commit the generated file. Issue Time Tracking --- Worklog Id: (was: 385455) Time Spent: 11h 10m (was: 11h)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385466=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385466 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:20 Start Date: 11/Feb/20 21:20 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377900552 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ## @@ -0,0 +1,289 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.thrift; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import com.google.auto.value.AutoValue; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.io.InputStream; +import java.io.OutputStream; +import java.nio.channels.Channels; +import java.nio.channels.WritableByteChannel; +import javax.annotation.Nullable; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.io.FileIO; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.display.DisplayData; +import org.apache.beam.sdk.values.PCollection; +import org.apache.thrift.TBase; +import org.apache.thrift.TException; +import org.apache.thrift.protocol.TProtocol; +import org.apache.thrift.protocol.TProtocolFactory; +import org.apache.thrift.transport.TIOStreamTransport; +import org.apache.thrift.transport.TTransportException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link PTransform}s for reading and writing files containing Thrift encoded data. + * + * Reading Thrift Files + * + * For simple reading, use {@link ThriftIO#} with the desired file pattern to read from. + * + * For example: + * + * {@code + * PCollection examples = pipeline.apply(ThriftIO.read().from("/foo/bar/*")); + * ... + * } + * + * For more advanced use cases, like reading each file in a {@link PCollection} of {@link + * FileIO.ReadableFile}, use the {@link ReadFiles} transform. 
+ * + * For example: + * + * {@code + * PCollection files = pipeline + * .apply(FileIO.match().filepattern(options.getInputFilepattern()) + * .apply(FileIO.readMatches()); + * + * PCollection examples = files.apply(ThriftIO.readFiles(ExampleType.class).withProtocol(thriftProto); + * } + * + * Writing Thrift Files + * + * {@link ThriftIO.Sink} allows for a {@link PCollection} of {@link TBase} to be written to + * Thrift files. It can be used with the general-purpose {@link FileIO} transforms with + * FileIO.write/writeDynamic specifically. + * + * For example: + * + * {@code + * pipeline + * .apply(...) // PCollection + * .apply(FileIO + * .write() + * .via(ThriftIO.sink(thriftProto)) + * .to("destination/path"); + * } + * + * This IO API is considered experimental and may break or receive backwards-incompatible changes + * in future versions of the Apache Beam SDK. + */ +@Experimental(Experimental.Kind.SOURCE_SINK) +public class ThriftIO { + + private static final Logger LOG = LoggerFactory.getLogger(ThriftIO.class); + + /** Disable construction of utility class. */ + private ThriftIO() {} + + /** + * Reads each file in a {@link PCollection} of {@link org.apache.beam.sdk.io.FileIO.ReadableFile}, + * which allows more flexible usage. + */ + public static ReadFiles readFiles(Class recordClass) { +return new AutoValue_ThriftIO_ReadFiles.Builder().setRecordClass(recordClass).build(); + } + + // + + /** Creates a {@link Sink} for use with {@link FileIO#write} and {@link FileIO#writeDynamic}. */ + public static > Sink
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385460=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385460 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:20 Start Date: 11/Feb/20 21:20 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377894156 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftCoder.java ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.thrift; + +import java.io.IOException; +import java.io.InputStream; +import java.io.ObjectInputStream; +import java.io.ObjectOutputStream; +import java.io.OutputStream; +import org.apache.beam.sdk.coders.CoderException; +import org.apache.beam.sdk.coders.CustomCoder; + +public class ThriftCoder extends CustomCoder { + + public static ThriftCoder of() { Review comment: remove public This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 385460) Time Spent: 11h 50m (was: 11h 40m)
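The `ThriftCoder` diff quoted above imports `ObjectInputStream` and `ObjectOutputStream`, which suggests an encode/decode pair built on Java serialization. The body of the class is not visible in this message, so the following is only a minimal sketch of that general technique in plain Java, without Beam or Thrift on the classpath. The class name `SerializableRoundTrip` and its methods are hypothetical and are not the PR's code. The static methods are package-private, in the spirit of the "remove public" review comment.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for a serialization-based coder; NOT the ThriftCoder from the PR. */
final class SerializableRoundTrip {

  /** Utility class, no instances. */
  private SerializableRoundTrip() {}

  /** Writes {@code value} to {@code outStream} using Java serialization. */
  static <T extends Serializable> void encode(T value, OutputStream outStream) throws IOException {
    ObjectOutputStream oos = new ObjectOutputStream(outStream);
    oos.writeObject(value);
    oos.flush(); // flush buffered bytes but leave the caller's stream open
  }

  /** Reads one value previously written by {@link #encode}. */
  @SuppressWarnings("unchecked")
  static <T extends Serializable> T decode(InputStream inStream) throws IOException {
    try {
      ObjectInputStream ois = new ObjectInputStream(inStream);
      return (T) ois.readObject();
    } catch (ClassNotFoundException e) {
      // Surface a missing class as an I/O failure so callers handle one exception type.
      throw new IOException("Could not deserialize value", e);
    }
  }

  /** Encodes and immediately decodes a value; handy for checking that a type survives the trip. */
  static <T extends Serializable> T roundTrip(T value) {
    try {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      encode(value, out);
      return decode(new ByteArrayInputStream(out.toByteArray()));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Calling `SerializableRoundTrip.roundTrip(value)` returns a distinct copy that should compare equal to the input for any well-behaved `Serializable` type. Java serialization is simple to wire up but tends to be verbose on the wire, so a coder that delegates to a Thrift protocol may be the better fit where encoded size matters.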
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385457=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385457 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:20 Start Date: 11/Feb/20 21:20 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377905535 ## File path: sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/TestThriftStruct.java ## @@ -0,0 +1,1232 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one Review comment: Hm in that case, I think it's fine to commit the generated file - unless you feel up to adding the gradle config : ) Issue Time Tracking --- Worklog Id: (was: 385457) Time Spent: 11h 20m (was: 11h 10m)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385463=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385463 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:20 Start Date: 11/Feb/20 21:20 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377894855 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ## @@ -0,0 +1,289 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.thrift; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import com.google.auto.value.AutoValue; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.io.InputStream; +import java.io.OutputStream; +import java.nio.channels.Channels; +import java.nio.channels.WritableByteChannel; +import javax.annotation.Nullable; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.io.FileIO; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.display.DisplayData; +import org.apache.beam.sdk.values.PCollection; +import org.apache.thrift.TBase; +import org.apache.thrift.TException; +import org.apache.thrift.protocol.TProtocol; +import org.apache.thrift.protocol.TProtocolFactory; +import org.apache.thrift.transport.TIOStreamTransport; +import org.apache.thrift.transport.TTransportException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link PTransform}s for reading and writing files containing Thrift encoded data. + * + * Reading Thrift Files + * + * For simple reading, use {@link ThriftIO#} with the desired file pattern to read from. + * + * For example: + * + * {@code + * PCollection examples = pipeline.apply(ThriftIO.read().from("/foo/bar/*")); Review comment: Having a read() is not mandatory if the example uses FileIO.match and friends IMO. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 385463) Time Spent: 12h
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385458=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385458 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:20 Start Date: 11/Feb/20 21:20 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377885914 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ## @@ -0,0 +1,289 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.thrift; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import com.google.auto.value.AutoValue; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.io.InputStream; +import java.io.OutputStream; +import java.nio.channels.Channels; +import java.nio.channels.WritableByteChannel; +import javax.annotation.Nullable; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.io.FileIO; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.display.DisplayData; +import org.apache.beam.sdk.values.PCollection; +import org.apache.thrift.TBase; +import org.apache.thrift.TException; +import org.apache.thrift.protocol.TProtocol; +import org.apache.thrift.protocol.TProtocolFactory; +import org.apache.thrift.transport.TIOStreamTransport; +import org.apache.thrift.transport.TTransportException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link PTransform}s for reading and writing files containing Thrift encoded data. + * + * Reading Thrift Files + * + * For simple reading, use {@link ThriftIO#} with the desired file pattern to read from. + * + * For example: + * + * {@code + * PCollection examples = pipeline.apply(ThriftIO.read().from("/foo/bar/*")); + * ... + * } + * + * For more advanced use cases, like reading each file in a {@link PCollection} of {@link + * FileIO.ReadableFile}, use the {@link ReadFiles} transform. 
+ * + * For example: + * + * {@code + * PCollection files = pipeline + * .apply(FileIO.match().filepattern(options.getInputFilepattern()) + * .apply(FileIO.readMatches()); + * + * PCollection examples = files.apply(ThriftIO.readFiles(ExampleType.class).withProtocol(thriftProto); + * } + * + * Writing Thrift Files + * + * {@link ThriftIO.Sink} allows for a {@link PCollection} of {@link TBase} to be written to + * Thrift files. It can be used with the general-purpose {@link FileIO} transforms with + * FileIO.write/writeDynamic specifically. + * + * For example: + * + * {@code + * pipeline + * .apply(...) // PCollection + * .apply(FileIO + * .write() + * .via(ThriftIO.sink(thriftProto)) + * .to("destination/path"); + * } + * + * This IO API is considered experimental and may break or receive backwards-incompatible changes + * in future versions of the Apache Beam SDK. + */ +@Experimental(Experimental.Kind.SOURCE_SINK) Review comment: `@Experimental(Kind.SOURCE_SINK)` to make it consistent with the rest of the code base This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 385458) Time Spent: 11.5h (was: 11h 20m) > Add ThriftIO to Support IO for Thrift Files > --- > > Key: BEAM-8561 > URL:
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385461=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385461 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:20 Start Date: 11/Feb/20 21:20 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377885479 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftCoder.java ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.thrift; + +import java.io.IOException; +import java.io.InputStream; +import java.io.ObjectInputStream; +import java.io.ObjectOutputStream; +import java.io.OutputStream; +import org.apache.beam.sdk.coders.CoderException; +import org.apache.beam.sdk.coders.CustomCoder; + +public class ThriftCoder extends CustomCoder { Review comment: This is in principle internal, so maybe make it package protected. This is an automated message from the Apache Git Service. 
Issue Time Tracking --- Worklog Id: (was: 385461) Time Spent: 11h 50m (was: 11h 40m) Remaining Estimate: 0h
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385453=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385453 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 21:14 Start Date: 11/Feb/20 21:14 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377902208 ## File path: sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/TestThriftStruct.java ## @@ -0,0 +1,1232 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one Review comment: It can be generated from the .thrift file that is included. I think we would need to add a thrift compiler to the build.gradle to compile it for testing, thoughts? Issue Time Tracking --- Worklog Id: (was: 385453) Time Spent: 11h (was: 10h 50m) Remaining Estimate: 0h
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385431=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385431 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 20:37 Start Date: 11/Feb/20 20:37 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377883979 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftCoder.java ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.thrift; + +import java.io.IOException; +import java.io.InputStream; +import java.io.ObjectInputStream; +import java.io.ObjectOutputStream; +import java.io.OutputStream; +import org.apache.beam.sdk.coders.CoderException; +import org.apache.beam.sdk.coders.CustomCoder; + +public class ThriftCoder extends CustomCoder { + + public static ThriftCoder of() { +return new ThriftCoder<>(); + } + + /** + * Encodes the given value of type {@code T} onto the given output stream. + * + * @param value {@link org.apache.thrift.TBase} to encode. 
+ * @param outStream stream to output encoded value to. + * @throws IOException if writing to the {@code OutputStream} fails for some reason + * @throws CoderException if the value could not be encoded for some reason + */ + @Override + public void encode(T value, OutputStream outStream) throws CoderException, IOException { +ObjectOutputStream oos = new ObjectOutputStream(outStream); +oos.writeObject(value); +oos.flush(); + } Review comment: +1 to using Thrift native serialization; this will make it possible to share the data with cross-language pipelines in the future. Issue Time Tracking --- Worklog Id: (was: 385431) Time Spent: 10h 50m (was: 10h 40m) Remaining Estimate: 0h
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385422=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385422 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 20:03 Start Date: 11/Feb/20 20:03 Worklog Time Spent: 10m Work Description: steveniemitz commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377868235 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftCoder.java ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.thrift; + +import java.io.IOException; +import java.io.InputStream; +import java.io.ObjectInputStream; +import java.io.ObjectOutputStream; +import java.io.OutputStream; +import org.apache.beam.sdk.coders.CoderException; +import org.apache.beam.sdk.coders.CustomCoder; + +public class ThriftCoder extends CustomCoder { + + public static ThriftCoder of() { +return new ThriftCoder<>(); + } + + /** + * Encodes the given value of type {@code T} onto the given output stream. + * + * @param value {@link org.apache.thrift.TBase} to encode. 
+ * @param outStream stream to output encoded value to. + * @throws IOException if writing to the {@code OutputStream} fails for some reason + * @throws CoderException if the value could not be encoded for some reason + */ + @Override + public void encode(T value, OutputStream outStream) throws CoderException, IOException { +ObjectOutputStream oos = new ObjectOutputStream(outStream); +oos.writeObject(value); +oos.flush(); + } Review comment: FWIW, the Java Thrift classes will use the TCompactProtocol to serialize themselves when they are Java-serialized. Personally I would rather see a coder here that explicitly uses a TProtocol to serialize the object rather than relying on Java serialization to do it. Issue Time Tracking --- Worklog Id: (was: 385422) Time Spent: 10h 40m (was: 10.5h) Remaining Estimate: 0h
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385407=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385407 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 19:51 Start Date: 11/Feb/20 19:51 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377861986 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftCoder.java ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.thrift; + +import java.io.IOException; +import java.io.InputStream; +import java.io.ObjectInputStream; +import java.io.ObjectOutputStream; +import java.io.OutputStream; +import org.apache.beam.sdk.coders.CoderException; +import org.apache.beam.sdk.coders.CustomCoder; + +public class ThriftCoder extends CustomCoder { + + public static ThriftCoder of() { +return new ThriftCoder<>(); + } + + /** + * Encodes the given value of type {@code T} onto the given output stream. + * + * @param value {@link org.apache.thrift.TBase} to encode. 
+ * @param outStream stream to output encoded value to. + * @throws IOException if writing to the {@code OutputStream} fails for some reason + * @throws CoderException if the value could not be encoded for some reason + */ + @Override + public void encode(T value, OutputStream outStream) throws CoderException, IOException { +ObjectOutputStream oos = new ObjectOutputStream(outStream); +oos.writeObject(value); +oos.flush(); + } Review comment: Yes, exactly: we'll read the records as Thrift-serialized, then Java-serialize them within the pipeline. Issue Time Tracking --- Worklog Id: (was: 385407) Time Spent: 10.5h (was: 10h 20m) Remaining Estimate: 0h
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=384985=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-384985 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 06:08 Start Date: 11/Feb/20 06:08 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377457203 ## File path: sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/ThriftIOTest.java ## @@ -0,0 +1,233 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.thrift; + +import java.io.Serializable; +import java.nio.ByteBuffer; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import org.apache.beam.repackaged.core.org.apache.commons.lang3.RandomStringUtils; +import org.apache.beam.repackaged.core.org.apache.commons.lang3.RandomUtils; +import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.io.FileIO; +import org.apache.beam.sdk.testing.PAssert; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.Resources; +import org.apache.thrift.protocol.TBinaryProtocol; +import org.apache.thrift.protocol.TCompactProtocol; +import org.apache.thrift.protocol.TJSONProtocol; +import org.apache.thrift.protocol.TProtocolFactory; +import org.apache.thrift.protocol.TSimpleJSONProtocol; +import org.junit.Before; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.TemporaryFolder; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Tests for {@link ThriftIO}. 
*/ +@RunWith(JUnit4.class) +public class ThriftIOTest implements Serializable { + + private static final String RESOURCE_DIR = "ThriftIOTest/"; + + private static final String THRIFT_DIR = Resources.getResource(RESOURCE_DIR).getPath(); + private static final String ALL_THRIFT_STRING = + Resources.getResource(RESOURCE_DIR).getPath() + "*"; + private static final TestThriftStruct TEST_THRIFT_STRUCT = new TestThriftStruct(); + private static List testThriftStructs; + private final TProtocolFactory tBinaryProtoFactory = new TBinaryProtocol.Factory(); + private final TProtocolFactory tJsonProtocolFactory = new TJSONProtocol.Factory(); + private final TProtocolFactory tSimpleJsonProtocolFactory = new TSimpleJSONProtocol.Factory(); + private final TProtocolFactory tCompactProtocolFactory = new TCompactProtocol.Factory(); + @Rule public transient TestPipeline mainPipeline = TestPipeline.create(); + @Rule public transient TestPipeline readPipeline = TestPipeline.create(); + @Rule public transient TestPipeline writePipeline = TestPipeline.create(); + @Rule public transient TemporaryFolder temporaryFolder = new TemporaryFolder(); + + @Before + public void setUp() throws Exception { +byte[] bytes = new byte[10]; +ByteBuffer buffer = ByteBuffer.wrap(bytes); + +TEST_THRIFT_STRUCT.testByte = 100; +TEST_THRIFT_STRUCT.testShort = 200; +TEST_THRIFT_STRUCT.testInt = 2500; +TEST_THRIFT_STRUCT.testLong = 79303L; +TEST_THRIFT_STRUCT.testDouble = 25.007; +TEST_THRIFT_STRUCT.testBool = true; +TEST_THRIFT_STRUCT.stringIntMap = new HashMap<>(); +TEST_THRIFT_STRUCT.stringIntMap.put("first", (short) 1); +TEST_THRIFT_STRUCT.stringIntMap.put("second", (short) 2); +TEST_THRIFT_STRUCT.testBinary = buffer; + +testThriftStructs = ImmutableList.copyOf(generateTestObjects(1000L)); + } + + /** Tests {@link ThriftIO#readFiles(Class)} with {@link TBinaryProtocol}. */ + @Test Review comment: There doesn't seem to be a test for `read().from(...)` - should there be one? 
(and it's okay if the answer is 'don't need one') : )
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=384986=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-384986 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 06:08 Start Date: 11/Feb/20 06:08 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377457282 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ## @@ -0,0 +1,289 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.thrift; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import com.google.auto.value.AutoValue; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.io.InputStream; +import java.io.OutputStream; +import java.nio.channels.Channels; +import java.nio.channels.WritableByteChannel; +import javax.annotation.Nullable; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.io.FileIO; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.display.DisplayData; +import org.apache.beam.sdk.values.PCollection; +import org.apache.thrift.TBase; +import org.apache.thrift.TException; +import org.apache.thrift.protocol.TProtocol; +import org.apache.thrift.protocol.TProtocolFactory; +import org.apache.thrift.transport.TIOStreamTransport; +import org.apache.thrift.transport.TTransportException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link PTransform}s for reading and writing files containing Thrift encoded data. + * + * Reading Thrift Files + * + * For simple reading, use {@link ThriftIO#} with the desired file pattern to read from. + * + * For example: + * + * {@code + * PCollection examples = pipeline.apply(ThriftIO.read().from("/foo/bar/*")); Review comment: Do we need to pass the class to this transform? Why/why not? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 384986) Time Spent: 10h 10m (was: 10h) Remaining Estimate: 0h
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=384984=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-384984 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 06:08 Start Date: 11/Feb/20 06:08 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377417741 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftCoder.java ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.thrift; + +import java.io.IOException; +import java.io.InputStream; +import java.io.ObjectInputStream; +import java.io.ObjectOutputStream; +import java.io.OutputStream; +import org.apache.beam.sdk.coders.CoderException; +import org.apache.beam.sdk.coders.CustomCoder; + +public class ThriftCoder extends CustomCoder { + + public static ThriftCoder of() { +return new ThriftCoder<>(); + } + + /** + * Encodes the given value of type {@code T} onto the given output stream. + * + * @param value {@link org.apache.thrift.TBase} to encode. 
+ * @param outStream stream to output encoded value to. + * @throws IOException if writing to the {@code OutputStream} fails for some reason + * @throws CoderException if the value could not be encoded for some reason + */ + @Override + public void encode(T value, OutputStream outStream) throws CoderException, IOException { +ObjectOutputStream oos = new ObjectOutputStream(outStream); +oos.writeObject(value); +oos.flush(); + } Review comment: It seems that the coder simply does Java serialization. Is that right? Issue Time Tracking --- Worklog Id: (was: 384984) Time Spent: 9h 50m (was: 9h 40m) Remaining Estimate: 0h
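For readers following the question above, here is a minimal, standard-library-only sketch of what "simply does Java serialization" means for the quoted encode(): the value is written with an ObjectOutputStream and can be read back with an ObjectInputStream. This is not the PR's ThriftCoder. The class JavaSerializationRoundTrip, the Record type (a hypothetical stand-in for a generated Thrift struct), and the decode() counterpart are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;

public class JavaSerializationRoundTrip {

  /** Hypothetical stand-in for a generated Thrift struct; only Serializable matters here. */
  static final class Record implements Serializable {
    private static final long serialVersionUID = 1L;
    final int testInt;
    final String testString;

    Record(int testInt, String testString) {
      this.testInt = testInt;
      this.testString = testString;
    }
  }

  /** Same steps as the quoted encode(): wrap the stream, write the object, flush. */
  static <T extends Serializable> void encode(T value, OutputStream outStream)
      throws IOException {
    ObjectOutputStream oos = new ObjectOutputStream(outStream);
    oos.writeObject(value);
    oos.flush();
  }

  /** Assumed counterpart: read one object back with ObjectInputStream. */
  @SuppressWarnings("unchecked")
  static <T extends Serializable> T decode(InputStream inStream)
      throws IOException, ClassNotFoundException {
    ObjectInputStream ois = new ObjectInputStream(inStream);
    return (T) ois.readObject();
  }

  public static void main(String[] args) throws Exception {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    encode(new Record(2500, "first"), out);
    Record copy = decode(new ByteArrayInputStream(out.toByteArray()));
    System.out.println(copy.testInt + " " + copy.testString);
  }
}
```

The bytes produced this way are in Java's own object-stream format, so only a JVM with the same class on its classpath can read them back. That is the limitation behind the other comments in this thread that ask for an explicit TProtocol-based coder instead.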
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=384987=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-384987 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 06:08 Start Date: 11/Feb/20 06:08 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377455798 ## File path: sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/TestThriftStruct.java ## @@ -0,0 +1,1232 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one Review comment: Is this file meant to be included in the commit? Or can it be generated from the .thrift file - without committing it? Issue Time Tracking --- Worklog Id: (was: 384987) Time Spent: 10h 20m (was: 10h 10m)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=384988=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-384988 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 06:08 Start Date: 11/Feb/20 06:08 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377456909 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftCoder.java ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.thrift; + +import java.io.IOException; +import java.io.InputStream; +import java.io.ObjectInputStream; +import java.io.ObjectOutputStream; +import java.io.OutputStream; +import org.apache.beam.sdk.coders.CoderException; +import org.apache.beam.sdk.coders.CustomCoder; + +public class ThriftCoder extends CustomCoder { + + public static ThriftCoder of() { +return new ThriftCoder<>(); + } + + /** + * Encodes the given value of type {@code T} onto the given output stream. + * + * @param value {@link org.apache.thrift.TBase} to encode. 
+ * @param outStream stream to output encoded value to. + * @throws IOException if writing to the {@code OutputStream} fails for some reason + * @throws CoderException if the value could not be encoded for some reason + */ + @Override + public void encode(T value, OutputStream outStream) throws CoderException, IOException { +ObjectOutputStream oos = new ObjectOutputStream(outStream); +oos.writeObject(value); +oos.flush(); + } Review comment: Ah so we only consume thrift-serialized records from the storage system, but Java-serialize within the pipeline. That's fine. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 384988) Time Spent: 10h 20m (was: 10h 10m) > Add ThriftIO to Support IO for Thrift Files > --- > > Key: BEAM-8561 > URL: https://issues.apache.org/jira/browse/BEAM-8561 > Project: Beam > Issue Type: New Feature > Components: io-java-files >Reporter: Chris Larsen >Assignee: Chris Larsen >Priority: Major > Time Spent: 10h 20m > Remaining Estimate: 0h > > Similar to AvroIO it would be very useful to support reading and writing > to/from Thrift files with a native connector. > Functionality would include: > # read() - Reading from one or more Thrift files. > # write() - Writing to one or more Thrift files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
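To illustrate the review comment above, here is a minimal sketch of the Java-serialization round trip that the quoted `encode` performs, using only `java.io`. The class `JavaSerializationSketch` and its `decode` and `roundTrip` helpers are hypothetical and not part of the PR. A `String` stands in for a generated Thrift struct, on the assumption that the struct is `Serializable`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;

public class JavaSerializationSketch {

  // Mirrors the quoted encode(): write the value with Java serialization, then flush.
  public static <T extends Serializable> void encode(T value, OutputStream outStream)
      throws IOException {
    ObjectOutputStream oos = new ObjectOutputStream(outStream);
    oos.writeObject(value);
    oos.flush();
  }

  // Hypothetical counterpart (not quoted in the review): read one serialized value back.
  @SuppressWarnings("unchecked")
  public static <T extends Serializable> T decode(InputStream inStream)
      throws IOException, ClassNotFoundException {
    ObjectInputStream ois = new ObjectInputStream(inStream);
    return (T) ois.readObject();
  }

  // Encodes into an in-memory buffer and decodes from it again.
  public static <T extends Serializable> T roundTrip(T value)
      throws IOException, ClassNotFoundException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    encode(value, bytes);
    return decode(new ByteArrayInputStream(bytes.toByteArray()));
  }

  public static void main(String[] args) throws Exception {
    // A String stands in for a Thrift struct in this sketch.
    String copy = roundTrip("thrift-record-stand-in");
    System.out.println(copy);
  }
}
```

This covers only the in-pipeline encoding the reviewer describes. Records in the storage system would still be read and written with a Thrift protocol, which this sketch does not touch.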
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=384989=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-384989 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 11/Feb/20 06:08 Start Date: 11/Feb/20 06:08 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r377457540 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ## @@ -0,0 +1,289 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.thrift; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import com.google.auto.value.AutoValue; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.io.InputStream; +import java.io.OutputStream; +import java.nio.channels.Channels; +import java.nio.channels.WritableByteChannel; +import javax.annotation.Nullable; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.io.FileIO; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.display.DisplayData; +import org.apache.beam.sdk.values.PCollection; +import org.apache.thrift.TBase; +import org.apache.thrift.TException; +import org.apache.thrift.protocol.TProtocol; +import org.apache.thrift.protocol.TProtocolFactory; +import org.apache.thrift.transport.TIOStreamTransport; +import org.apache.thrift.transport.TTransportException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link PTransform}s for reading and writing files containing Thrift encoded data. + * + * Reading Thrift Files + * + * For simple reading, use {@link ThriftIO#} with the desired file pattern to read from. + * + * For example: + * + * {@code + * PCollection examples = pipeline.apply(ThriftIO.read().from("/foo/bar/*")); Review comment: In fact, it seems that `read()` is not implemented? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 384989) > Add ThriftIO to Support IO for Thrift Files > --- > > Key: BEAM-8561 > URL: https://issues.apache.org/jira/browse/BEAM-8561 > Project: Beam > Issue Type: New Feature > Components: io-java-files >Reporter: Chris Larsen >Assignee: Chris Larsen >Priority: Major > Time Spent: 10h 20m > Remaining Estimate: 0h > > Similar to AvroIO it would be very useful to support reading and writing > to/from Thrift files with a native connector. > Functionality would include: > # read() - Reading from one or more Thrift files. > # write() - Writing to one or more Thrift files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=364278=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-364278 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 29/Dec/19 03:37 Start Date: 29/Dec/19 03:37 Worklog Time Spent: 10m Work Description: chrlarsen commented on issue #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#issuecomment-569471049 Run CommunityMetrics PreCommit Issue Time Tracking --- Worklog Id: (was: 364278) Time Spent: 9h 40m (was: 9.5h)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=364273=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-364273 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 29/Dec/19 02:14 Start Date: 29/Dec/19 02:14 Worklog Time Spent: 10m Work Description: chrlarsen commented on issue #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#issuecomment-569467327 Run Python PreCommit Issue Time Tracking --- Worklog Id: (was: 364273) Time Spent: 9.5h (was: 9h 20m)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=364271=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-364271 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 29/Dec/19 00:19 Start Date: 29/Dec/19 00:19 Worklog Time Spent: 10m Work Description: chrlarsen commented on issue #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#issuecomment-569462219 Run Java PreCommit Issue Time Tracking --- Worklog Id: (was: 364271) Time Spent: 9h 20m (was: 9h 10m)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=364269=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-364269 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 29/Dec/19 00:18 Start Date: 29/Dec/19 00:18 Worklog Time Spent: 10m Work Description: chrlarsen commented on issue #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#issuecomment-569462167 Run CommunityMetrics PreCommit Issue Time Tracking --- Worklog Id: (was: 364269) Time Spent: 9h (was: 8h 50m)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=364178=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-364178 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 28/Dec/19 09:21 Start Date: 28/Dec/19 09:21 Worklog Time Spent: 10m Work Description: chrlarsen commented on issue #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#issuecomment-569401090 @chamikaramj @steveniemitz @gsteelman the PR has been refactored to read/write Thrift-encoded data. It would be great to get some more feedback, thanks! Issue Time Tracking --- Worklog Id: (was: 364178) Time Spent: 8h 50m (was: 8h 40m)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=361851=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361851 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 20/Dec/19 20:46 Start Date: 20/Dec/19 20:46 Worklog Time Spent: 10m Work Description: chrlarsen commented on issue #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#issuecomment-568081482 Great, thanks. Yes, the TProtocol implementation will be passed in for decoding/encoding. Issue Time Tracking --- Worklog Id: (was: 361851) Time Spent: 8h 40m (was: 8.5h)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=361833=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361833 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 20/Dec/19 19:53 Start Date: 20/Dec/19 19:53 Worklog Time Spent: 10m Work Description: steveniemitz commented on issue #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#issuecomment-568066985 I'd expect #2, where `UserType <: TBase` (i.e., it is a Thrift struct). You'd probably also want to pass in the TProtocol implementation used to decode the file (binary, compact, etc.). Issue Time Tracking --- Worklog Id: (was: 361833) Time Spent: 8.5h (was: 8h 20m)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=361829=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361829 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 20/Dec/19 19:44 Start Date: 20/Dec/19 19:44 Worklog Time Spent: 10m Work Description: chrlarsen commented on issue #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#issuecomment-568062975 @steveniemitz thanks for the input. This PR will change so that the connector reads/writes Thrift-encoded data. As someone who would use this connector, what are your thoughts on using either of the following to hold the decoded data: 1. `PCollection`, in which case the user would pass in an Avro schema describing each record in the file. 2. `PCollection<[UserType]>`, in which case the user would pass in some class that the record can be decoded to. Thanks! Issue Time Tracking --- Worklog Id: (was: 361829) Time Spent: 8h 20m (was: 8h 10m)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=361235=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361235 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 18/Dec/19 00:50 Start Date: 18/Dec/19 00:50 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r359103959 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/parser/ThriftIdlParser.java ## @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.thrift.parser; + +import java.io.IOException; +import java.io.Reader; +import org.antlr.runtime.ANTLRReaderStream; +import org.antlr.runtime.CommonTokenStream; +import org.antlr.runtime.RecognitionException; +import org.antlr.runtime.tree.BufferedTreeNodeStream; +import org.antlr.runtime.tree.Tree; +import org.antlr.runtime.tree.TreeNodeStream; +import org.apache.beam.sdk.io.thrift.parser.antlr.DocumentGenerator; +import org.apache.beam.sdk.io.thrift.parser.antlr.ThriftLexer; +import org.apache.beam.sdk.io.thrift.parser.antlr.ThriftParser; +import org.apache.beam.sdk.io.thrift.parser.model.Document; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.CharSource; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class ThriftIdlParser { + + private static final Logger LOG = LoggerFactory.getLogger(ThriftIdlParser.class); + + /** Generates {@link Document} from {@link org.antlr.runtime.tree.Tree}. */ + public static Document parseThriftIdl(CharSource input) throws IOException { +Tree tree = parseTree(input); +TreeNodeStream stream = new BufferedTreeNodeStream(tree); +DocumentGenerator generator = new DocumentGenerator(stream); +try { + return generator.document().value; +} catch (RecognitionException e) { + LOG.error("Failed to generate document: " + e.getMessage()); + throw new RuntimeException(e); +} + } + + /** Generates {@link org.antlr.runtime.tree.Tree} from input. */ + static Tree parseTree(CharSource input) throws IOException { +try (Reader reader = input.openStream()) { + ThriftLexer lexer = new ThriftLexer(new ANTLRReaderStream(reader)); + ThriftParser parser = new ThriftParser(new CommonTokenStream(lexer)); + try { +Tree tree = (Tree) parser.document().getTree(); +if (parser.getNumberOfSyntaxErrors() > 0) { + LOG.error("Parsing generated " + parser.getNumberOfSyntaxErrors() + "errors."); + throw new RuntimeException("syntax error"); Review comment: That is covered in the `catch` below. 
Issue Time Tracking --- Worklog Id: (was: 361235) Time Spent: 8h 10m (was: 8h)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=361233=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361233 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 18/Dec/19 00:42 Start Date: 18/Dec/19 00:42 Worklog Time Spent: 10m Work Description: chrlarsen commented on issue #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#issuecomment-566811364 @gsteelman @chamikaramj [PR](https://github.com/apache/beam/pull/10395) for parser has been opened. Issue Time Tracking --- Worklog Id: (was: 361233) Time Spent: 8h (was: 7h 50m)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=361232=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361232 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 18/Dec/19 00:40 Start Date: 18/Dec/19 00:40 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r359101649 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ## @@ -0,0 +1,708 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.thrift; + +import static java.lang.String.format; +import static java.util.stream.Collectors.joining; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import com.google.auto.value.AutoValue; +import java.io.Closeable; +import java.io.IOException; +import java.io.OutputStream; +import java.nio.channels.Channels; +import java.nio.channels.WritableByteChannel; +import java.nio.charset.Charset; +import java.util.ArrayList; +import java.util.List; +import java.util.Map; +import javax.annotation.Nullable; +import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.io.Compression; +import org.apache.beam.sdk.io.FileIO; +import org.apache.beam.sdk.io.thrift.parser.ThriftIdlParser; +import org.apache.beam.sdk.io.thrift.parser.model.BaseType; +import org.apache.beam.sdk.io.thrift.parser.model.Const; +import org.apache.beam.sdk.io.thrift.parser.model.Definition; +import org.apache.beam.sdk.io.thrift.parser.model.Document; +import org.apache.beam.sdk.io.thrift.parser.model.Header; +import org.apache.beam.sdk.io.thrift.parser.model.IdentifierType; +import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnum; +import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnumField; +import org.apache.beam.sdk.io.thrift.parser.model.ListType; +import org.apache.beam.sdk.io.thrift.parser.model.MapType; +import org.apache.beam.sdk.io.thrift.parser.model.Service; +import org.apache.beam.sdk.io.thrift.parser.model.StringEnum; +import org.apache.beam.sdk.io.thrift.parser.model.Struct; +import org.apache.beam.sdk.io.thrift.parser.model.ThriftException; +import org.apache.beam.sdk.io.thrift.parser.model.ThriftField; +import org.apache.beam.sdk.io.thrift.parser.model.ThriftMethod; +import org.apache.beam.sdk.io.thrift.parser.model.ThriftType; +import org.apache.beam.sdk.io.thrift.parser.model.TypeAnnotation; +import org.apache.beam.sdk.io.thrift.parser.model.Typedef; +import 
org.apache.beam.sdk.io.thrift.parser.model.VoidType; +import org.apache.beam.sdk.options.ValueProvider; +import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.display.DisplayData; +import org.apache.beam.sdk.values.PBegin; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PDone; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteSource; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link PTransform}s for reading and writing Thrift files. + * + * Reading Thrift Files + * + * For simple reading, use {@link ThriftIO#read} with the desired file pattern to read from. + * + * For example: + * + * {@code + * PCollection documents = pipeline.apply(ThriftIO.read().from("/foo/bar/*")); + * ... + * } + * + * For more advanced use cases, like reading each file in a {@link PCollection} of {@link + * FileIO.ReadableFile}, use the {@link ReadFiles} transform. + * + * For example: + * + * {@code + * PCollection files = pipeline + * .apply(FileIO.match().filepattern(options.getInputFilepattern()) + * .apply(FileIO.readMatches()); + * + *
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=361230=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361230 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 18/Dec/19 00:29 Start Date: 18/Dec/19 00:29 Worklog Time Spent: 10m Work Description: chrlarsen commented on issue #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#issuecomment-566811364 @gsteelman @chamikaramj [PR](https://github.com/apache/beam/pull/10395) for parser has been opened This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361230) Time Spent: 7h 40m (was: 7.5h) > Add ThriftIO to Support IO for Thrift Files > --- > > Key: BEAM-8561 > URL: https://issues.apache.org/jira/browse/BEAM-8561 > Project: Beam > Issue Type: New Feature > Components: io-java-files >Reporter: Chris Larsen >Assignee: Chris Larsen >Priority: Major > Time Spent: 7h 40m > Remaining Estimate: 0h > > Similar to AvroIO it would be very useful to support reading and writing > to/from Thrift files with a native connector. > Functionality would include: > # read() - Reading from one or more Thrift files. > # write() - Writing to one or more Thrift files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=361210=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361210 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 17/Dec/19 23:34 Start Date: 17/Dec/19 23:34 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r359085087 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ## @@ -0,0 +1,708 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.thrift; + +import static java.lang.String.format; +import static java.util.stream.Collectors.joining; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import com.google.auto.value.AutoValue; +import java.io.Closeable; +import java.io.IOException; +import java.io.OutputStream; +import java.nio.channels.Channels; +import java.nio.channels.WritableByteChannel; +import java.nio.charset.Charset; +import java.util.ArrayList; +import java.util.List; +import java.util.Map; +import javax.annotation.Nullable; +import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.io.Compression; +import org.apache.beam.sdk.io.FileIO; +import org.apache.beam.sdk.io.thrift.parser.ThriftIdlParser; +import org.apache.beam.sdk.io.thrift.parser.model.BaseType; +import org.apache.beam.sdk.io.thrift.parser.model.Const; +import org.apache.beam.sdk.io.thrift.parser.model.Definition; +import org.apache.beam.sdk.io.thrift.parser.model.Document; +import org.apache.beam.sdk.io.thrift.parser.model.Header; +import org.apache.beam.sdk.io.thrift.parser.model.IdentifierType; +import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnum; +import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnumField; +import org.apache.beam.sdk.io.thrift.parser.model.ListType; +import org.apache.beam.sdk.io.thrift.parser.model.MapType; +import org.apache.beam.sdk.io.thrift.parser.model.Service; +import org.apache.beam.sdk.io.thrift.parser.model.StringEnum; +import org.apache.beam.sdk.io.thrift.parser.model.Struct; +import org.apache.beam.sdk.io.thrift.parser.model.ThriftException; +import org.apache.beam.sdk.io.thrift.parser.model.ThriftField; +import org.apache.beam.sdk.io.thrift.parser.model.ThriftMethod; +import org.apache.beam.sdk.io.thrift.parser.model.ThriftType; +import org.apache.beam.sdk.io.thrift.parser.model.TypeAnnotation; +import org.apache.beam.sdk.io.thrift.parser.model.Typedef; +import 
org.apache.beam.sdk.io.thrift.parser.model.VoidType; +import org.apache.beam.sdk.options.ValueProvider; +import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.display.DisplayData; +import org.apache.beam.sdk.values.PBegin; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PDone; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteSource; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link PTransform}s for reading and writing Thrift files. + * + * Reading Thrift Files + * + * For simple reading, use {@link ThriftIO#read} with the desired file pattern to read from. + * + * For example: + * + * {@code + * PCollection documents = pipeline.apply(ThriftIO.read().from("/foo/bar/*")); + * ... + * } + * + * For more advanced use cases, like reading each file in a {@link PCollection} of {@link + * FileIO.ReadableFile}, use the {@link ReadFiles} transform. + * + * For example: + * + * {@code + * PCollection files = pipeline + * .apply(FileIO.match().filepattern(options.getInputFilepattern()) + * .apply(FileIO.readMatches()); + * + *
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=361203=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361203 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 17/Dec/19 23:26 Start Date: 17/Dec/19 23:26 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r359082635 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/parser/model/Document.java ## @@ -0,0 +1,424 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.thrift.parser.model; + +import static java.util.Collections.emptyList; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import java.io.IOException; +import java.io.Serializable; +import java.util.ArrayList; +import java.util.Collections; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.Objects; +import org.apache.avro.Schema; +import org.apache.avro.generic.GenericRecord; +import org.apache.avro.generic.GenericRecordBuilder; +import org.apache.avro.reflect.ReflectData; +import org.apache.beam.sdk.io.thrift.parser.visitor.DocumentVisitor; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; + +/** + * The {@link Document} class holds the elements of a Thrift file. + * + * A {@link Document} is made up of: + * + * + * {@link Header} - Contains: includes, cppIncludes, namespaces, and defaultNamespace. + * {@link Document#definitions} - Contains list of Thrift {@link Definition}. + * + */ +public class Document implements Serializable { + private Header header; + private List definitions; + + public Document(Header header, List definitions) { +this.header = checkNotNull(header, "header"); +this.definitions = ImmutableList.copyOf(checkNotNull(definitions, "definitions")); + } + + /** Returns an empty {@link Document}. 
*/ + public static Document emptyDocument() { +List includes = emptyList(); +List cppIncludes = emptyList(); +String defaultNamespace = null; +Map namespaces = Collections.emptyMap(); +Header header = new Header(includes, cppIncludes, defaultNamespace, namespaces); +List definitions = emptyList(); +return new Document(header, definitions); + } + + public Document getDocument() { +return this; + } + + public Header getHeader() { +return this.header; + } + + public void setHeader(Header header) { +this.header = header; + } + + public List getDefinitions() { +return definitions; + } + + public void setDefinitions(List definitions) { +this.definitions = definitions; + } + + public void visit(final DocumentVisitor visitor) throws IOException { +Preconditions.checkNotNull(visitor, "the visitor must not be null!"); + +for (Definition definition : definitions) { + if (visitor.accept(definition)) { +definition.visit(visitor); + } +} + } + + /** Gets Avro {@link Schema} for the object. */ + public Schema getSchema() { +return ReflectData.get().getSchema(Document.class); + } + + /** Gets {@link Document} as a {@link GenericRecord}. */ + public GenericRecord getAsGenericRecord() { +GenericRecordBuilder genericRecordBuilder = new GenericRecordBuilder(this.getSchema()); +genericRecordBuilder.set("header", this.getHeader()).set("definitions", this.getDefinitions()); + +return genericRecordBuilder.build(); + } + + /** Adds list of includes to {@link Document#header}. */ + public void addIncludes(List includes) { +checkNotNull(includes, "includes"); +List currentIncludes = new ArrayList<>(this.getHeader().getIncludes()); +currentIncludes.addAll(includes); +
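Note on the quoted `Document#visit` loop: it asks the visitor whether it accepts each definition, and only then lets the definition hand itself to the visitor. A minimal sketch of that control flow follows. The `Visitor` interface and the one-field `Definition` are hypothetical stand-ins, not the PR's `DocumentVisitor` or `Definition` types, whose full signatures are not shown in the quote.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

/** Simplified, hypothetical stand-ins for the parser model types in the PR. */
final class VisitorSketch {

  /** Assumed visitor contract: accept() filters, visit() does the work. */
  interface Visitor {
    boolean accept(Definition definition);

    void visit(Definition definition);
  }

  /** Assumed minimal definition: it only carries a name. */
  static final class Definition {
    final String name;

    Definition(String name) {
      this.name = name;
    }

    void visit(Visitor visitor) {
      visitor.visit(this);
    }
  }

  /** Mirrors the accept-then-visit loop from the quoted Document#visit. */
  static final class Document {
    private final List<Definition> definitions;

    Document(List<Definition> definitions) {
      this.definitions = Collections.unmodifiableList(new ArrayList<>(definitions));
    }

    void visit(Visitor visitor) {
      Objects.requireNonNull(visitor, "the visitor must not be null!");
      for (Definition definition : definitions) {
        if (visitor.accept(definition)) {
          definition.visit(visitor);
        }
      }
    }
  }

  /** Collects the names of all definitions whose name starts with the prefix. */
  static List<String> namesStartingWith(Document document, final String prefix) {
    final List<String> names = new ArrayList<>();
    document.visit(
        new Visitor() {
          @Override
          public boolean accept(Definition definition) {
            return definition.name.startsWith(prefix);
          }

          @Override
          public void visit(Definition definition) {
            names.add(definition.name);
          }
        });
    return names;
  }

  public static void main(String[] args) {
    Document document =
        new Document(
            Arrays.asList(
                new Definition("UserStruct"), new Definition("UserId"), new Definition("Color")));
    System.out.println(namesStartingWith(document, "User")); // prints [UserStruct, UserId]
  }
}
```

Keeping the filter in `accept` and the action in `visit` lets one traversal serve several visitors without the document knowing what each one is interested in.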
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=361137=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361137 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 17/Dec/19 21:45 Start Date: 17/Dec/19 21:45 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r359045633 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/parser/model/IntegerEnum.java ## @@ -0,0 +1,78 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.thrift.parser.model; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import java.util.List; +import java.util.Objects; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects; Review comment: For the Beam modules the dependencies are defined in the build.gradle for the respective module. We import the vendored Guava version there and then import that into the code. This is an automated message from the Apache Git Service. 
Issue Time Tracking --- Worklog Id: (was: 361137) Time Spent: 7h 10m (was: 7h)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=361132=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361132 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 17/Dec/19 21:40 Start Date: 17/Dec/19 21:40 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r359043453 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/parser/model/Document.java ## @@ -0,0 +1,424 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.thrift.parser.model; + +import static java.util.Collections.emptyList; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import java.io.IOException; +import java.io.Serializable; +import java.util.ArrayList; +import java.util.Collections; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.Objects; +import org.apache.avro.Schema; +import org.apache.avro.generic.GenericRecord; +import org.apache.avro.generic.GenericRecordBuilder; +import org.apache.avro.reflect.ReflectData; +import org.apache.beam.sdk.io.thrift.parser.visitor.DocumentVisitor; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; + +/** + * The {@link Document} class holds the elements of a Thrift file. + * + * A {@link Document} is made up of: + * + * + * {@link Header} - Contains: includes, cppIncludes, namespaces, and defaultNamespace. + * {@link Document#definitions} - Contains list of Thrift {@link Definition}. + * + */ +public class Document implements Serializable { + private Header header; + private List definitions; + + public Document(Header header, List definitions) { +this.header = checkNotNull(header, "header"); +this.definitions = ImmutableList.copyOf(checkNotNull(definitions, "definitions")); + } + + /** Returns an empty {@link Document}. 
*/ + public static Document emptyDocument() { +List includes = emptyList(); +List cppIncludes = emptyList(); +String defaultNamespace = null; +Map namespaces = Collections.emptyMap(); +Header header = new Header(includes, cppIncludes, defaultNamespace, namespaces); +List definitions = emptyList(); +return new Document(header, definitions); + } + + public Document getDocument() { +return this; + } + + public Header getHeader() { +return this.header; + } + + public void setHeader(Header header) { +this.header = header; + } + + public List getDefinitions() { +return definitions; + } + + public void setDefinitions(List definitions) { +this.definitions = definitions; + } + + public void visit(final DocumentVisitor visitor) throws IOException { +Preconditions.checkNotNull(visitor, "the visitor must not be null!"); + +for (Definition definition : definitions) { + if (visitor.accept(definition)) { +definition.visit(visitor); + } +} + } + + /** Gets Avro {@link Schema} for the object. */ + public Schema getSchema() { +return ReflectData.get().getSchema(Document.class); + } + + /** Gets {@link Document} as a {@link GenericRecord}. */ + public GenericRecord getAsGenericRecord() { +GenericRecordBuilder genericRecordBuilder = new GenericRecordBuilder(this.getSchema()); +genericRecordBuilder.set("header", this.getHeader()).set("definitions", this.getDefinitions()); + +return genericRecordBuilder.build(); + } + + /** Adds list of includes to {@link Document#header}. */ + public void addIncludes(List includes) { +checkNotNull(includes, "includes"); +List currentIncludes = new ArrayList<>(this.getHeader().getIncludes()); +currentIncludes.addAll(includes); +
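Note on the quoted `Document` constructor: it null-checks its arguments and stores an immutable copy of the definitions list. The sketch below shows the same idea using only JDK classes. `Objects.requireNonNull` and `Collections.unmodifiableList(new ArrayList<>(...))` stand in for the vendored Guava `checkNotNull` and `ImmutableList.copyOf`; one difference worth noting is that `ImmutableList.copyOf` also rejects null elements, which this JDK variant does not. The class name and the `String` element type are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

/** Hypothetical holder showing a null-checked, defensively copied list field. */
final class DefensiveCopySketch {
  private final List<String> definitions;

  DefensiveCopySketch(List<String> definitions) {
    // Fail fast with a named message, as checkNotNull(definitions, "definitions") does.
    Objects.requireNonNull(definitions, "definitions");
    // Copy first, then wrap: later changes to the caller's list cannot leak in.
    this.definitions = Collections.unmodifiableList(new ArrayList<>(definitions));
  }

  List<String> getDefinitions() {
    return definitions;
  }

  public static void main(String[] args) {
    List<String> source = new ArrayList<>();
    source.add("Struct");
    DefensiveCopySketch holder = new DefensiveCopySketch(source);
    source.add("Enum"); // mutates only the caller's list
    System.out.println(holder.getDefinitions()); // prints [Struct]
  }
}
```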
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=361081=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361081 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 17/Dec/19 19:39 Start Date: 17/Dec/19 19:39 Worklog Time Spent: 10m Work Description: gsteelman commented on issue #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#issuecomment-566717207 > Thanks @gsteelman. We would love to have the sample Thrift schema and anything else we can utilize for testing! Who should I have it shared with? Also, https://thrift.apache.org/test/ and this schema https://raw.githubusercontent.com/apache/thrift/master/test/ThriftTest.thrift which might provide additional test cases. Issue Time Tracking --- Worklog Id: (was: 361081) Time Spent: 6h 50m (was: 6h 40m)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=360617=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360617 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 17/Dec/19 00:11 Start Date: 17/Dec/19 00:11 Worklog Time Spent: 10m Work Description: gsteelman commented on issue #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#issuecomment-566309485 > @gsteelman I can split this PR into one for `Document` and one for `ThriftIO`. Since `ThriftIO` has a dependency on `Document` I think it would be best to review that one first, thoughts? > > Thanks! Yes, I think that would make it a little easier for everyone to review. Thank you! Issue Time Tracking --- Worklog Id: (was: 360617) Time Spent: 6h 40m (was: 6.5h)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=360616=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360616 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 17/Dec/19 00:10 Start Date: 17/Dec/19 00:10 Worklog Time Spent: 10m Work Description: gsteelman commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r358531752 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/parser/model/IntegerEnum.java ## @@ -0,0 +1,78 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.thrift.parser.model; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import java.util.List; +import java.util.Objects; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects; Review comment: I am not sure either. Typically wouldn't the version be specified as a variable in a maven pom file, for example? This is an automated message from the Apache Git Service. 
Issue Time Tracking --- Worklog Id: (was: 360616) Time Spent: 6.5h (was: 6h 20m)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=360606=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360606 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 17/Dec/19 00:09 Start Date: 17/Dec/19 00:09 Worklog Time Spent: 10m Work Description: gsteelman commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r358531622 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/parser/model/Document.java ## @@ -0,0 +1,424 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.thrift.parser.model; + +import static java.util.Collections.emptyList; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import java.io.IOException; +import java.io.Serializable; +import java.util.ArrayList; +import java.util.Collections; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.Objects; +import org.apache.avro.Schema; +import org.apache.avro.generic.GenericRecord; +import org.apache.avro.generic.GenericRecordBuilder; +import org.apache.avro.reflect.ReflectData; +import org.apache.beam.sdk.io.thrift.parser.visitor.DocumentVisitor; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; + +/** + * The {@link Document} class holds the elements of a Thrift file. + * + * A {@link Document} is made up of: + * + * + * {@link Header} - Contains: includes, cppIncludes, namespaces, and defaultNamespace. + * {@link Document#definitions} - Contains list of Thrift {@link Definition}. + * + */ +public class Document implements Serializable { + private Header header; + private List definitions; + + public Document(Header header, List definitions) { +this.header = checkNotNull(header, "header"); +this.definitions = ImmutableList.copyOf(checkNotNull(definitions, "definitions")); + } + + /** Returns an empty {@link Document}. 
*/ + public static Document emptyDocument() { +List includes = emptyList(); +List cppIncludes = emptyList(); +String defaultNamespace = null; +Map namespaces = Collections.emptyMap(); +Header header = new Header(includes, cppIncludes, defaultNamespace, namespaces); +List definitions = emptyList(); +return new Document(header, definitions); + } + + public Document getDocument() { +return this; + } + + public Header getHeader() { +return this.header; + } + + public void setHeader(Header header) { +this.header = header; + } + + public List getDefinitions() { +return definitions; + } + + public void setDefinitions(List definitions) { +this.definitions = definitions; + } + + public void visit(final DocumentVisitor visitor) throws IOException { +Preconditions.checkNotNull(visitor, "the visitor must not be null!"); + +for (Definition definition : definitions) { + if (visitor.accept(definition)) { +definition.visit(visitor); + } +} + } + + /** Gets Avro {@link Schema} for the object. */ + public Schema getSchema() { +return ReflectData.get().getSchema(Document.class); + } + + /** Gets {@link Document} as a {@link GenericRecord}. */ + public GenericRecord getAsGenericRecord() { +GenericRecordBuilder genericRecordBuilder = new GenericRecordBuilder(this.getSchema()); +genericRecordBuilder.set("header", this.getHeader()).set("definitions", this.getDefinitions()); + +return genericRecordBuilder.build(); + } + + /** Adds list of includes to {@link Document#header}. */ + public void addIncludes(List includes) { +checkNotNull(includes, "includes"); +List currentIncludes = new ArrayList<>(this.getHeader().getIncludes()); +currentIncludes.addAll(includes); +
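Note on the quoted `addIncludes`: it copies the current includes into a fresh `ArrayList` before calling `addAll`. That copy matters because `emptyDocument()` builds its header from `emptyList()`, which is immutable, so appending to it directly would throw `UnsupportedOperationException`. The quote is cut off after `addAll`, so what the real method does next is not shown. The sketch below assumes the merged list is stored back, and `HeaderSketch` is a hypothetical stand-in for the PR's `Header`, using `String` includes for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

/** Hypothetical stand-in for a header that starts with an immutable empty list. */
final class HeaderSketch {
  private List<String> includes = Collections.emptyList();

  List<String> getIncludes() {
    return includes;
  }

  void addIncludes(List<String> more) {
    Objects.requireNonNull(more, "includes");
    // Copy first: the current list may be immutable (e.g. Collections.emptyList()).
    List<String> merged = new ArrayList<>(includes);
    merged.addAll(more);
    // Assumption: the merged list replaces the old one; the quoted code is truncated here.
    includes = Collections.unmodifiableList(merged);
  }

  public static void main(String[] args) {
    HeaderSketch header = new HeaderSketch();
    header.addIncludes(Arrays.asList("shared.thrift"));
    header.addIncludes(Arrays.asList("common.thrift"));
    System.out.println(header.getIncludes()); // prints [shared.thrift, common.thrift]
  }
}
```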
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=360605=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360605 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 17/Dec/19 00:07 Start Date: 17/Dec/19 00:07 Worklog Time Spent: 10m Work Description: gsteelman commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r358531004 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ## @@ -0,0 +1,708 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.thrift; + +import static java.lang.String.format; +import static java.util.stream.Collectors.joining; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import com.google.auto.value.AutoValue; +import java.io.Closeable; +import java.io.IOException; +import java.io.OutputStream; +import java.nio.channels.Channels; +import java.nio.channels.WritableByteChannel; +import java.nio.charset.Charset; +import java.util.ArrayList; +import java.util.List; +import java.util.Map; +import javax.annotation.Nullable; +import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.io.Compression; +import org.apache.beam.sdk.io.FileIO; +import org.apache.beam.sdk.io.thrift.parser.ThriftIdlParser; +import org.apache.beam.sdk.io.thrift.parser.model.BaseType; +import org.apache.beam.sdk.io.thrift.parser.model.Const; +import org.apache.beam.sdk.io.thrift.parser.model.Definition; +import org.apache.beam.sdk.io.thrift.parser.model.Document; +import org.apache.beam.sdk.io.thrift.parser.model.Header; +import org.apache.beam.sdk.io.thrift.parser.model.IdentifierType; +import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnum; +import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnumField; +import org.apache.beam.sdk.io.thrift.parser.model.ListType; +import org.apache.beam.sdk.io.thrift.parser.model.MapType; +import org.apache.beam.sdk.io.thrift.parser.model.Service; +import org.apache.beam.sdk.io.thrift.parser.model.StringEnum; +import org.apache.beam.sdk.io.thrift.parser.model.Struct; +import org.apache.beam.sdk.io.thrift.parser.model.ThriftException; +import org.apache.beam.sdk.io.thrift.parser.model.ThriftField; +import org.apache.beam.sdk.io.thrift.parser.model.ThriftMethod; +import org.apache.beam.sdk.io.thrift.parser.model.ThriftType; +import org.apache.beam.sdk.io.thrift.parser.model.TypeAnnotation; +import org.apache.beam.sdk.io.thrift.parser.model.Typedef; +import 
org.apache.beam.sdk.io.thrift.parser.model.VoidType; +import org.apache.beam.sdk.options.ValueProvider; +import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.display.DisplayData; +import org.apache.beam.sdk.values.PBegin; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PDone; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteSource; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link PTransform}s for reading and writing Thrift files. + * + * Reading Thrift Files + * + * For simple reading, use {@link ThriftIO#read} with the desired file pattern to read from. + * + * For example: + * + * {@code + * PCollection documents = pipeline.apply(ThriftIO.read().from("/foo/bar/*")); + * ... + * } + * + * For more advanced use cases, like reading each file in a {@link PCollection} of {@link + * FileIO.ReadableFile}, use the {@link ReadFiles} transform. + * + * For example: + * + * {@code + * PCollection files = pipeline + * .apply(FileIO.match().filepattern(options.getInputFilepattern())) + * .apply(FileIO.readMatches()); + * + *
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=360603=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360603 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 17/Dec/19 00:04 Start Date: 17/Dec/19 00:04 Worklog Time Spent: 10m Work Description: gsteelman commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r358530461 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ## @@ -0,0 +1,708 @@
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=360561=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360561 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 16/Dec/19 22:56 Start Date: 16/Dec/19 22:56 Worklog Time Spent: 10m Work Description: chrlarsen commented on issue #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#issuecomment-566285123 @gsteelman I can split this PR into one for `Document` and one for `ThriftIO`. Since `ThriftIO` has a dependency on `Document` I think it would be best to review that one first, thoughts? Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 360561) Time Spent: 5h 50m (was: 5h 40m) > Add ThriftIO to Support IO for Thrift Files > --- > > Key: BEAM-8561 > URL: https://issues.apache.org/jira/browse/BEAM-8561 > Project: Beam > Issue Type: New Feature > Components: io-java-files >Reporter: Chris Larsen >Assignee: Chris Larsen >Priority: Major > Time Spent: 5h 50m > Remaining Estimate: 0h > > Similar to AvroIO it would be very useful to support reading and writing > to/from Thrift files with a native connector. > Functionality would include: > # read() - Reading from one or more Thrift files. > # write() - Writing to one or more Thrift files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=360547=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360547 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 16/Dec/19 22:47 Start Date: 16/Dec/19 22:47 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r358507069 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ## @@ -0,0 +1,708 @@
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=360530=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360530 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 16/Dec/19 22:28 Start Date: 16/Dec/19 22:28 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r358499952 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ## @@ -0,0 +1,708 @@
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=360519=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360519 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 16/Dec/19 22:17 Start Date: 16/Dec/19 22:17 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r358495630 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ## @@ -0,0 +1,708 @@
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=360511=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360511 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 16/Dec/19 21:59 Start Date: 16/Dec/19 21:59 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r358488154 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/parser/model/IntegerEnum.java ## @@ -0,0 +1,78 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.thrift.parser.model; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import java.util.List; +import java.util.Objects; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects; Review comment: I think because Beam only allows for the repackaged version to be used, the version has to be specified (I could be wrong). This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 360511) Time Spent: 5h 10m (was: 5h) > Add ThriftIO to Support IO for Thrift Files > --- > > Key: BEAM-8561 > URL: https://issues.apache.org/jira/browse/BEAM-8561 > Project: Beam > Issue Type: New Feature > Components: io-java-files >Reporter: Chris Larsen >Assignee: Chris Larsen >Priority: Major > Time Spent: 5h 10m > Remaining Estimate: 0h > > Similar to AvroIO it would be very useful to support reading and writing > to/from Thrift files with a native connector. > Functionality would include: > # read() - Reading from one or more Thrift files. > # write() - Writing to one or more Thrift files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=360508=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360508 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 16/Dec/19 21:55 Start Date: 16/Dec/19 21:55 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r358486215 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ## @@ -0,0 +1,708 @@
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=360500=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360500 ]

ASF GitHub Bot logged work on BEAM-8561:

Author: ASF GitHub Bot
Created on: 16/Dec/19 21:19
Start Date: 16/Dec/19 21:19
Worklog Time Spent: 10m
Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r358470056

## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/parser/model/ConstList.java ##
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift.parser.model;
+
+import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList;
+
+public class ConstList extends ConstValue {
+  private final List values;
+
+  public ConstList(List values) {
+    this.values = ImmutableList.copyOf(checkNotNull(values, "values"));

Review comment: This creates an immutable wrapper for the list.

Issue Time Tracking
---
Worklog Id: (was: 360500)
Time Spent: 4h 50m (was: 4h 40m)
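The review comment above calls the result a "wrapper", so the difference between a read-only view and a read-only copy is worth spelling out: Guava's `ImmutableList.copyOf` returns a copy, so later changes to the caller's list are not visible through it. The sketch below illustrates that difference with JDK collections only; `List.copyOf` (Java 10+) stands in for the Guava call, and the class name `CopyVersusWrapper` is hypothetical and not part of the pull request.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Contrasts a read-only view with a read-only copy, using only JDK collections. */
final class CopyVersusWrapper {

  /** Returns a read-only view; later changes to {@code source} stay visible through it. */
  static List<String> wrap(List<String> source) {
    return Collections.unmodifiableList(source);
  }

  /** Returns an independent read-only snapshot; later changes to {@code source} are not visible. */
  static List<String> copy(List<String> source) {
    // JDK stand-in (assumption) for Guava's ImmutableList.copyOf used in the quoted diff.
    return List.copyOf(source);
  }

  public static void main(String[] args) {
    List<String> source = new ArrayList<>();
    source.add("a");

    List<String> view = wrap(source);
    List<String> snapshot = copy(source);

    source.add("b"); // mutate the original after both were created

    System.out.println(view.size()); // prints 2
    System.out.println(snapshot.size()); // prints 1
  }
}
```

With a copy, a constructor such as the one in `ConstList` does not need to trust that callers leave their list alone afterwards.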
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=360497=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360497 ]

ASF GitHub Bot logged work on BEAM-8561:

Author: ASF GitHub Bot
Created on: 16/Dec/19 21:13
Start Date: 16/Dec/19 21:13
Worklog Time Spent: 10m
Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r358467379

## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ##
@@ -0,0 +1,708 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static java.lang.String.format;
+import static java.util.stream.Collectors.joining;
+import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import java.nio.charset.Charset;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.io.Compression;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.io.thrift.parser.ThriftIdlParser;
+import org.apache.beam.sdk.io.thrift.parser.model.BaseType;
+import org.apache.beam.sdk.io.thrift.parser.model.Const;
+import org.apache.beam.sdk.io.thrift.parser.model.Definition;
+import org.apache.beam.sdk.io.thrift.parser.model.Document;
+import org.apache.beam.sdk.io.thrift.parser.model.Header;
+import org.apache.beam.sdk.io.thrift.parser.model.IdentifierType;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnumField;
+import org.apache.beam.sdk.io.thrift.parser.model.ListType;
+import org.apache.beam.sdk.io.thrift.parser.model.MapType;
+import org.apache.beam.sdk.io.thrift.parser.model.Service;
+import org.apache.beam.sdk.io.thrift.parser.model.StringEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.Struct;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftException;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftField;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftMethod;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftType;
+import org.apache.beam.sdk.io.thrift.parser.model.TypeAnnotation;
+import org.apache.beam.sdk.io.thrift.parser.model.Typedef;
+import org.apache.beam.sdk.io.thrift.parser.model.VoidType;
+import org.apache.beam.sdk.options.ValueProvider;
+import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteSource;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing Thrift files.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#read} with the desired file pattern to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection documents = pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }
+ *
+ * For more advanced use cases, like reading each file in a {@link PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern()))
+ *   .apply(FileIO.readMatches());
+ *
+ *
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=360486=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360486 ]

ASF GitHub Bot logged work on BEAM-8561:

Author: ASF GitHub Bot
Created on: 16/Dec/19 20:54
Start Date: 16/Dec/19 20:54
Worklog Time Spent: 10m
Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r358458848

## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ##
@@ -0,0 +1,708 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static java.lang.String.format;
+import static java.util.stream.Collectors.joining;
+import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import java.nio.charset.Charset;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.io.Compression;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.io.thrift.parser.ThriftIdlParser;
+import org.apache.beam.sdk.io.thrift.parser.model.BaseType;
+import org.apache.beam.sdk.io.thrift.parser.model.Const;
+import org.apache.beam.sdk.io.thrift.parser.model.Definition;
+import org.apache.beam.sdk.io.thrift.parser.model.Document;
+import org.apache.beam.sdk.io.thrift.parser.model.Header;
+import org.apache.beam.sdk.io.thrift.parser.model.IdentifierType;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnumField;
+import org.apache.beam.sdk.io.thrift.parser.model.ListType;
+import org.apache.beam.sdk.io.thrift.parser.model.MapType;
+import org.apache.beam.sdk.io.thrift.parser.model.Service;
+import org.apache.beam.sdk.io.thrift.parser.model.StringEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.Struct;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftException;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftField;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftMethod;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftType;
+import org.apache.beam.sdk.io.thrift.parser.model.TypeAnnotation;
+import org.apache.beam.sdk.io.thrift.parser.model.Typedef;
+import org.apache.beam.sdk.io.thrift.parser.model.VoidType;
+import org.apache.beam.sdk.options.ValueProvider;
+import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteSource;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing Thrift files.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#read} with the desired file pattern to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection documents = pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }
+ *
+ * For more advanced use cases, like reading each file in a {@link PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern()))
+ *   .apply(FileIO.readMatches());
+ *
+ *
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=360417=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360417 ]

ASF GitHub Bot logged work on BEAM-8561:

Author: ASF GitHub Bot
Created on: 16/Dec/19 18:31
Start Date: 16/Dec/19 18:31
Worklog Time Spent: 10m
Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r358394393

## File path: sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/DocumentTest.java ##
@@ -0,0 +1,390 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import org.apache.beam.sdk.io.thrift.parser.model.BaseType;
+import org.apache.beam.sdk.io.thrift.parser.model.Const;
+import org.apache.beam.sdk.io.thrift.parser.model.ConstInteger;
+import org.apache.beam.sdk.io.thrift.parser.model.Document;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnumField;
+import org.apache.beam.sdk.io.thrift.parser.model.Service;
+import org.apache.beam.sdk.io.thrift.parser.model.StringEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.Struct;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftException;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftField;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftMethod;
+import org.apache.beam.sdk.io.thrift.parser.model.TypeAnnotation;
+import org.apache.beam.sdk.io.thrift.parser.model.Typedef;
+import org.apache.beam.sdk.io.thrift.parser.model.Union;
+import org.apache.beam.sdk.io.thrift.parser.model.VoidType;
+import org.junit.Assert;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+/** Tests for {@link Document} class. */
+@RunWith(JUnit4.class)
+public class DocumentTest {
+
+  /** Tests {@link Document#addIncludes(String)}. */
+  @Test
+  public void testAddIncludes() {
+    Document document = Document.emptyDocument();
+    List includesExpected = new ArrayList<>();
+    includesExpected.add("simple_test.thrift");
+    includesExpected.add("shared.thrift");
+    document.addIncludes(includesExpected);
+
+    List includesActual = document.getHeader().getIncludes();
+
+    Assert.assertEquals(includesExpected, includesActual);
+  }
+
+  /** Tests {@link Document#addCppIncludes(List)}. */
+  @Test
+  public void testAddCppIncludes() {
+    Document document = Document.emptyDocument();
+    List cppIncludesExpected = new ArrayList<>();
+    cppIncludesExpected.add("iostream");
+    cppIncludesExpected.add("set");
+    document.addCppIncludes(cppIncludesExpected);
+
+    List cppIncludesActual = document.getHeader().getCppIncludes();
+
+    Assert.assertEquals(cppIncludesExpected, cppIncludesActual);
+  }
+
+  /** Tests {@link Document#removeDefinition(String)}. */
+  @Test
+  public void testRemoveDefinition() {
+    Document document = Document.emptyDocument();
+    List emptyAnnotations = new ArrayList<>();
+    String constName = "STRINGCONSTANT";
+    document.addConstString(constName, emptyAnnotations, "test_string");
+    Assert.assertEquals(1, document.getDefinitions().size());
+
+    document.removeDefinition(constName);
+    Assert.assertEquals(0, document.getDefinitions().size());
+  }
+
+  /** Tests {@link Document#addConst(String, ThriftType, ConstValue)}. */
+  @Test
+  public void testAddConst() {
+    Document document = Document.emptyDocument();
+    List emptyAnnotations = new ArrayList<>();
+    String constName = "INT32CONSTANT";
+    document.addConst(
+        constName, new BaseType(BaseType.Type.I32, emptyAnnotations), new ConstInteger(252));
+
+    String constNameActual = document.getDefinitions().get(0).getName();
+
+    Assert.assertEquals(constName, constNameActual);
+    Assert.assertTrue(document.getDefinitions().get(0) instanceof Const);
+  }
+
+  /** Tests {@link
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=360411=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360411 ]

ASF GitHub Bot logged work on BEAM-8561:

Author: ASF GitHub Bot
Created on: 16/Dec/19 18:26
Start Date: 16/Dec/19 18:26
Worklog Time Spent: 10m
Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r358392005

## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/parser/model/Document.java ##
@@ -0,0 +1,424 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift.parser.model;
+
+import static java.util.Collections.emptyList;
+import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.GenericRecordBuilder;
+import org.apache.avro.reflect.ReflectData;
+import org.apache.beam.sdk.io.thrift.parser.visitor.DocumentVisitor;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList;
+
+/**
+ * The {@link Document} class holds the elements of a Thrift file.
+ *
+ * A {@link Document} is made up of:
+ *
+ *
+ * {@link Header} - Contains: includes, cppIncludes, namespaces, and defaultNamespace.
+ * {@link Document#definitions} - Contains list of Thrift {@link Definition}.
+ *
+ */
+public class Document implements Serializable {
+  private Header header;
+  private List definitions;
+
+  public Document(Header header, List definitions) {
+    this.header = checkNotNull(header, "header");
+    this.definitions = ImmutableList.copyOf(checkNotNull(definitions, "definitions"));
+  }
+
+  /** Returns an empty {@link Document}. */
+  public static Document emptyDocument() {
+    List includes = emptyList();
+    List cppIncludes = emptyList();
+    String defaultNamespace = null;
+    Map namespaces = Collections.emptyMap();
+    Header header = new Header(includes, cppIncludes, defaultNamespace, namespaces);
+    List definitions = emptyList();
+    return new Document(header, definitions);
+  }
+
+  public Document getDocument() {
+    return this;
+  }
+
+  public Header getHeader() {
+    return this.header;
+  }
+
+  public void setHeader(Header header) {
+    this.header = header;
+  }
+
+  public List getDefinitions() {
+    return definitions;
+  }
+
+  public void setDefinitions(List definitions) {
+    this.definitions = definitions;
+  }
+
+  public void visit(final DocumentVisitor visitor) throws IOException {
+    Preconditions.checkNotNull(visitor, "the visitor must not be null!");
+
+    for (Definition definition : definitions) {
+      if (visitor.accept(definition)) {
+        definition.visit(visitor);
+      }
+    }
+  }
+
+  /** Gets Avro {@link Schema} for the object. */
+  public Schema getSchema() {
+    return ReflectData.get().getSchema(Document.class);
+  }
+
+  /** Gets {@link Document} as a {@link GenericRecord}. */
+  public GenericRecord getAsGenericRecord() {
+    GenericRecordBuilder genericRecordBuilder = new GenericRecordBuilder(this.getSchema());
+    genericRecordBuilder.set("header", this.getHeader()).set("definitions", this.getDefinitions());
+
+    return genericRecordBuilder.build();
+  }
+
+  /** Adds list of includes to {@link Document#header}. */
+  public void addIncludes(List includes) {
+    checkNotNull(includes, "includes");
+    List currentIncludes = new ArrayList<>(this.getHeader().getIncludes());
+    currentIncludes.addAll(includes);
+
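The `visit` method quoted above follows an accept-then-visit pattern: the visitor first decides whether it is interested in a definition, and only accepted definitions are dispatched to it. The sketch below reproduces that control flow with JDK types only. `Node`, `Visitor`, and `VisitorSketch` are hypothetical stand-ins for `Definition`, `DocumentVisitor`, and `Document`, whose full interfaces are not shown in the quoted diff, and the `IOException` in the original signature is left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Minimal accept-then-visit traversal; all names here are illustrative assumptions. */
final class VisitorSketch {

  /** Hypothetical visitor: accept() filters nodes, visit() processes the accepted ones. */
  interface Visitor {
    boolean accept(Node node);

    void visit(Node node);
  }

  /** Hypothetical stand-in for a Thrift definition. */
  static final class Node {
    final String name;

    Node(String name) {
      this.name = name;
    }

    void visit(Visitor visitor) {
      visitor.visit(this);
    }
  }

  /** Same loop shape as the quoted Document#visit: filter first, then dispatch. */
  static void visitAll(List<Node> nodes, Visitor visitor) {
    Objects.requireNonNull(visitor, "the visitor must not be null!");
    for (Node node : nodes) {
      if (visitor.accept(node)) {
        node.visit(visitor);
      }
    }
  }

  /** Collects the names of all nodes whose name starts with the given prefix. */
  static List<String> namesStartingWith(List<Node> nodes, String prefix) {
    List<String> seen = new ArrayList<>();
    visitAll(
        nodes,
        new Visitor() {
          @Override
          public boolean accept(Node node) {
            return node.name.startsWith(prefix);
          }

          @Override
          public void visit(Node node) {
            seen.add(node.name);
          }
        });
    return seen;
  }
}
```

Keeping the filter in `accept` lets one traversal serve several visitors, each interested in a different subset of definitions.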
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359717=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359717 ]

ASF GitHub Bot logged work on BEAM-8561:

Author: ASF GitHub Bot
Created on: 14/Dec/19 00:25
Start Date: 14/Dec/19 00:25
Worklog Time Spent: 10m
Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357879617

## File path: sdks/java/io/thrift/src/main/antlr/DocumentGenerator.g ##
@@ -0,0 +1,262 @@
+/*
+ * Copyright 2012 Facebook, Inc.

Review comment: Yes, the original file can be found [here](https://github.com/facebookarchive/swift/blob/master/swift-idl-parser/src/main/antlr3/com/facebook/swift/parser/antlr/DocumentGenerator.g). SpotlessApply seems to have removed the copyright header from some of the .java files.

Issue Time Tracking
---
Worklog Id: (was: 359717)
Time Spent: 4h (was: 3h 50m)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359716=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359716 ]

ASF GitHub Bot logged work on BEAM-8561:

Author: ASF GitHub Bot
Created on: 14/Dec/19 00:25
Start Date: 14/Dec/19 00:25
Worklog Time Spent: 10m
Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357879617

## File path: sdks/java/io/thrift/src/main/antlr/DocumentGenerator.g ##
@@ -0,0 +1,262 @@
+/*
+ * Copyright 2012 Facebook, Inc.

Review comment: Yes, the original file can be found [here](https://github.com/facebookarchive/swift/blob/master/swift-idl-parser/src/main/antlr3/com/facebook/swift/parser/antlr/Thrift.g). SpotlessApply seems to have removed the copyright header from some of the .java files.

Issue Time Tracking
---
Worklog Id: (was: 359716)
Time Spent: 3h 50m (was: 3h 40m)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359715=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359715 ]

ASF GitHub Bot logged work on BEAM-8561:

Author: ASF GitHub Bot
Created on: 14/Dec/19 00:24
Start Date: 14/Dec/19 00:24
Worklog Time Spent: 10m
Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357879437

## File path: sdks/java/io/thrift/src/main/antlr/Thrift.g ##
@@ -0,0 +1,290 @@
+/*
+ * Copyright 2008 Martin Traverso
+ * Copyright 2012 Facebook, Inc.

Review comment: Yes, the original file can be found [here](https://github.com/facebookarchive/swift/blob/master/swift-idl-parser/src/main/antlr3/com/facebook/swift/parser/antlr/Thrift.g). We've included it as part of the parser.

Issue Time Tracking
---
Worklog Id: (was: 359715)
Time Spent: 3h 40m (was: 3.5h)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359712=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359712 ]

ASF GitHub Bot logged work on BEAM-8561:

Author: ASF GitHub Bot
Created on: 14/Dec/19 00:04
Start Date: 14/Dec/19 00:04
Worklog Time Spent: 10m
Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357876447

## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/parser/model/ConstInteger.java ##
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift.parser.model;
+
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects;
+
+public class ConstInteger extends ConstValue {

Review comment: Yes, this class is reused for `i64`.

Issue Time Tracking
---
Worklog Id: (was: 359712)
Time Spent: 3.5h (was: 3h 20m)
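One way a single constant class can serve every Thrift integer width, `i64` included, is to store the value as a Java `long` and check the narrower ranges only when needed. The sketch below shows that idea. It is an assumption about the design, since the fields of `ConstInteger` are not visible in the quoted diff, and the class name `IntegerConstantSketch` and its methods are hypothetical.

```java
/** Hypothetical stand-in for a parsed Thrift integer constant; not the class from the pull request. */
final class IntegerConstantSketch {
  // A long is wide enough for every Thrift integer type up to i64.
  private final long value;

  IntegerConstantSketch(long value) {
    this.value = value;
  }

  long getValue() {
    return value;
  }

  /** Returns true if the value is representable as a signed integer of the given bit width. */
  boolean fitsIn(int bits) {
    if (bits < 1 || bits > 64) {
      throw new IllegalArgumentException("bits must be in [1, 64]: " + bits);
    }
    if (bits == 64) {
      return true; // every long fits in 64 bits
    }
    long min = -(1L << (bits - 1));
    long max = (1L << (bits - 1)) - 1;
    return value >= min && value <= max;
  }

  public static void main(String[] args) {
    IntegerConstantSketch small = new IntegerConstantSketch(252);
    System.out.println(small.fitsIn(32)); // prints true
    System.out.println(small.fitsIn(8)); // prints false
  }
}
```

Under this assumption the parser needs only one model class for integer literals, and width checks can be applied wherever a constant is bound to a declared type.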
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359711=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359711 ]

ASF GitHub Bot logged work on BEAM-8561:

Author: ASF GitHub Bot
Created on: 14/Dec/19 00:01
Start Date: 14/Dec/19 00:01
Worklog Time Spent: 10m
Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357875927

## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ##
@@ -0,0 +1,708 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static java.lang.String.format;
+import static java.util.stream.Collectors.joining;
+import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import java.nio.charset.Charset;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.io.Compression;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.io.thrift.parser.ThriftIdlParser;
+import org.apache.beam.sdk.io.thrift.parser.model.BaseType;
+import org.apache.beam.sdk.io.thrift.parser.model.Const;
+import org.apache.beam.sdk.io.thrift.parser.model.Definition;
+import org.apache.beam.sdk.io.thrift.parser.model.Document;
+import org.apache.beam.sdk.io.thrift.parser.model.Header;
+import org.apache.beam.sdk.io.thrift.parser.model.IdentifierType;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnumField;
+import org.apache.beam.sdk.io.thrift.parser.model.ListType;
+import org.apache.beam.sdk.io.thrift.parser.model.MapType;
+import org.apache.beam.sdk.io.thrift.parser.model.Service;
+import org.apache.beam.sdk.io.thrift.parser.model.StringEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.Struct;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftException;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftField;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftMethod;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftType;
+import org.apache.beam.sdk.io.thrift.parser.model.TypeAnnotation;
+import org.apache.beam.sdk.io.thrift.parser.model.Typedef;
+import org.apache.beam.sdk.io.thrift.parser.model.VoidType;
+import org.apache.beam.sdk.options.ValueProvider;
+import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteSource;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing Thrift files.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#read} with the desired file pattern to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection documents = pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }
+ *
+ * For more advanced use cases, like reading each file in a {@link PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern()))
+ *   .apply(FileIO.readMatches());
+ *
+ *
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359709=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359709 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 14/Dec/19 00:00 Start Date: 14/Dec/19 00:00 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r357875805 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ## @@ -0,0 +1,708 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.thrift; + +import static java.lang.String.format; +import static java.util.stream.Collectors.joining; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import com.google.auto.value.AutoValue; +import java.io.Closeable; +import java.io.IOException; +import java.io.OutputStream; +import java.nio.channels.Channels; +import java.nio.channels.WritableByteChannel; +import java.nio.charset.Charset; +import java.util.ArrayList; +import java.util.List; +import java.util.Map; +import javax.annotation.Nullable; +import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.io.Compression; +import org.apache.beam.sdk.io.FileIO; +import org.apache.beam.sdk.io.thrift.parser.ThriftIdlParser; +import org.apache.beam.sdk.io.thrift.parser.model.BaseType; +import org.apache.beam.sdk.io.thrift.parser.model.Const; +import org.apache.beam.sdk.io.thrift.parser.model.Definition; +import org.apache.beam.sdk.io.thrift.parser.model.Document; +import org.apache.beam.sdk.io.thrift.parser.model.Header; +import org.apache.beam.sdk.io.thrift.parser.model.IdentifierType; +import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnum; +import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnumField; +import org.apache.beam.sdk.io.thrift.parser.model.ListType; +import org.apache.beam.sdk.io.thrift.parser.model.MapType; +import org.apache.beam.sdk.io.thrift.parser.model.Service; +import org.apache.beam.sdk.io.thrift.parser.model.StringEnum; +import org.apache.beam.sdk.io.thrift.parser.model.Struct; +import org.apache.beam.sdk.io.thrift.parser.model.ThriftException; +import org.apache.beam.sdk.io.thrift.parser.model.ThriftField; +import org.apache.beam.sdk.io.thrift.parser.model.ThriftMethod; +import org.apache.beam.sdk.io.thrift.parser.model.ThriftType; +import org.apache.beam.sdk.io.thrift.parser.model.TypeAnnotation; +import org.apache.beam.sdk.io.thrift.parser.model.Typedef; +import 
org.apache.beam.sdk.io.thrift.parser.model.VoidType; +import org.apache.beam.sdk.options.ValueProvider; +import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.display.DisplayData; +import org.apache.beam.sdk.values.PBegin; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PDone; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteSource; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link PTransform}s for reading and writing Thrift files. + * + * Reading Thrift Files + * + * For simple reading, use {@link ThriftIO#read} with the desired file pattern to read from. + * + * For example: + * + * {@code + * PCollection documents = pipeline.apply(ThriftIO.read().from("/foo/bar/*")); + * ... + * } + * + * For more advanced use cases, like reading each file in a {@link PCollection} of {@link + * FileIO.ReadableFile}, use the {@link ReadFiles} transform. + * + * For example: + * + * {@code + * PCollection files = pipeline + * .apply(FileIO.match().filepattern(options.getInputFilepattern()) + * .apply(FileIO.readMatches()); + * + *
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359707=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359707 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 13/Dec/19 23:54 Start Date: 13/Dec/19 23:54 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r357874789 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ## @@ -0,0 +1,708 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.thrift; + +import static java.lang.String.format; +import static java.util.stream.Collectors.joining; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import com.google.auto.value.AutoValue; +import java.io.Closeable; +import java.io.IOException; +import java.io.OutputStream; +import java.nio.channels.Channels; +import java.nio.channels.WritableByteChannel; +import java.nio.charset.Charset; +import java.util.ArrayList; +import java.util.List; +import java.util.Map; +import javax.annotation.Nullable; +import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.io.Compression; +import org.apache.beam.sdk.io.FileIO; +import org.apache.beam.sdk.io.thrift.parser.ThriftIdlParser; +import org.apache.beam.sdk.io.thrift.parser.model.BaseType; +import org.apache.beam.sdk.io.thrift.parser.model.Const; +import org.apache.beam.sdk.io.thrift.parser.model.Definition; +import org.apache.beam.sdk.io.thrift.parser.model.Document; +import org.apache.beam.sdk.io.thrift.parser.model.Header; +import org.apache.beam.sdk.io.thrift.parser.model.IdentifierType; +import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnum; +import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnumField; +import org.apache.beam.sdk.io.thrift.parser.model.ListType; +import org.apache.beam.sdk.io.thrift.parser.model.MapType; +import org.apache.beam.sdk.io.thrift.parser.model.Service; +import org.apache.beam.sdk.io.thrift.parser.model.StringEnum; +import org.apache.beam.sdk.io.thrift.parser.model.Struct; +import org.apache.beam.sdk.io.thrift.parser.model.ThriftException; +import org.apache.beam.sdk.io.thrift.parser.model.ThriftField; +import org.apache.beam.sdk.io.thrift.parser.model.ThriftMethod; +import org.apache.beam.sdk.io.thrift.parser.model.ThriftType; +import org.apache.beam.sdk.io.thrift.parser.model.TypeAnnotation; +import org.apache.beam.sdk.io.thrift.parser.model.Typedef; +import 
org.apache.beam.sdk.io.thrift.parser.model.VoidType; +import org.apache.beam.sdk.options.ValueProvider; +import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.display.DisplayData; +import org.apache.beam.sdk.values.PBegin; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PDone; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteSource; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link PTransform}s for reading and writing Thrift files. + * + * Reading Thrift Files + * + * For simple reading, use {@link ThriftIO#read} with the desired file pattern to read from. + * + * For example: + * + * {@code + * PCollection documents = pipeline.apply(ThriftIO.read().from("/foo/bar/*")); + * ... + * } + * + * For more advanced use cases, like reading each file in a {@link PCollection} of {@link + * FileIO.ReadableFile}, use the {@link ReadFiles} transform. + * + * For example: + * + * {@code + * PCollection files = pipeline + * .apply(FileIO.match().filepattern(options.getInputFilepattern()) + * .apply(FileIO.readMatches()); + * + *
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359704=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359704 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 13/Dec/19 23:36 Start Date: 13/Dec/19 23:36 Worklog Time Spent: 10m Work Description: chrlarsen commented on issue #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#issuecomment-565650003 Thanks @gsteelman. We would love to have the sample Thrift schema and anything else we can utilize for testing! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 359704) Time Spent: 2h 40m (was: 2.5h) > Add ThriftIO to Support IO for Thrift Files > --- > > Key: BEAM-8561 > URL: https://issues.apache.org/jira/browse/BEAM-8561 > Project: Beam > Issue Type: New Feature > Components: io-java-files >Reporter: Chris Larsen >Assignee: Chris Larsen >Priority: Major > Time Spent: 2h 40m > Remaining Estimate: 0h > > Similar to AvroIO it would be very useful to support reading and writing > to/from Thrift files with a native connector. > Functionality would include: > # read() - Reading from one or more Thrift files. > # write() - Writing to one or more Thrift files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359703=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359703 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 13/Dec/19 23:34 Start Date: 13/Dec/19 23:34 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r357871234 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ## @@ -0,0 +1,708 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.thrift; + +import static java.lang.String.format; +import static java.util.stream.Collectors.joining; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import com.google.auto.value.AutoValue; +import java.io.Closeable; +import java.io.IOException; +import java.io.OutputStream; +import java.nio.channels.Channels; +import java.nio.channels.WritableByteChannel; +import java.nio.charset.Charset; +import java.util.ArrayList; +import java.util.List; +import java.util.Map; +import javax.annotation.Nullable; +import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.io.Compression; +import org.apache.beam.sdk.io.FileIO; +import org.apache.beam.sdk.io.thrift.parser.ThriftIdlParser; +import org.apache.beam.sdk.io.thrift.parser.model.BaseType; +import org.apache.beam.sdk.io.thrift.parser.model.Const; +import org.apache.beam.sdk.io.thrift.parser.model.Definition; +import org.apache.beam.sdk.io.thrift.parser.model.Document; +import org.apache.beam.sdk.io.thrift.parser.model.Header; +import org.apache.beam.sdk.io.thrift.parser.model.IdentifierType; +import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnum; +import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnumField; +import org.apache.beam.sdk.io.thrift.parser.model.ListType; +import org.apache.beam.sdk.io.thrift.parser.model.MapType; +import org.apache.beam.sdk.io.thrift.parser.model.Service; +import org.apache.beam.sdk.io.thrift.parser.model.StringEnum; +import org.apache.beam.sdk.io.thrift.parser.model.Struct; +import org.apache.beam.sdk.io.thrift.parser.model.ThriftException; +import org.apache.beam.sdk.io.thrift.parser.model.ThriftField; +import org.apache.beam.sdk.io.thrift.parser.model.ThriftMethod; +import org.apache.beam.sdk.io.thrift.parser.model.ThriftType; +import org.apache.beam.sdk.io.thrift.parser.model.TypeAnnotation; +import org.apache.beam.sdk.io.thrift.parser.model.Typedef; +import 
org.apache.beam.sdk.io.thrift.parser.model.VoidType; +import org.apache.beam.sdk.options.ValueProvider; +import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.display.DisplayData; +import org.apache.beam.sdk.values.PBegin; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PDone; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteSource; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link PTransform}s for reading and writing Thrift files. + * + * Reading Thrift Files + * + * For simple reading, use {@link ThriftIO#read} with the desired file pattern to read from. + * + * For example: + * + * {@code + * PCollection documents = pipeline.apply(ThriftIO.read().from("/foo/bar/*")); + * ... + * } + * + * For more advanced use cases, like reading each file in a {@link PCollection} of {@link + * FileIO.ReadableFile}, use the {@link ReadFiles} transform. + * + * For example: + * + * {@code + * PCollection files = pipeline + * .apply(FileIO.match().filepattern(options.getInputFilepattern()) + * .apply(FileIO.readMatches()); + * + *
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359680=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359680 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 13/Dec/19 22:01 Start Date: 13/Dec/19 22:01 Worklog Time Spent: 10m Work Description: gsteelman commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r357798688 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ## @@ -0,0 +1,708 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.thrift; + +import static java.lang.String.format; +import static java.util.stream.Collectors.joining; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import com.google.auto.value.AutoValue; +import java.io.Closeable; +import java.io.IOException; +import java.io.OutputStream; +import java.nio.channels.Channels; +import java.nio.channels.WritableByteChannel; +import java.nio.charset.Charset; +import java.util.ArrayList; +import java.util.List; +import java.util.Map; +import javax.annotation.Nullable; +import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.io.Compression; +import org.apache.beam.sdk.io.FileIO; +import org.apache.beam.sdk.io.thrift.parser.ThriftIdlParser; +import org.apache.beam.sdk.io.thrift.parser.model.BaseType; +import org.apache.beam.sdk.io.thrift.parser.model.Const; +import org.apache.beam.sdk.io.thrift.parser.model.Definition; +import org.apache.beam.sdk.io.thrift.parser.model.Document; +import org.apache.beam.sdk.io.thrift.parser.model.Header; +import org.apache.beam.sdk.io.thrift.parser.model.IdentifierType; +import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnum; +import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnumField; +import org.apache.beam.sdk.io.thrift.parser.model.ListType; +import org.apache.beam.sdk.io.thrift.parser.model.MapType; +import org.apache.beam.sdk.io.thrift.parser.model.Service; +import org.apache.beam.sdk.io.thrift.parser.model.StringEnum; +import org.apache.beam.sdk.io.thrift.parser.model.Struct; +import org.apache.beam.sdk.io.thrift.parser.model.ThriftException; +import org.apache.beam.sdk.io.thrift.parser.model.ThriftField; +import org.apache.beam.sdk.io.thrift.parser.model.ThriftMethod; +import org.apache.beam.sdk.io.thrift.parser.model.ThriftType; +import org.apache.beam.sdk.io.thrift.parser.model.TypeAnnotation; +import org.apache.beam.sdk.io.thrift.parser.model.Typedef; +import 
org.apache.beam.sdk.io.thrift.parser.model.VoidType; +import org.apache.beam.sdk.options.ValueProvider; +import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.display.DisplayData; +import org.apache.beam.sdk.values.PBegin; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PDone; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteSource; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link PTransform}s for reading and writing Thrift files. + * + * Reading Thrift Files + * + * For simple reading, use {@link ThriftIO#read} with the desired file pattern to read from. + * + * For example: + * + * {@code + * PCollection documents = pipeline.apply(ThriftIO.read().from("/foo/bar/*")); + * ... + * } + * + * For more advanced use cases, like reading each file in a {@link PCollection} of {@link + * FileIO.ReadableFile}, use the {@link ReadFiles} transform. + * + * For example: + * + * {@code + * PCollection files = pipeline + * .apply(FileIO.match().filepattern(options.getInputFilepattern()) + * .apply(FileIO.readMatches()); + * + *
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359676=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359676 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 13/Dec/19 22:01 Start Date: 13/Dec/19 22:01 Worklog Time Spent: 10m Work Description: gsteelman commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r357844088 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/parser/model/ConstList.java ## @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.thrift.parser.model; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import java.util.ArrayList; +import java.util.List; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; + +public class ConstList extends ConstValue { + private final List values; + + public ConstList(List values) { +this.values = ImmutableList.copyOf(checkNotNull(values, "values")); Review comment: Docs unclear. Does this create a deep copy or an immutable wrapper? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 359676) Time Spent: 2h 10m (was: 2h) > Add ThriftIO to Support IO for Thrift Files > --- > > Key: BEAM-8561 > URL: https://issues.apache.org/jira/browse/BEAM-8561 > Project: Beam > Issue Type: New Feature > Components: io-java-files >Reporter: Chris Larsen >Assignee: Chris Larsen >Priority: Major > Time Spent: 2h 10m > Remaining Estimate: 0h > > Similar to AvroIO it would be very useful to support reading and writing > to/from Thrift files with a native connector. > Functionality would include: > # read() - Reading from one or more Thrift files. > # write() - Writing to one or more Thrift files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
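On the review question above about `ImmutableList.copyOf`: going by Guava's documented behavior, it is neither a deep copy nor a wrapper. It returns an immutable list holding the same element references (a shallow copy), so later structural changes to the source list are not visible through it, while mutable elements remain shared. It may also skip the copy when the argument is already an `ImmutableList`. The sketch below illustrates the copy-versus-wrapper distinction using only the JDK, on the assumption that `List.copyOf` (Java 10+) is a fair stand-in for `ImmutableList.copyOf` and `Collections.unmodifiableList` for a wrapper. The class and method names are hypothetical and not part of the PR.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class, not part of the PR under review.
public class CopyVsWrapper {

  // Shallow immutable copy: detached from later structural changes to the source.
  // List.copyOf is used here as a JDK stand-in for Guava's ImmutableList.copyOf.
  public static int sizeSeenByCopy() {
    List<String> source = new ArrayList<>(List.of("a", "b"));
    List<String> copy = List.copyOf(source);
    source.add("c");
    return copy.size();
  }

  // Unmodifiable wrapper: a read-only view that still reflects the source list.
  public static int sizeSeenByWrapper() {
    List<String> source = new ArrayList<>(List.of("a", "b"));
    List<String> view = Collections.unmodifiableList(source);
    source.add("c");
    return view.size();
  }

  // The copy is shallow: it stores the same element reference, so a mutation
  // of a mutable element is visible through the copied list.
  public static boolean copySharesElements() {
    StringBuilder element = new StringBuilder("x");
    List<StringBuilder> source = new ArrayList<>();
    source.add(element);
    List<StringBuilder> copy = List.copyOf(source);
    element.append("y");
    return copy.get(0) == element && copy.get(0).toString().equals("xy");
  }

  public static void main(String[] args) {
    System.out.println("size seen by copy:    " + sizeSeenByCopy());
    System.out.println("size seen by wrapper: " + sizeSeenByWrapper());
    System.out.println("copy shares elements: " + copySharesElements());
  }
}
```

For `ConstList`, this would mean callers cannot change the stored list by mutating the list they passed in, but any mutable `ConstValue` elements would still be shared with the caller.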
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359671=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359671 ]

ASF GitHub Bot logged work on BEAM-8561:

Author: ASF GitHub Bot
Created on: 13/Dec/19 22:01
Start Date: 13/Dec/19 22:01
Worklog Time Spent: 10m
Work Description: gsteelman commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357798355

## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ##
@@ -0,0 +1,708 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static java.lang.String.format;
+import static java.util.stream.Collectors.joining;
+import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import java.nio.charset.Charset;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.io.Compression;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.io.thrift.parser.ThriftIdlParser;
+import org.apache.beam.sdk.io.thrift.parser.model.BaseType;
+import org.apache.beam.sdk.io.thrift.parser.model.Const;
+import org.apache.beam.sdk.io.thrift.parser.model.Definition;
+import org.apache.beam.sdk.io.thrift.parser.model.Document;
+import org.apache.beam.sdk.io.thrift.parser.model.Header;
+import org.apache.beam.sdk.io.thrift.parser.model.IdentifierType;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnumField;
+import org.apache.beam.sdk.io.thrift.parser.model.ListType;
+import org.apache.beam.sdk.io.thrift.parser.model.MapType;
+import org.apache.beam.sdk.io.thrift.parser.model.Service;
+import org.apache.beam.sdk.io.thrift.parser.model.StringEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.Struct;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftException;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftField;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftMethod;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftType;
+import org.apache.beam.sdk.io.thrift.parser.model.TypeAnnotation;
+import org.apache.beam.sdk.io.thrift.parser.model.Typedef;
+import org.apache.beam.sdk.io.thrift.parser.model.VoidType;
+import org.apache.beam.sdk.options.ValueProvider;
+import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteSource;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing Thrift files.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#read} with the desired file pattern to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection documents = pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }
+ *
+ * For more advanced use cases, like reading each file in a {@link PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection files = pipeline
+ * .apply(FileIO.match().filepattern(options.getInputFilepattern())
+ * .apply(FileIO.readMatches());
+ *
+ *