[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-03-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=407268=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-407268
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 20/Mar/20 21:27
Start Date: 20/Mar/20 21:27
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #10290: [BEAM-8561] Add 
ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-601918008
 
 
   We let it like this because we had a regression, if we want to generate and 
compile the class we need to have thrift installed in all Beam workers (and as 
a requirement for everyone building Beam) which I think is clearly overklll 
just for a test.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 407268)
Time Spent: 17.5h  (was: 17h 20m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 17.5h
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-03-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=406304=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-406304
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 19/Mar/20 15:03
Start Date: 19/Mar/20 15:03
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on issue #10290: [BEAM-8561] Add 
ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-601229916
 
 
   We ended up including the generated `TestThriftStruct` class instead of 
trying to generate at compile time. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 406304)
Time Spent: 17h 20m  (was: 17h 10m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 17h 20m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=405241=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405241
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 18/Mar/20 06:29
Start Date: 18/Mar/20 06:29
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10290: [BEAM-8561] Add 
ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-600448077
 
 
   I'm sorry about the confusion. What happened with this after all?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 405241)
Time Spent: 17h 10m  (was: 17h)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 17h 10m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=389070=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389070
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 18/Feb/20 20:41
Start Date: 18/Feb/20 20:41
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #10290: [BEAM-8561] Add 
ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-587811830
 
 
   Thanks ping me when you open the PR to get it merged eagerly.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389070)
Time Spent: 17h  (was: 16h 50m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 17h
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=389069=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389069
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 18/Feb/20 20:40
Start Date: 18/Feb/20 20:40
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on issue #10290: [BEAM-8561] Add 
ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-587810419
 
 
   @iemejia yes I'll re-add the generated class.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389069)
Time Spent: 16h 50m  (was: 16h 40m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 16h 50m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=389068=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389068
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 18/Feb/20 20:38
Start Date: 18/Feb/20 20:38
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #10290: [BEAM-8561] Add 
ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-587807299
 
 
   Arrghh it seems we broke master because the workers don't have thrift 
installed, can we just go back to the generated class and comment the compiler 
step so we can unblock everyone.
   Can you help me with this @chrlarsen or @pabloem (I am not in my computer).
   For info 
https://lists.apache.org/thread.html/r8b2c65bfccc5a68796811804903d4ca08827359de2399a94b6d18197%40%3Cdev.beam.apache.org%3E
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389068)
Time Spent: 16h 40m  (was: 16.5h)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 16h 40m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=388713=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-388713
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 18/Feb/20 08:41
Start Date: 18/Feb/20 08:41
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 388713)
Time Spent: 16.5h  (was: 16h 20m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 16.5h
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=388712=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-388712
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 18/Feb/20 08:41
Start Date: 18/Feb/20 08:41
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #10290: [BEAM-8561] Add 
ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-587343778
 
 
   Merged manually to squash the commits. Thanks @chrlarsen ! 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 388712)
Time Spent: 16h 20m  (was: 16h 10m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 16h 20m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=388689=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-388689
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 18/Feb/20 07:05
Start Date: 18/Feb/20 07:05
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10290: [BEAM-8561] Add 
ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-587312699
 
 
   aw this is great. Thanks! LGTM. I'll let Ismael / Steve add their comments, 
and merge if nothing else.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 388689)
Time Spent: 16h 10m  (was: 16h)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 16h 10m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=387066=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387066
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 14/Feb/20 02:42
Start Date: 14/Feb/20 02:42
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on issue #10290: [BEAM-8561] Add 
ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-586070601
 
 
   @pabloem @steveniemitz @iemejia comments have been addressed and updated. 
Also I will add `inferBeamSchema` to the future plans :) 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 387066)
Time Spent: 16h  (was: 15h 50m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 16h
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=386979=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386979
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 14/Feb/20 00:11
Start Date: 14/Feb/20 00:11
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r379189609
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,289 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.thrift.TBase;
+import org.apache.thrift.TException;
+import org.apache.thrift.protocol.TProtocol;
+import org.apache.thrift.protocol.TProtocolFactory;
+import org.apache.thrift.transport.TIOStreamTransport;
+import org.apache.thrift.transport.TTransportException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing files containing Thrift encoded 
data.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#} with the desired file pattern 
to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection examples = 
pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }
+ *
+ * For more advanced use cases, like reading each file in a {@link 
PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern())
+ *   .apply(FileIO.readMatches());
+ *
+ * PCollection examples = 
files.apply(ThriftIO.readFiles(ExampleType.class).withProtocol(thriftProto);
+ * }
+ *
+ * Writing Thrift Files
+ *
+ * {@link ThriftIO.Sink} allows for a {@link PCollection} of {@link TBase} 
to be written to
+ * Thrift files. It can be used with the general-purpose {@link FileIO} 
transforms with
+ * FileIO.write/writeDynamic specifically.
+ *
+ * For example:
+ *
+ * {@code
+ * pipeline
+ *   .apply(...) // PCollection
+ *   .apply(FileIO
+ * .write()
+ * .via(ThriftIO.sink(thriftProto))
+ * .to("destination/path");
+ * }
+ *
+ * This IO API is considered experimental and may break or receive 
backwards-incompatible changes
+ * in future versions of the Apache Beam SDK.
+ */
+@Experimental(Experimental.Kind.SOURCE_SINK)
+public class ThriftIO {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ThriftIO.class);
+
+  /** Disable construction of utility class. */
+  private ThriftIO() {}
+
+  /**
+   * Reads each file in a {@link PCollection} of {@link 
org.apache.beam.sdk.io.FileIO.ReadableFile},
+   * which allows more flexible usage.
+   */
+  public static  ReadFiles readFiles(Class recordClass) {
+return new 
AutoValue_ThriftIO_ReadFiles.Builder().setRecordClass(recordClass).build();
+  }
+
+  
//
+
+  /** Creates a {@link Sink} for use with {@link FileIO#write} and {@link 
FileIO#writeDynamic}. */
+  public static > Sink 

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=386331=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386331
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 13/Feb/20 00:54
Start Date: 13/Feb/20 00:54
Worklog Time Spent: 10m 
  Work Description: steveniemitz commented on pull request #10290: 
[BEAM-8561] Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r378597769
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,289 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.thrift.TBase;
+import org.apache.thrift.TException;
+import org.apache.thrift.protocol.TProtocol;
+import org.apache.thrift.protocol.TProtocolFactory;
+import org.apache.thrift.transport.TIOStreamTransport;
+import org.apache.thrift.transport.TTransportException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing files containing Thrift encoded 
data.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#} with the desired file pattern 
to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection examples = 
pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }
+ *
+ * For more advanced use cases, like reading each file in a {@link 
PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern())
+ *   .apply(FileIO.readMatches());
+ *
+ * PCollection examples = 
files.apply(ThriftIO.readFiles(ExampleType.class).withProtocol(thriftProto);
+ * }
+ *
+ * Writing Thrift Files
+ *
+ * {@link ThriftIO.Sink} allows for a {@link PCollection} of {@link TBase} 
to be written to
+ * Thrift files. It can be used with the general-purpose {@link FileIO} 
transforms with
+ * FileIO.write/writeDynamic specifically.
+ *
+ * For example:
+ *
+ * {@code
+ * pipeline
+ *   .apply(...) // PCollection
+ *   .apply(FileIO
+ * .write()
+ * .via(ThriftIO.sink(thriftProto))
+ * .to("destination/path");
+ * }
+ *
+ * This IO API is considered experimental and may break or receive 
backwards-incompatible changes
+ * in future versions of the Apache Beam SDK.
+ */
+@Experimental(Experimental.Kind.SOURCE_SINK)
+public class ThriftIO {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ThriftIO.class);
+
+  /** Disable construction of utility class. */
+  private ThriftIO() {}
+
+  /**
+   * Reads each file in a {@link PCollection} of {@link 
org.apache.beam.sdk.io.FileIO.ReadableFile},
+   * which allows more flexible usage.
+   */
+  public static  ReadFiles readFiles(Class recordClass) {
+return new 
AutoValue_ThriftIO_ReadFiles.Builder().setRecordClass(recordClass).build();
+  }
+
+  
//
+
+  /** Creates a {@link Sink} for use with {@link FileIO#write} and {@link 
FileIO#writeDynamic}. */
+  public static > Sink 

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=386327=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386327
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 13/Feb/20 00:44
Start Date: 13/Feb/20 00:44
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r378594519
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,289 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.thrift.TBase;
+import org.apache.thrift.TException;
+import org.apache.thrift.protocol.TProtocol;
+import org.apache.thrift.protocol.TProtocolFactory;
+import org.apache.thrift.transport.TIOStreamTransport;
+import org.apache.thrift.transport.TTransportException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing files containing Thrift encoded 
data.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#} with the desired file pattern 
to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection examples = 
pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }
+ *
+ * For more advanced use cases, like reading each file in a {@link 
PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern())
+ *   .apply(FileIO.readMatches());
+ *
+ * PCollection examples = 
files.apply(ThriftIO.readFiles(ExampleType.class).withProtocol(thriftProto);
+ * }
+ *
+ * Writing Thrift Files
+ *
+ * {@link ThriftIO.Sink} allows for a {@link PCollection} of {@link TBase} 
to be written to
+ * Thrift files. It can be used with the general-purpose {@link FileIO} 
transforms with
+ * FileIO.write/writeDynamic specifically.
+ *
+ * For example:
+ *
+ * {@code
+ * pipeline
+ *   .apply(...) // PCollection
+ *   .apply(FileIO
+ * .write()
+ * .via(ThriftIO.sink(thriftProto))
+ * .to("destination/path");
+ * }
+ *
+ * This IO API is considered experimental and may break or receive 
backwards-incompatible changes
+ * in future versions of the Apache Beam SDK.
+ */
+@Experimental(Experimental.Kind.SOURCE_SINK)
+public class ThriftIO {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ThriftIO.class);
+
+  /** Disable construction of utility class. */
+  private ThriftIO() {}
+
+  /**
+   * Reads each file in a {@link PCollection} of {@link 
org.apache.beam.sdk.io.FileIO.ReadableFile},
+   * which allows more flexible usage.
+   */
+  public static  ReadFiles readFiles(Class recordClass) {
+return new 
AutoValue_ThriftIO_ReadFiles.Builder().setRecordClass(recordClass).build();
+  }
+
+  
//
+
+  /** Creates a {@link Sink} for use with {@link FileIO#write} and {@link 
FileIO#writeDynamic}. */
+  public static > Sink 

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=386321=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386321
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 13/Feb/20 00:16
Start Date: 13/Feb/20 00:16
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r378586352
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/package-info.java
 ##
 @@ -0,0 +1,20 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * License); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/** Transforms for reading and writing to Thrift files. */
+package org.apache.beam.sdk.io.thrift;
 
 Review comment:
   Done updated.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386321)
Time Spent: 15h 20m  (was: 15h 10m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 15h 20m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=386314=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386314
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 12/Feb/20 23:58
Start Date: 12/Feb/20 23:58
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r378581158
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftCoder.java
 ##
 @@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.ObjectInputStream;
+import java.io.ObjectOutputStream;
+import java.io.OutputStream;
+import org.apache.beam.sdk.coders.CoderException;
+import org.apache.beam.sdk.coders.CustomCoder;
+
+public class ThriftCoder extends CustomCoder {
+
+  public static  ThriftCoder of() {
+return new ThriftCoder<>();
+  }
+
+  /**
+   * Encodes the given value of type {@code T} onto the given output stream.
+   *
+   * @param value {@link org.apache.thrift.TBase} to encode.
+   * @param outStream stream to output encoded value to.
+   * @throws IOException if writing to the {@code OutputStream} fails for some 
reason
+   * @throws CoderException if the value could not be encoded for some reason
+   */
+  @Override
+  public void encode(T value, OutputStream outStream) throws CoderException, 
IOException {
+ObjectOutputStream oos = new ObjectOutputStream(outStream);
+oos.writeObject(value);
+oos.flush();
+  }
 
 Review comment:
   Sounds good, the next commit will include Thrift native serialization.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386314)
Time Spent: 15h 10m  (was: 15h)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 15h 10m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=386089=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386089
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 12/Feb/20 17:34
Start Date: 12/Feb/20 17:34
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r378404834
 
 

 ##
 File path: 
sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/TestThriftStruct.java
 ##
 @@ -0,0 +1,1232 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
 
 Review comment:
   Sounds good, I'll work on this.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386089)
Time Spent: 15h  (was: 14h 50m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 15h
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=386003=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386003
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 12/Feb/20 15:49
Start Date: 12/Feb/20 15:49
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #10290: [BEAM-8561] Add 
ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-585267600
 
 
   One extra thing we just merged a plugin to auto label PRs for different 
components/extensions/ios, can you please also add the label + path for thrift 
in this file:
   https://github.com/apache/beam/blob/master/.github/autolabeler.yml
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386003)
Time Spent: 14h 50m  (was: 14h 40m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 14h 50m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385999=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385999
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 12/Feb/20 15:43
Start Date: 12/Feb/20 15:43
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #10290: [BEAM-8561] Add 
ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-585267600
 
 
   One extra thing we just merged plugin to auto label PRs for different 
components/extensions/ios, can you please add the label + path for thrift in 
this file:
   https://github.com/apache/beam/blob/master/.github/autolabeler.yml
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385999)
Time Spent: 14.5h  (was: 14h 20m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 14.5h
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=386000=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386000
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 12/Feb/20 15:43
Start Date: 12/Feb/20 15:43
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #10290: [BEAM-8561] Add 
ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-585267600
 
 
   One extra thing we just merged a plugin to auto label PRs for different 
components/extensions/ios, can you please add the label + path for thrift in 
this file:
   https://github.com/apache/beam/blob/master/.github/autolabeler.yml
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386000)
Time Spent: 14h 40m  (was: 14.5h)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 14h 40m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385991=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385991
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 12/Feb/20 15:39
Start Date: 12/Feb/20 15:39
Worklog Time Spent: 10m 
  Work Description: steveniemitz commented on pull request #10290: 
[BEAM-8561] Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r378331613
 
 

 ##
 File path: 
sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/TestThriftStruct.java
 ##
 @@ -0,0 +1,1232 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
 
 Review comment:
   I don't think that thrift guarantees that generated code is forwards or 
backwards compatible, meaning that if we upgrade libthrift here, the code might 
no longer compile and need to be regenerated.  I'm +1 for having this being 
generated at compile time.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385991)
Time Spent: 14h 20m  (was: 14h 10m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 14h 20m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385490=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385490
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 22:00
Start Date: 11/Feb/20 22:00
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377925605
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,289 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.thrift.TBase;
+import org.apache.thrift.TException;
+import org.apache.thrift.protocol.TProtocol;
+import org.apache.thrift.protocol.TProtocolFactory;
+import org.apache.thrift.transport.TIOStreamTransport;
+import org.apache.thrift.transport.TTransportException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing files containing Thrift encoded 
data.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#} with the desired file pattern 
to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection examples = 
pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }
+ *
+ * For more advanced use cases, like reading each file in a {@link 
PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern())
+ *   .apply(FileIO.readMatches());
+ *
+ * PCollection examples = 
files.apply(ThriftIO.readFiles(ExampleType.class).withProtocol(thriftProto);
+ * }
+ *
+ * Writing Thrift Files
+ *
+ * {@link ThriftIO.Sink} allows for a {@link PCollection} of {@link TBase} 
to be written to
+ * Thrift files. It can be used with the general-purpose {@link FileIO} 
transforms with
+ * FileIO.write/writeDynamic specifically.
+ *
+ * For example:
+ *
+ * {@code
+ * pipeline
+ *   .apply(...) // PCollection
+ *   .apply(FileIO
+ * .write()
+ * .via(ThriftIO.sink(thriftProto))
+ * .to("destination/path");
+ * }
+ *
+ * This IO API is considered experimental and may break or receive 
backwards-incompatible changes
+ * in future versions of the Apache Beam SDK.
+ */
+@Experimental(Experimental.Kind.SOURCE_SINK)
+public class ThriftIO {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ThriftIO.class);
+
+  /** Disable construction of utility class. */
+  private ThriftIO() {}
+
+  /**
+   * Reads each file in a {@link PCollection} of {@link 
org.apache.beam.sdk.io.FileIO.ReadableFile},
+   * which allows more flexible usage.
+   */
+  public static  ReadFiles readFiles(Class recordClass) {
+return new 
AutoValue_ThriftIO_ReadFiles.Builder().setRecordClass(recordClass).build();
+  }
+
+  
//
+
+  /** Creates a {@link Sink} for use with {@link FileIO#write} and {@link 
FileIO#writeDynamic}. */
+  public static > Sink 

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385488=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385488
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:59
Start Date: 11/Feb/20 21:59
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377925182
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,289 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.thrift.TBase;
+import org.apache.thrift.TException;
+import org.apache.thrift.protocol.TProtocol;
+import org.apache.thrift.protocol.TProtocolFactory;
+import org.apache.thrift.transport.TIOStreamTransport;
+import org.apache.thrift.transport.TTransportException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing files containing Thrift encoded 
data.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#} with the desired file pattern 
to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection examples = 
pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }
+ *
+ * For more advanced use cases, like reading each file in a {@link 
PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern())
+ *   .apply(FileIO.readMatches());
+ *
+ * PCollection examples = 
files.apply(ThriftIO.readFiles(ExampleType.class).withProtocol(thriftProto);
+ * }
+ *
+ * Writing Thrift Files
+ *
+ * {@link ThriftIO.Sink} allows for a {@link PCollection} of {@link TBase} 
to be written to
+ * Thrift files. It can be used with the general-purpose {@link FileIO} 
transforms with
+ * FileIO.write/writeDynamic specifically.
+ *
+ * For example:
+ *
+ * {@code
+ * pipeline
+ *   .apply(...) // PCollection
+ *   .apply(FileIO
+ * .write()
+ * .via(ThriftIO.sink(thriftProto))
+ * .to("destination/path");
+ * }
+ *
+ * This IO API is considered experimental and may break or receive 
backwards-incompatible changes
+ * in future versions of the Apache Beam SDK.
+ */
+@Experimental(Experimental.Kind.SOURCE_SINK)
+public class ThriftIO {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ThriftIO.class);
+
+  /** Disable construction of utility class. */
+  private ThriftIO() {}
+
+  /**
+   * Reads each file in a {@link PCollection} of {@link 
org.apache.beam.sdk.io.FileIO.ReadableFile},
+   * which allows more flexible usage.
+   */
+  public static  ReadFiles readFiles(Class recordClass) {
+return new 
AutoValue_ThriftIO_ReadFiles.Builder().setRecordClass(recordClass).build();
+  }
+
+  
//
+
+  /** Creates a {@link Sink} for use with {@link FileIO#write} and {@link 
FileIO#writeDynamic}. */
+  public static > Sink 

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385489=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385489
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:59
Start Date: 11/Feb/20 21:59
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377925387
 
 

 ##
 File path: 
sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/ThriftIOTest.java
 ##
 @@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import 
org.apache.beam.repackaged.core.org.apache.commons.lang3.RandomStringUtils;
+import org.apache.beam.repackaged.core.org.apache.commons.lang3.RandomUtils;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.testing.PAssert;
+import org.apache.beam.sdk.testing.TestPipeline;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.values.PCollection;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.Resources;
+import org.apache.thrift.protocol.TBinaryProtocol;
+import org.apache.thrift.protocol.TCompactProtocol;
+import org.apache.thrift.protocol.TJSONProtocol;
+import org.apache.thrift.protocol.TProtocolFactory;
+import org.apache.thrift.protocol.TSimpleJSONProtocol;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+/** Tests for {@link ThriftIO}. */
+@RunWith(JUnit4.class)
+public class ThriftIOTest implements Serializable {
+
+  private static final String RESOURCE_DIR = "ThriftIOTest/";
+
+  private static final String THRIFT_DIR = 
Resources.getResource(RESOURCE_DIR).getPath();
+  private static final String ALL_THRIFT_STRING =
+  Resources.getResource(RESOURCE_DIR).getPath() + "*";
+  private static final TestThriftStruct TEST_THRIFT_STRUCT = new 
TestThriftStruct();
+  private static List testThriftStructs;
+  private final TProtocolFactory tBinaryProtoFactory = new 
TBinaryProtocol.Factory();
+  private final TProtocolFactory tJsonProtocolFactory = new 
TJSONProtocol.Factory();
+  private final TProtocolFactory tSimpleJsonProtocolFactory = new 
TSimpleJSONProtocol.Factory();
+  private final TProtocolFactory tCompactProtocolFactory = new 
TCompactProtocol.Factory();
+  @Rule public transient TestPipeline mainPipeline = TestPipeline.create();
+  @Rule public transient TestPipeline readPipeline = TestPipeline.create();
+  @Rule public transient TestPipeline writePipeline = TestPipeline.create();
+  @Rule public transient TemporaryFolder temporaryFolder = new 
TemporaryFolder();
+
+  @Before
+  public void setUp() throws Exception {
+byte[] bytes = new byte[10];
+ByteBuffer buffer = ByteBuffer.wrap(bytes);
+
+TEST_THRIFT_STRUCT.testByte = 100;
+TEST_THRIFT_STRUCT.testShort = 200;
+TEST_THRIFT_STRUCT.testInt = 2500;
+TEST_THRIFT_STRUCT.testLong = 79303L;
+TEST_THRIFT_STRUCT.testDouble = 25.007;
+TEST_THRIFT_STRUCT.testBool = true;
+TEST_THRIFT_STRUCT.stringIntMap = new HashMap<>();
+TEST_THRIFT_STRUCT.stringIntMap.put("first", (short) 1);
+TEST_THRIFT_STRUCT.stringIntMap.put("second", (short) 2);
+TEST_THRIFT_STRUCT.testBinary = buffer;
+
+testThriftStructs = ImmutableList.copyOf(generateTestObjects(1000L));
+  }
+
+  /** Tests {@link ThriftIO#readFiles(Class)} with {@link TBinaryProtocol}. */
+  @Test
+  public void testReadFilesBinaryProtocol() {
+
+PCollection testThriftDoc =
+mainPipeline
+.apply(Create.of(THRIFT_DIR + 
"data").withCoder(StringUtf8Coder.of()))
+

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385484=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385484
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:53
Start Date: 11/Feb/20 21:53
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377922006
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftCoder.java
 ##
 @@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.ObjectInputStream;
+import java.io.ObjectOutputStream;
+import java.io.OutputStream;
+import org.apache.beam.sdk.coders.CoderException;
+import org.apache.beam.sdk.coders.CustomCoder;
+
+public class ThriftCoder extends CustomCoder {
 
 Review comment:
   Done updated
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385484)
Time Spent: 13h 40m  (was: 13.5h)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 13h 40m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385482=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385482
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:51
Start Date: 11/Feb/20 21:51
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377921210
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,289 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.thrift.TBase;
+import org.apache.thrift.TException;
+import org.apache.thrift.protocol.TProtocol;
+import org.apache.thrift.protocol.TProtocolFactory;
+import org.apache.thrift.transport.TIOStreamTransport;
+import org.apache.thrift.transport.TTransportException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing files containing Thrift encoded 
data.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#} with the desired file pattern 
to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection examples = 
pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
 
 Review comment:
   Sounds good, I'll remove references to `read()`
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385482)
Time Spent: 13.5h  (was: 13h 20m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 13.5h
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385478=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385478
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:45
Start Date: 11/Feb/20 21:45
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377918220
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftCoder.java
 ##
 @@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.ObjectInputStream;
+import java.io.ObjectOutputStream;
+import java.io.OutputStream;
+import org.apache.beam.sdk.coders.CoderException;
+import org.apache.beam.sdk.coders.CustomCoder;
+
+public class ThriftCoder extends CustomCoder {
+
+  public static  ThriftCoder of() {
 
 Review comment:
   Done, removed
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385478)
Time Spent: 13h 20m  (was: 13h 10m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 13h 20m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385476=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385476
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:41
Start Date: 11/Feb/20 21:41
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377916026
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,289 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.thrift.TBase;
+import org.apache.thrift.TException;
+import org.apache.thrift.protocol.TProtocol;
+import org.apache.thrift.protocol.TProtocolFactory;
+import org.apache.thrift.transport.TIOStreamTransport;
+import org.apache.thrift.transport.TTransportException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing files containing Thrift encoded 
data.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#} with the desired file pattern 
to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection examples = 
pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }
+ *
+ * For more advanced use cases, like reading each file in a {@link 
PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern())
+ *   .apply(FileIO.readMatches());
+ *
+ * PCollection examples = 
files.apply(ThriftIO.readFiles(ExampleType.class).withProtocol(thriftProto);
+ * }
+ *
+ * Writing Thrift Files
+ *
+ * {@link ThriftIO.Sink} allows for a {@link PCollection} of {@link TBase} 
to be written to
+ * Thrift files. It can be used with the general-purpose {@link FileIO} 
transforms with
+ * FileIO.write/writeDynamic specifically.
+ *
+ * For example:
+ *
+ * {@code
+ * pipeline
+ *   .apply(...) // PCollection
+ *   .apply(FileIO
+ * .write()
+ * .via(ThriftIO.sink(thriftProto))
+ * .to("destination/path");
+ * }
+ *
+ * This IO API is considered experimental and may break or receive 
backwards-incompatible changes
+ * in future versions of the Apache Beam SDK.
+ */
+@Experimental(Experimental.Kind.SOURCE_SINK)
 
 Review comment:
   Done updated
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385476)
Time Spent: 13h 10m  (was: 13h)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue 

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385475=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385475
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:38
Start Date: 11/Feb/20 21:38
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377914782
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,289 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.thrift.TBase;
+import org.apache.thrift.TException;
+import org.apache.thrift.protocol.TProtocol;
+import org.apache.thrift.protocol.TProtocolFactory;
+import org.apache.thrift.transport.TIOStreamTransport;
+import org.apache.thrift.transport.TTransportException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing files containing Thrift encoded 
data.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#} with the desired file pattern 
to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection examples = 
pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
 
 Review comment:
   +1 to not have a `read()` , less 'useless' code to maintain, other File 
based IOs only have it for historical reasons and we decided to deprecate 
`readAll` transforms too to make FileIO.match + read composition more explicit 
since it cover more cases.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385475)
Time Spent: 13h  (was: 12h 50m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 13h
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385474=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385474
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:33
Start Date: 11/Feb/20 21:33
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377912388
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,289 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.thrift.TBase;
+import org.apache.thrift.TException;
+import org.apache.thrift.protocol.TProtocol;
+import org.apache.thrift.protocol.TProtocolFactory;
+import org.apache.thrift.transport.TIOStreamTransport;
+import org.apache.thrift.transport.TTransportException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing files containing Thrift encoded 
data.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#} with the desired file pattern 
to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection examples = 
pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
 
 Review comment:
   Correct `read()` is not implemented and I will remove references to it 
unless we think it should be implemented. I think `readFiles()` will cover 
everything but the simple use case. What are your thoughts?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385474)
Time Spent: 12h 50m  (was: 12h 40m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 12h 50m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385473=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385473
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:29
Start Date: 11/Feb/20 21:29
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377910032
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,289 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.thrift.TBase;
+import org.apache.thrift.TException;
+import org.apache.thrift.protocol.TProtocol;
+import org.apache.thrift.protocol.TProtocolFactory;
+import org.apache.thrift.transport.TIOStreamTransport;
+import org.apache.thrift.transport.TTransportException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing files containing Thrift encoded 
data.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#} with the desired file pattern 
to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection examples = 
pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
 
 Review comment:
   This reference will be removed. `readFiles()` will take in the class as it 
is needed for the pipeline to deserialize the data into. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385473)
Time Spent: 12h 40m  (was: 12.5h)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 12h 40m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385469=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385469
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:27
Start Date: 11/Feb/20 21:27
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377909310
 
 

 ##
 File path: 
sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/ThriftIOTest.java
 ##
 @@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import 
org.apache.beam.repackaged.core.org.apache.commons.lang3.RandomStringUtils;
+import org.apache.beam.repackaged.core.org.apache.commons.lang3.RandomUtils;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.testing.PAssert;
+import org.apache.beam.sdk.testing.TestPipeline;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.values.PCollection;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.Resources;
+import org.apache.thrift.protocol.TBinaryProtocol;
+import org.apache.thrift.protocol.TCompactProtocol;
+import org.apache.thrift.protocol.TJSONProtocol;
+import org.apache.thrift.protocol.TProtocolFactory;
+import org.apache.thrift.protocol.TSimpleJSONProtocol;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+/** Tests for {@link ThriftIO}. */
+@RunWith(JUnit4.class)
+public class ThriftIOTest implements Serializable {
+
+  private static final String RESOURCE_DIR = "ThriftIOTest/";
+
+  private static final String THRIFT_DIR = 
Resources.getResource(RESOURCE_DIR).getPath();
+  private static final String ALL_THRIFT_STRING =
+  Resources.getResource(RESOURCE_DIR).getPath() + "*";
+  private static final TestThriftStruct TEST_THRIFT_STRUCT = new 
TestThriftStruct();
+  private static List testThriftStructs;
+  private final TProtocolFactory tBinaryProtoFactory = new 
TBinaryProtocol.Factory();
+  private final TProtocolFactory tJsonProtocolFactory = new 
TJSONProtocol.Factory();
+  private final TProtocolFactory tSimpleJsonProtocolFactory = new 
TSimpleJSONProtocol.Factory();
+  private final TProtocolFactory tCompactProtocolFactory = new 
TCompactProtocol.Factory();
+  @Rule public transient TestPipeline mainPipeline = TestPipeline.create();
+  @Rule public transient TestPipeline readPipeline = TestPipeline.create();
+  @Rule public transient TestPipeline writePipeline = TestPipeline.create();
+  @Rule public transient TemporaryFolder temporaryFolder = new 
TemporaryFolder();
+
+  @Before
+  public void setUp() throws Exception {
+byte[] bytes = new byte[10];
+ByteBuffer buffer = ByteBuffer.wrap(bytes);
+
+TEST_THRIFT_STRUCT.testByte = 100;
+TEST_THRIFT_STRUCT.testShort = 200;
+TEST_THRIFT_STRUCT.testInt = 2500;
+TEST_THRIFT_STRUCT.testLong = 79303L;
+TEST_THRIFT_STRUCT.testDouble = 25.007;
+TEST_THRIFT_STRUCT.testBool = true;
+TEST_THRIFT_STRUCT.stringIntMap = new HashMap<>();
+TEST_THRIFT_STRUCT.stringIntMap.put("first", (short) 1);
+TEST_THRIFT_STRUCT.stringIntMap.put("second", (short) 2);
+TEST_THRIFT_STRUCT.testBinary = buffer;
+
+testThriftStructs = ImmutableList.copyOf(generateTestObjects(1000L));
+  }
+
+  /** Tests {@link ThriftIO#readFiles(Class)} with {@link TBinaryProtocol}. */
+  @Test
 
 Review comment:
   `read` was in the old implementation and I will remove the references to it. 
I think that `readFiles()` will cover most use cases for this IO. 
 

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385462=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385462
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:20
Start Date: 11/Feb/20 21:20
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377901030
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,289 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.thrift.TBase;
+import org.apache.thrift.TException;
+import org.apache.thrift.protocol.TProtocol;
+import org.apache.thrift.protocol.TProtocolFactory;
+import org.apache.thrift.transport.TIOStreamTransport;
+import org.apache.thrift.transport.TTransportException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing files containing Thrift encoded 
data.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#} with the desired file pattern 
to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection examples = 
pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }
+ *
+ * For more advanced use cases, like reading each file in a {@link 
PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern())
+ *   .apply(FileIO.readMatches());
+ *
+ * PCollection examples = 
files.apply(ThriftIO.readFiles(ExampleType.class).withProtocol(thriftProto);
+ * }
+ *
+ * Writing Thrift Files
+ *
+ * {@link ThriftIO.Sink} allows for a {@link PCollection} of {@link TBase} 
to be written to
+ * Thrift files. It can be used with the general-purpose {@link FileIO} 
transforms with
+ * FileIO.write/writeDynamic specifically.
+ *
+ * For example:
+ *
+ * {@code
+ * pipeline
+ *   .apply(...) // PCollection
+ *   .apply(FileIO
+ * .write()
+ * .via(ThriftIO.sink(thriftProto))
+ * .to("destination/path");
+ * }
+ *
+ * This IO API is considered experimental and may break or receive 
backwards-incompatible changes
+ * in future versions of the Apache Beam SDK.
+ */
+@Experimental(Experimental.Kind.SOURCE_SINK)
+public class ThriftIO {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ThriftIO.class);
+
+  /** Disable construction of utility class. */
+  private ThriftIO() {}
+
+  /**
+   * Reads each file in a {@link PCollection} of {@link 
org.apache.beam.sdk.io.FileIO.ReadableFile},
+   * which allows more flexible usage.
+   */
+  public static  ReadFiles readFiles(Class recordClass) {
+return new 
AutoValue_ThriftIO_ReadFiles.Builder().setRecordClass(recordClass).build();
+  }
+
+  
//
+
+  /** Creates a {@link Sink} for use with {@link FileIO#write} and {@link 
FileIO#writeDynamic}. */
+  public static > Sink 

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385464=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385464
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:20
Start Date: 11/Feb/20 21:20
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377901859
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,289 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.thrift.TBase;
+import org.apache.thrift.TException;
+import org.apache.thrift.protocol.TProtocol;
+import org.apache.thrift.protocol.TProtocolFactory;
+import org.apache.thrift.transport.TIOStreamTransport;
+import org.apache.thrift.transport.TTransportException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing files containing Thrift encoded 
data.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#} with the desired file pattern 
to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection examples = 
pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }
+ *
+ * For more advanced use cases, like reading each file in a {@link 
PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern())
+ *   .apply(FileIO.readMatches());
+ *
+ * PCollection examples = 
files.apply(ThriftIO.readFiles(ExampleType.class).withProtocol(thriftProto);
+ * }
+ *
+ * Writing Thrift Files
+ *
+ * {@link ThriftIO.Sink} allows for a {@link PCollection} of {@link TBase} 
to be written to
+ * Thrift files. It can be used with the general-purpose {@link FileIO} 
transforms with
+ * FileIO.write/writeDynamic specifically.
+ *
+ * For example:
+ *
+ * {@code
+ * pipeline
+ *   .apply(...) // PCollection
+ *   .apply(FileIO
+ * .write()
+ * .via(ThriftIO.sink(thriftProto))
+ * .to("destination/path");
+ * }
+ *
+ * This IO API is considered experimental and may break or receive 
backwards-incompatible changes
+ * in future versions of the Apache Beam SDK.
+ */
+@Experimental(Experimental.Kind.SOURCE_SINK)
+public class ThriftIO {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ThriftIO.class);
+
+  /** Disable construction of utility class. */
+  private ThriftIO() {}
+
+  /**
+   * Reads each file in a {@link PCollection} of {@link 
org.apache.beam.sdk.io.FileIO.ReadableFile},
+   * which allows more flexible usage.
+   */
+  public static  ReadFiles readFiles(Class recordClass) {
+return new 
AutoValue_ThriftIO_ReadFiles.Builder().setRecordClass(recordClass).build();
+  }
+
+  
//
+
+  /** Creates a {@link Sink} for use with {@link FileIO#write} and {@link 
FileIO#writeDynamic}. */
+  public static > Sink 

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385465=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385465
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:20
Start Date: 11/Feb/20 21:20
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377895151
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,289 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.thrift.TBase;
+import org.apache.thrift.TException;
+import org.apache.thrift.protocol.TProtocol;
+import org.apache.thrift.protocol.TProtocolFactory;
+import org.apache.thrift.transport.TIOStreamTransport;
+import org.apache.thrift.transport.TTransportException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing files containing Thrift encoded 
data.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#} with the desired file pattern 
to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection examples = 
pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }
+ *
+ * For more advanced use cases, like reading each file in a {@link 
PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern())
+ *   .apply(FileIO.readMatches());
+ *
+ * PCollection examples = 
files.apply(ThriftIO.readFiles(ExampleType.class).withProtocol(thriftProto);
+ * }
+ *
+ * Writing Thrift Files
+ *
+ * {@link ThriftIO.Sink} allows for a {@link PCollection} of {@link TBase} 
to be written to
+ * Thrift files. It can be used with the general-purpose {@link FileIO} 
transforms with
+ * FileIO.write/writeDynamic specifically.
+ *
+ * For example:
+ *
+ * {@code
+ * pipeline
+ *   .apply(...) // PCollection
+ *   .apply(FileIO
+ * .write()
+ * .via(ThriftIO.sink(thriftProto))
+ * .to("destination/path");
+ * }
+ *
+ * This IO API is considered experimental and may break or receive 
backwards-incompatible changes
+ * in future versions of the Apache Beam SDK.
+ */
+@Experimental(Experimental.Kind.SOURCE_SINK)
+public class ThriftIO {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ThriftIO.class);
+
+  /** Disable construction of utility class. */
+  private ThriftIO() {}
+
+  /**
+   * Reads each file in a {@link PCollection} of {@link 
org.apache.beam.sdk.io.FileIO.ReadableFile},
+   * which allows more flexible usage.
+   */
+  public static  ReadFiles readFiles(Class recordClass) {
+return new 
AutoValue_ThriftIO_ReadFiles.Builder().setRecordClass(recordClass).build();
+  }
+
+  
//
+
+  /** Creates a {@link Sink} for use with {@link FileIO#write} and {@link 
FileIO#writeDynamic}. */
+  public static > Sink 

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385459=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385459
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:20
Start Date: 11/Feb/20 21:20
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377886065
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/package-info.java
 ##
 @@ -0,0 +1,20 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * License); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/** Transforms for reading and writing to Thrift files. */
+package org.apache.beam.sdk.io.thrift;
 
 Review comment:
   Add `@Experimental(Kind.SOURCE_SINK)` at the package level too
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385459)
Time Spent: 11h 40m  (was: 11.5h)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 11h 40m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385467=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385467
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:20
Start Date: 11/Feb/20 21:20
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377903969
 
 

 ##
 File path: 
sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/ThriftIOTest.java
 ##
 @@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import 
org.apache.beam.repackaged.core.org.apache.commons.lang3.RandomStringUtils;
+import org.apache.beam.repackaged.core.org.apache.commons.lang3.RandomUtils;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.testing.PAssert;
+import org.apache.beam.sdk.testing.TestPipeline;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.values.PCollection;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.Resources;
+import org.apache.thrift.protocol.TBinaryProtocol;
+import org.apache.thrift.protocol.TCompactProtocol;
+import org.apache.thrift.protocol.TJSONProtocol;
+import org.apache.thrift.protocol.TProtocolFactory;
+import org.apache.thrift.protocol.TSimpleJSONProtocol;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+/** Tests for {@link ThriftIO}. */
+@RunWith(JUnit4.class)
+public class ThriftIOTest implements Serializable {
+
+  private static final String RESOURCE_DIR = "ThriftIOTest/";
+
+  private static final String THRIFT_DIR = 
Resources.getResource(RESOURCE_DIR).getPath();
+  private static final String ALL_THRIFT_STRING =
+  Resources.getResource(RESOURCE_DIR).getPath() + "*";
+  private static final TestThriftStruct TEST_THRIFT_STRUCT = new 
TestThriftStruct();
+  private static List testThriftStructs;
+  private final TProtocolFactory tBinaryProtoFactory = new 
TBinaryProtocol.Factory();
+  private final TProtocolFactory tJsonProtocolFactory = new 
TJSONProtocol.Factory();
+  private final TProtocolFactory tSimpleJsonProtocolFactory = new 
TSimpleJSONProtocol.Factory();
+  private final TProtocolFactory tCompactProtocolFactory = new 
TCompactProtocol.Factory();
+  @Rule public transient TestPipeline mainPipeline = TestPipeline.create();
+  @Rule public transient TestPipeline readPipeline = TestPipeline.create();
+  @Rule public transient TestPipeline writePipeline = TestPipeline.create();
+  @Rule public transient TemporaryFolder temporaryFolder = new 
TemporaryFolder();
+
+  @Before
+  public void setUp() throws Exception {
+byte[] bytes = new byte[10];
+ByteBuffer buffer = ByteBuffer.wrap(bytes);
+
+TEST_THRIFT_STRUCT.testByte = 100;
+TEST_THRIFT_STRUCT.testShort = 200;
+TEST_THRIFT_STRUCT.testInt = 2500;
+TEST_THRIFT_STRUCT.testLong = 79303L;
+TEST_THRIFT_STRUCT.testDouble = 25.007;
+TEST_THRIFT_STRUCT.testBool = true;
+TEST_THRIFT_STRUCT.stringIntMap = new HashMap<>();
+TEST_THRIFT_STRUCT.stringIntMap.put("first", (short) 1);
+TEST_THRIFT_STRUCT.stringIntMap.put("second", (short) 2);
+TEST_THRIFT_STRUCT.testBinary = buffer;
+
+testThriftStructs = ImmutableList.copyOf(generateTestObjects(1000L));
+  }
+
+  /** Tests {@link ThriftIO#readFiles(Class)} with {@link TBinaryProtocol}. */
+  @Test
+  public void testReadFilesBinaryProtocol() {
+
+PCollection testThriftDoc =
+mainPipeline
+.apply(Create.of(THRIFT_DIR + 
"data").withCoder(StringUtf8Coder.of()))
+

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385455=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385455
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:20
Start Date: 11/Feb/20 21:20
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377905535
 
 

 ##
 File path: 
sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/TestThriftStruct.java
 ##
 @@ -0,0 +1,1232 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
 
 Review comment:
   Hm in that case, I think it's fine to commit the generated file.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385455)
Time Spent: 11h 10m  (was: 11h)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385466=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385466
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:20
Start Date: 11/Feb/20 21:20
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377900552
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,289 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.thrift.TBase;
+import org.apache.thrift.TException;
+import org.apache.thrift.protocol.TProtocol;
+import org.apache.thrift.protocol.TProtocolFactory;
+import org.apache.thrift.transport.TIOStreamTransport;
+import org.apache.thrift.transport.TTransportException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing files containing Thrift encoded 
data.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#} with the desired file pattern 
to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection examples = 
pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }
+ *
+ * For more advanced use cases, like reading each file in a {@link 
PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern())
+ *   .apply(FileIO.readMatches());
+ *
+ * PCollection examples = 
files.apply(ThriftIO.readFiles(ExampleType.class).withProtocol(thriftProto);
+ * }
+ *
+ * Writing Thrift Files
+ *
+ * {@link ThriftIO.Sink} allows for a {@link PCollection} of {@link TBase} 
to be written to
+ * Thrift files. It can be used with the general-purpose {@link FileIO} 
transforms with
+ * FileIO.write/writeDynamic specifically.
+ *
+ * For example:
+ *
+ * {@code
+ * pipeline
+ *   .apply(...) // PCollection
+ *   .apply(FileIO
+ * .write()
+ * .via(ThriftIO.sink(thriftProto))
+ * .to("destination/path");
+ * }
+ *
+ * This IO API is considered experimental and may break or receive 
backwards-incompatible changes
+ * in future versions of the Apache Beam SDK.
+ */
+@Experimental(Experimental.Kind.SOURCE_SINK)
+public class ThriftIO {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ThriftIO.class);
+
+  /** Disable construction of utility class. */
+  private ThriftIO() {}
+
+  /**
+   * Reads each file in a {@link PCollection} of {@link 
org.apache.beam.sdk.io.FileIO.ReadableFile},
+   * which allows more flexible usage.
+   */
+  public static  ReadFiles readFiles(Class recordClass) {
+return new 
AutoValue_ThriftIO_ReadFiles.Builder().setRecordClass(recordClass).build();
+  }
+
+  
//
+
+  /** Creates a {@link Sink} for use with {@link FileIO#write} and {@link 
FileIO#writeDynamic}. */
+  public static > Sink 

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385460=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385460
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:20
Start Date: 11/Feb/20 21:20
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377894156
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftCoder.java
 ##
 @@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.ObjectInputStream;
+import java.io.ObjectOutputStream;
+import java.io.OutputStream;
+import org.apache.beam.sdk.coders.CoderException;
+import org.apache.beam.sdk.coders.CustomCoder;
+
+public class ThriftCoder extends CustomCoder {
+
+  public static  ThriftCoder of() {
 
 Review comment:
   remove public
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385460)
Time Spent: 11h 50m  (was: 11h 40m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 11h 50m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385457=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385457
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:20
Start Date: 11/Feb/20 21:20
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377905535
 
 

 ##
 File path: 
sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/TestThriftStruct.java
 ##
 @@ -0,0 +1,1232 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
 
 Review comment:
   Hm in that case, I think it's fine to commit the generated file - unless you 
feel up to adding the gradle config : )
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385457)
Time Spent: 11h 20m  (was: 11h 10m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 11h 20m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385463=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385463
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:20
Start Date: 11/Feb/20 21:20
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377894855
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,289 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.thrift.TBase;
+import org.apache.thrift.TException;
+import org.apache.thrift.protocol.TProtocol;
+import org.apache.thrift.protocol.TProtocolFactory;
+import org.apache.thrift.transport.TIOStreamTransport;
+import org.apache.thrift.transport.TTransportException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing files containing Thrift encoded 
data.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#} with the desired file pattern 
to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection examples = 
pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
 
 Review comment:
   Having a read() is not mandatory if the example uses FileIO.match and 
friends IMO.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385463)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 12h
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385458=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385458
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:20
Start Date: 11/Feb/20 21:20
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377885914
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,289 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.thrift.TBase;
+import org.apache.thrift.TException;
+import org.apache.thrift.protocol.TProtocol;
+import org.apache.thrift.protocol.TProtocolFactory;
+import org.apache.thrift.transport.TIOStreamTransport;
+import org.apache.thrift.transport.TTransportException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing files containing Thrift encoded 
data.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#} with the desired file pattern 
to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection examples = 
pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }
+ *
+ * For more advanced use cases, like reading each file in a {@link 
PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern())
+ *   .apply(FileIO.readMatches());
+ *
+ * PCollection examples = 
files.apply(ThriftIO.readFiles(ExampleType.class).withProtocol(thriftProto);
+ * }
+ *
+ * Writing Thrift Files
+ *
+ * {@link ThriftIO.Sink} allows for a {@link PCollection} of {@link TBase} 
to be written to
+ * Thrift files. It can be used with the general-purpose {@link FileIO} 
transforms with
+ * FileIO.write/writeDynamic specifically.
+ *
+ * For example:
+ *
+ * {@code
+ * pipeline
+ *   .apply(...) // PCollection
+ *   .apply(FileIO
+ * .write()
+ * .via(ThriftIO.sink(thriftProto))
+ * .to("destination/path");
+ * }
+ *
+ * This IO API is considered experimental and may break or receive 
backwards-incompatible changes
+ * in future versions of the Apache Beam SDK.
+ */
+@Experimental(Experimental.Kind.SOURCE_SINK)
 
 Review comment:
   `@Experimental(Kind.SOURCE_SINK)` to make it consistent with the rest of the 
code base
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385458)
Time Spent: 11.5h  (was: 11h 20m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: 

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385461=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385461
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:20
Start Date: 11/Feb/20 21:20
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377885479
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftCoder.java
 ##
 @@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.ObjectInputStream;
+import java.io.ObjectOutputStream;
+import java.io.OutputStream;
+import org.apache.beam.sdk.coders.CoderException;
+import org.apache.beam.sdk.coders.CustomCoder;
+
+public class ThriftCoder extends CustomCoder {
 
 Review comment:
   This is in principle internal, so maybe make it package protected.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385461)
Time Spent: 11h 50m  (was: 11h 40m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 11h 50m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385453=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385453
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 21:14
Start Date: 11/Feb/20 21:14
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377902208
 
 

 ##
 File path: 
sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/TestThriftStruct.java
 ##
 @@ -0,0 +1,1232 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
 
 Review comment:
   It can be generated from the .thrift file that is included. I think we would 
need to add a thrift compiler to the build.gradle to compile it for testing, 
thoughts?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385453)
Time Spent: 11h  (was: 10h 50m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 11h
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385431=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385431
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 20:37
Start Date: 11/Feb/20 20:37
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377883979
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftCoder.java
 ##
 @@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.ObjectInputStream;
+import java.io.ObjectOutputStream;
+import java.io.OutputStream;
+import org.apache.beam.sdk.coders.CoderException;
+import org.apache.beam.sdk.coders.CustomCoder;
+
+public class ThriftCoder extends CustomCoder {
+
+  public static  ThriftCoder of() {
+return new ThriftCoder<>();
+  }
+
+  /**
+   * Encodes the given value of type {@code T} onto the given output stream.
+   *
+   * @param value {@link org.apache.thrift.TBase} to encode.
+   * @param outStream stream to output encoded value to.
+   * @throws IOException if writing to the {@code OutputStream} fails for some 
reason
+   * @throws CoderException if the value could not be encoded for some reason
+   */
+  @Override
+  public void encode(T value, OutputStream outStream) throws CoderException, 
IOException {
+ObjectOutputStream oos = new ObjectOutputStream(outStream);
+oos.writeObject(value);
+oos.flush();
+  }
 
 Review comment:
   +1 to use Thrift native serializaton this will enable to share the data with 
cross-language pipelines in the future
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385431)
Time Spent: 10h 50m  (was: 10h 40m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 10h 50m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385422=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385422
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 20:03
Start Date: 11/Feb/20 20:03
Worklog Time Spent: 10m 
  Work Description: steveniemitz commented on pull request #10290: 
[BEAM-8561] Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377868235
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftCoder.java
 ##
 @@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.ObjectInputStream;
+import java.io.ObjectOutputStream;
+import java.io.OutputStream;
+import org.apache.beam.sdk.coders.CoderException;
+import org.apache.beam.sdk.coders.CustomCoder;
+
+public class ThriftCoder extends CustomCoder {
+
+  public static  ThriftCoder of() {
+return new ThriftCoder<>();
+  }
+
+  /**
+   * Encodes the given value of type {@code T} onto the given output stream.
+   *
+   * @param value {@link org.apache.thrift.TBase} to encode.
+   * @param outStream stream to output encoded value to.
+   * @throws IOException if writing to the {@code OutputStream} fails for some 
reason
+   * @throws CoderException if the value could not be encoded for some reason
+   */
+  @Override
+  public void encode(T value, OutputStream outStream) throws CoderException, 
IOException {
+ObjectOutputStream oos = new ObjectOutputStream(outStream);
+oos.writeObject(value);
+oos.flush();
+  }
 
 Review comment:
   fwiw the java thrift classes will use the TCompactProtocol to serialize 
themselves when being java serialized.
   
   Personally I would rather see a coder here that explicitly uses a TProtocol 
to serialize the object rather than relying on java serialization to do it.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385422)
Time Spent: 10h 40m  (was: 10.5h)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 10h 40m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385407=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385407
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 19:51
Start Date: 11/Feb/20 19:51
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377861986
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftCoder.java
 ##
 @@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.ObjectInputStream;
+import java.io.ObjectOutputStream;
+import java.io.OutputStream;
+import org.apache.beam.sdk.coders.CoderException;
+import org.apache.beam.sdk.coders.CustomCoder;
+
+public class ThriftCoder extends CustomCoder {
+
+  public static  ThriftCoder of() {
+return new ThriftCoder<>();
+  }
+
+  /**
+   * Encodes the given value of type {@code T} onto the given output stream.
+   *
+   * @param value {@link org.apache.thrift.TBase} to encode.
+   * @param outStream stream to output encoded value to.
+   * @throws IOException if writing to the {@code OutputStream} fails for some 
reason
+   * @throws CoderException if the value could not be encoded for some reason
+   */
+  @Override
+  public void encode(T value, OutputStream outStream) throws CoderException, 
IOException {
+ObjectOutputStream oos = new ObjectOutputStream(outStream);
+oos.writeObject(value);
+oos.flush();
+  }
 
 Review comment:
   Yes exactly we'll read the records as thrift-serialized then Java serialize 
within the pipeline. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385407)
Time Spent: 10.5h  (was: 10h 20m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=384985=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-384985
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 06:08
Start Date: 11/Feb/20 06:08
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377457203
 
 

 ##
 File path: 
sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/ThriftIOTest.java
 ##
 @@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import java.io.Serializable;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import 
org.apache.beam.repackaged.core.org.apache.commons.lang3.RandomStringUtils;
+import org.apache.beam.repackaged.core.org.apache.commons.lang3.RandomUtils;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.testing.PAssert;
+import org.apache.beam.sdk.testing.TestPipeline;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.values.PCollection;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.Resources;
+import org.apache.thrift.protocol.TBinaryProtocol;
+import org.apache.thrift.protocol.TCompactProtocol;
+import org.apache.thrift.protocol.TJSONProtocol;
+import org.apache.thrift.protocol.TProtocolFactory;
+import org.apache.thrift.protocol.TSimpleJSONProtocol;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+/** Tests for {@link ThriftIO}. */
+@RunWith(JUnit4.class)
+public class ThriftIOTest implements Serializable {
+
+  private static final String RESOURCE_DIR = "ThriftIOTest/";
+
+  private static final String THRIFT_DIR = 
Resources.getResource(RESOURCE_DIR).getPath();
+  private static final String ALL_THRIFT_STRING =
+  Resources.getResource(RESOURCE_DIR).getPath() + "*";
+  private static final TestThriftStruct TEST_THRIFT_STRUCT = new 
TestThriftStruct();
+  private static List testThriftStructs;
+  private final TProtocolFactory tBinaryProtoFactory = new 
TBinaryProtocol.Factory();
+  private final TProtocolFactory tJsonProtocolFactory = new 
TJSONProtocol.Factory();
+  private final TProtocolFactory tSimpleJsonProtocolFactory = new 
TSimpleJSONProtocol.Factory();
+  private final TProtocolFactory tCompactProtocolFactory = new 
TCompactProtocol.Factory();
+  @Rule public transient TestPipeline mainPipeline = TestPipeline.create();
+  @Rule public transient TestPipeline readPipeline = TestPipeline.create();
+  @Rule public transient TestPipeline writePipeline = TestPipeline.create();
+  @Rule public transient TemporaryFolder temporaryFolder = new 
TemporaryFolder();
+
+  @Before
+  public void setUp() throws Exception {
+byte[] bytes = new byte[10];
+ByteBuffer buffer = ByteBuffer.wrap(bytes);
+
+TEST_THRIFT_STRUCT.testByte = 100;
+TEST_THRIFT_STRUCT.testShort = 200;
+TEST_THRIFT_STRUCT.testInt = 2500;
+TEST_THRIFT_STRUCT.testLong = 79303L;
+TEST_THRIFT_STRUCT.testDouble = 25.007;
+TEST_THRIFT_STRUCT.testBool = true;
+TEST_THRIFT_STRUCT.stringIntMap = new HashMap<>();
+TEST_THRIFT_STRUCT.stringIntMap.put("first", (short) 1);
+TEST_THRIFT_STRUCT.stringIntMap.put("second", (short) 2);
+TEST_THRIFT_STRUCT.testBinary = buffer;
+
+testThriftStructs = ImmutableList.copyOf(generateTestObjects(1000L));
+  }
+
+  /** Tests {@link ThriftIO#readFiles(Class)} with {@link TBinaryProtocol}. */
+  @Test
 
 Review comment:
   There doesn't seem to be a test for `read().from(...)` - should there be 
one? (and it's okay if the answer is 'don't need one') : )
 

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=384986=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-384986
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 06:08
Start Date: 11/Feb/20 06:08
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377457282
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,289 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.thrift.TBase;
+import org.apache.thrift.TException;
+import org.apache.thrift.protocol.TProtocol;
+import org.apache.thrift.protocol.TProtocolFactory;
+import org.apache.thrift.transport.TIOStreamTransport;
+import org.apache.thrift.transport.TTransportException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing files containing Thrift encoded 
data.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#} with the desired file pattern 
to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection examples = 
pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
 
 Review comment:
   Do we need to pass the class to this transform? Why/why not?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 384986)
Time Spent: 10h 10m  (was: 10h)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=384984=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-384984
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 06:08
Start Date: 11/Feb/20 06:08
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377417741
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftCoder.java
 ##
 @@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.ObjectInputStream;
+import java.io.ObjectOutputStream;
+import java.io.OutputStream;
+import org.apache.beam.sdk.coders.CoderException;
+import org.apache.beam.sdk.coders.CustomCoder;
+
+public class ThriftCoder extends CustomCoder {
+
+  public static  ThriftCoder of() {
+return new ThriftCoder<>();
+  }
+
+  /**
+   * Encodes the given value of type {@code T} onto the given output stream.
+   *
+   * @param value {@link org.apache.thrift.TBase} to encode.
+   * @param outStream stream to output encoded value to.
+   * @throws IOException if writing to the {@code OutputStream} fails for some 
reason
+   * @throws CoderException if the value could not be encoded for some reason
+   */
+  @Override
+  public void encode(T value, OutputStream outStream) throws CoderException, 
IOException {
+ObjectOutputStream oos = new ObjectOutputStream(outStream);
+oos.writeObject(value);
+oos.flush();
+  }
 
 Review comment:
   It seems that the coder simply does Java serialization. Is that right?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 384984)
Time Spent: 9h 50m  (was: 9h 40m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=384987=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-384987
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 06:08
Start Date: 11/Feb/20 06:08
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377455798
 
 

 ##
 File path: 
sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/TestThriftStruct.java
 ##
 @@ -0,0 +1,1232 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
 
 Review comment:
   Is this file meant to be included in the commit? Or can it be generated from 
the .thrift file - without committing it?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 384987)
Time Spent: 10h 20m  (was: 10h 10m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=384988=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-384988
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 06:08
Start Date: 11/Feb/20 06:08
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377456909
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftCoder.java
 ##
 @@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.ObjectInputStream;
+import java.io.ObjectOutputStream;
+import java.io.OutputStream;
+import org.apache.beam.sdk.coders.CoderException;
+import org.apache.beam.sdk.coders.CustomCoder;
+
+public class ThriftCoder extends CustomCoder {
+
+  public static  ThriftCoder of() {
+return new ThriftCoder<>();
+  }
+
+  /**
+   * Encodes the given value of type {@code T} onto the given output stream.
+   *
+   * @param value {@link org.apache.thrift.TBase} to encode.
+   * @param outStream stream to output encoded value to.
+   * @throws IOException if writing to the {@code OutputStream} fails for some 
reason
+   * @throws CoderException if the value could not be encoded for some reason
+   */
+  @Override
+  public void encode(T value, OutputStream outStream) throws CoderException, 
IOException {
+ObjectOutputStream oos = new ObjectOutputStream(outStream);
+oos.writeObject(value);
+oos.flush();
+  }
 
 Review comment:
   Ah so we only consume thrift-serialized records from the storage system, but 
Java-serialize within the pipeline. That's fine.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 384988)
Time Spent: 10h 20m  (was: 10h 10m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=384989=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-384989
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 11/Feb/20 06:08
Start Date: 11/Feb/20 06:08
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r377457540
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,289 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.thrift.TBase;
+import org.apache.thrift.TException;
+import org.apache.thrift.protocol.TProtocol;
+import org.apache.thrift.protocol.TProtocolFactory;
+import org.apache.thrift.transport.TIOStreamTransport;
+import org.apache.thrift.transport.TTransportException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing files containing Thrift encoded 
data.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#} with the desired file pattern 
to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection examples = 
pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
 
 Review comment:
   In fact, it seems that `read()` is not implemented?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 384989)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=364278=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-364278
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 29/Dec/19 03:37
Start Date: 29/Dec/19 03:37
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on issue #10290: [BEAM-8561] Add 
ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-569471049
 
 
   Run CommunityMetrics PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 364278)
Time Spent: 9h 40m  (was: 9.5h)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=364273=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-364273
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 29/Dec/19 02:14
Start Date: 29/Dec/19 02:14
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on issue #10290: [BEAM-8561] Add 
ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-569467327
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 364273)
Time Spent: 9.5h  (was: 9h 20m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 9.5h
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=364271=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-364271
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 29/Dec/19 00:19
Start Date: 29/Dec/19 00:19
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on issue #10290: [BEAM-8561] Add 
ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-569462219
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 364271)
Time Spent: 9h 20m  (was: 9h 10m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 9h 20m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=364269=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-364269
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 29/Dec/19 00:18
Start Date: 29/Dec/19 00:18
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on issue #10290: [BEAM-8561] Add 
ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-569462167
 
 
   Run CommunityMetrics PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 364269)
Time Spent: 9h  (was: 8h 50m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=364270=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-364270
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 29/Dec/19 00:18
Start Date: 29/Dec/19 00:18
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on issue #10290: [BEAM-8561] Add 
ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-569462167
 
 
   Run CommunityMetrics PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 364270)
Time Spent: 9h 10m  (was: 9h)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=364178=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-364178
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 28/Dec/19 09:21
Start Date: 28/Dec/19 09:21
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on issue #10290: [BEAM-8561] Add 
ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-569401090
 
 
   @chamikaramj @steveniemitz @gsteelman the PR has been refactored to 
read/write Thrift encoded data. It would be great to get some more feedback, 
thanks!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 364178)
Time Spent: 8h 50m  (was: 8h 40m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=361851=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361851
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 20/Dec/19 20:46
Start Date: 20/Dec/19 20:46
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on issue #10290: [BEAM-8561] Add 
ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-568081482
 
 
   Great thanks. Yes the TProtocol implementation will be passed in for 
decoding/encoding. 
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361851)
Time Spent: 8h 40m  (was: 8.5h)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=361833=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361833
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 20/Dec/19 19:53
Start Date: 20/Dec/19 19:53
Worklog Time Spent: 10m 
  Work Description: steveniemitz commented on issue #10290: [BEAM-8561] Add 
ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-568066985
 
 
   I'd expect #2, where `UserType <: TBase` (ie, it is a thrift struct).  You'd 
also probably want to pass in the TProtocol implementation used to use to 
decode the file I'd assume (binary, compact, etc).
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361833)
Time Spent: 8.5h  (was: 8h 20m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=361829=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361829
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 20/Dec/19 19:44
Start Date: 20/Dec/19 19:44
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on issue #10290: [BEAM-8561] Add 
ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-568062975
 
 
   @steveniemitz thanks for the input. So this PR will change so that the 
connector will read/write thrift encoded data. As someone who would utilize 
this connector what are your thoughts on using either of the following to hold 
the decoded data:
   1. `PCollection` in which case the user would pass in an Avro 
schema that would describe each record in the file.
   2. `PCollection<[UserType]>` in which case the user would pass in some class 
that the record can be decoded to.
   
   Thanks!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361829)
Time Spent: 8h 20m  (was: 8h 10m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=361235=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361235
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 18/Dec/19 00:50
Start Date: 18/Dec/19 00:50
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r359103959
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/parser/ThriftIdlParser.java
 ##
 @@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift.parser;
+
+import java.io.IOException;
+import java.io.Reader;
+import org.antlr.runtime.ANTLRReaderStream;
+import org.antlr.runtime.CommonTokenStream;
+import org.antlr.runtime.RecognitionException;
+import org.antlr.runtime.tree.BufferedTreeNodeStream;
+import org.antlr.runtime.tree.Tree;
+import org.antlr.runtime.tree.TreeNodeStream;
+import org.apache.beam.sdk.io.thrift.parser.antlr.DocumentGenerator;
+import org.apache.beam.sdk.io.thrift.parser.antlr.ThriftLexer;
+import org.apache.beam.sdk.io.thrift.parser.antlr.ThriftParser;
+import org.apache.beam.sdk.io.thrift.parser.model.Document;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.CharSource;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class ThriftIdlParser {
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(ThriftIdlParser.class);
+
+  /** Generates {@link Document} from {@link org.antlr.runtime.tree.Tree}. */
+  public static Document parseThriftIdl(CharSource input) throws IOException {
+Tree tree = parseTree(input);
+TreeNodeStream stream = new BufferedTreeNodeStream(tree);
+DocumentGenerator generator = new DocumentGenerator(stream);
+try {
+  return generator.document().value;
+} catch (RecognitionException e) {
+  LOG.error("Failed to generate document: " + e.getMessage());
+  throw new RuntimeException(e);
+}
+  }
+
+  /** Generates {@link org.antlr.runtime.tree.Tree} from input. */
+  static Tree parseTree(CharSource input) throws IOException {
+try (Reader reader = input.openStream()) {
+  ThriftLexer lexer = new ThriftLexer(new ANTLRReaderStream(reader));
+  ThriftParser parser = new ThriftParser(new CommonTokenStream(lexer));
+  try {
+Tree tree = (Tree) parser.document().getTree();
+if (parser.getNumberOfSyntaxErrors() > 0) {
+  LOG.error("Parsing generated " + parser.getNumberOfSyntaxErrors() + 
"errors.");
+  throw new RuntimeException("syntax error");
 
 Review comment:
   That is covered in the `catch` below.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361235)
Time Spent: 8h 10m  (was: 8h)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=361233=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361233
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 18/Dec/19 00:42
Start Date: 18/Dec/19 00:42
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on issue #10290: [BEAM-8561] Add 
ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-566811364
 
 
   @gsteelman @chamikaramj [PR](https://github.com/apache/beam/pull/10395) for 
parser has been opened.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361233)
Time Spent: 8h  (was: 7h 50m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=361232=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361232
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 18/Dec/19 00:40
Start Date: 18/Dec/19 00:40
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r359101649
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,708 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static java.lang.String.format;
+import static java.util.stream.Collectors.joining;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import java.nio.charset.Charset;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.io.Compression;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.io.thrift.parser.ThriftIdlParser;
+import org.apache.beam.sdk.io.thrift.parser.model.BaseType;
+import org.apache.beam.sdk.io.thrift.parser.model.Const;
+import org.apache.beam.sdk.io.thrift.parser.model.Definition;
+import org.apache.beam.sdk.io.thrift.parser.model.Document;
+import org.apache.beam.sdk.io.thrift.parser.model.Header;
+import org.apache.beam.sdk.io.thrift.parser.model.IdentifierType;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnumField;
+import org.apache.beam.sdk.io.thrift.parser.model.ListType;
+import org.apache.beam.sdk.io.thrift.parser.model.MapType;
+import org.apache.beam.sdk.io.thrift.parser.model.Service;
+import org.apache.beam.sdk.io.thrift.parser.model.StringEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.Struct;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftException;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftField;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftMethod;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftType;
+import org.apache.beam.sdk.io.thrift.parser.model.TypeAnnotation;
+import org.apache.beam.sdk.io.thrift.parser.model.Typedef;
+import org.apache.beam.sdk.io.thrift.parser.model.VoidType;
+import org.apache.beam.sdk.options.ValueProvider;
+import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteSource;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing Thrift files.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#read} with the desired file 
pattern to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection documents = 
pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }
+ *
+ * For more advanced use cases, like reading each file in a {@link 
PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern())
+ *   .apply(FileIO.readMatches());
+ *
+ * 

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=361230=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361230
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 18/Dec/19 00:29
Start Date: 18/Dec/19 00:29
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on issue #10290: [BEAM-8561] Add 
ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-566811364
 
 
   @gsteelman @chamikaramj [PR](https://github.com/apache/beam/pull/10395) for 
parser has been opened 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361230)
Time Spent: 7h 40m  (was: 7.5h)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=361210=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361210
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 17/Dec/19 23:34
Start Date: 17/Dec/19 23:34
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r359085087
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,708 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static java.lang.String.format;
+import static java.util.stream.Collectors.joining;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import java.nio.charset.Charset;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.io.Compression;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.io.thrift.parser.ThriftIdlParser;
+import org.apache.beam.sdk.io.thrift.parser.model.BaseType;
+import org.apache.beam.sdk.io.thrift.parser.model.Const;
+import org.apache.beam.sdk.io.thrift.parser.model.Definition;
+import org.apache.beam.sdk.io.thrift.parser.model.Document;
+import org.apache.beam.sdk.io.thrift.parser.model.Header;
+import org.apache.beam.sdk.io.thrift.parser.model.IdentifierType;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnumField;
+import org.apache.beam.sdk.io.thrift.parser.model.ListType;
+import org.apache.beam.sdk.io.thrift.parser.model.MapType;
+import org.apache.beam.sdk.io.thrift.parser.model.Service;
+import org.apache.beam.sdk.io.thrift.parser.model.StringEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.Struct;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftException;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftField;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftMethod;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftType;
+import org.apache.beam.sdk.io.thrift.parser.model.TypeAnnotation;
+import org.apache.beam.sdk.io.thrift.parser.model.Typedef;
+import org.apache.beam.sdk.io.thrift.parser.model.VoidType;
+import org.apache.beam.sdk.options.ValueProvider;
+import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteSource;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing Thrift files.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#read} with the desired file 
pattern to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection documents = 
pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }
+ *
+ * For more advanced use cases, like reading each file in a {@link 
PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern())
+ *   .apply(FileIO.readMatches());
+ *
+ * 

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=361203=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361203
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 17/Dec/19 23:26
Start Date: 17/Dec/19 23:26
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r359082635
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/parser/model/Document.java
 ##
 @@ -0,0 +1,424 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift.parser.model;
+
+import static java.util.Collections.emptyList;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.GenericRecordBuilder;
+import org.apache.avro.reflect.ReflectData;
+import org.apache.beam.sdk.io.thrift.parser.visitor.DocumentVisitor;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList;
+
+/**
+ * The {@link Document} class holds the elements of a Thrift file.
+ *
+ * A {@link Document} is made up of:
+ *
+ * 
+ *   {@link Header} - Contains: includes, cppIncludes, namespaces, and 
defaultNamespace.
+ *   {@link Document#definitions} - Contains list of Thrift {@link 
Definition}.
+ * 
+ */
+public class Document implements Serializable {
+  private Header header;
+  private List definitions;
+
+  public Document(Header header, List definitions) {
+this.header = checkNotNull(header, "header");
+this.definitions = ImmutableList.copyOf(checkNotNull(definitions, 
"definitions"));
+  }
+
+  /** Returns an empty {@link Document}. */
+  public static Document emptyDocument() {
+List includes = emptyList();
+List cppIncludes = emptyList();
+String defaultNamespace = null;
+Map namespaces = Collections.emptyMap();
+Header header = new Header(includes, cppIncludes, defaultNamespace, 
namespaces);
+List definitions = emptyList();
+return new Document(header, definitions);
+  }
+
+  public Document getDocument() {
+return this;
+  }
+
+  public Header getHeader() {
+return this.header;
+  }
+
+  public void setHeader(Header header) {
+this.header = header;
+  }
+
+  public List getDefinitions() {
+return definitions;
+  }
+
+  public void setDefinitions(List definitions) {
+this.definitions = definitions;
+  }
+
+  public void visit(final DocumentVisitor visitor) throws IOException {
+Preconditions.checkNotNull(visitor, "the visitor must not be null!");
+
+for (Definition definition : definitions) {
+  if (visitor.accept(definition)) {
+definition.visit(visitor);
+  }
+}
+  }
+
+  /** Gets Avro {@link Schema} for the object. */
+  public Schema getSchema() {
+return ReflectData.get().getSchema(Document.class);
+  }
+
+  /** Gets {@link Document} as a {@link GenericRecord}. */
+  public GenericRecord getAsGenericRecord() {
+GenericRecordBuilder genericRecordBuilder = new 
GenericRecordBuilder(this.getSchema());
+genericRecordBuilder.set("header", this.getHeader()).set("definitions", 
this.getDefinitions());
+
+return genericRecordBuilder.build();
+  }
+
+  /** Adds list of includes to {@link Document#header}. */
+  public void addIncludes(List includes) {
+checkNotNull(includes, "includes");
+List currentIncludes = new 
ArrayList<>(this.getHeader().getIncludes());
+currentIncludes.addAll(includes);
+

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=361137=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361137
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 17/Dec/19 21:45
Start Date: 17/Dec/19 21:45
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r359045633
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/parser/model/IntegerEnum.java
 ##
 @@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift.parser.model;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import java.util.List;
+import java.util.Objects;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects;
 
 Review comment:
   For the Beam modules the dependencies are defined in the build.gradle for 
the respective module. We import the vendored Guava version there and then 
import that into the code.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361137)
Time Spent: 7h 10m  (was: 7h)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=361132=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361132
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 17/Dec/19 21:40
Start Date: 17/Dec/19 21:40
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r359043453
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/parser/model/Document.java
 ##
 @@ -0,0 +1,424 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift.parser.model;
+
+import static java.util.Collections.emptyList;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.GenericRecordBuilder;
+import org.apache.avro.reflect.ReflectData;
+import org.apache.beam.sdk.io.thrift.parser.visitor.DocumentVisitor;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList;
+
+/**
+ * The {@link Document} class holds the elements of a Thrift file.
+ *
+ * A {@link Document} is made up of:
+ *
+ * 
+ *   {@link Header} - Contains: includes, cppIncludes, namespaces, and 
defaultNamespace.
+ *   {@link Document#definitions} - Contains list of Thrift {@link 
Definition}.
+ * 
+ */
+public class Document implements Serializable {
+  private Header header;
+  private List definitions;
+
+  public Document(Header header, List definitions) {
+this.header = checkNotNull(header, "header");
+this.definitions = ImmutableList.copyOf(checkNotNull(definitions, 
"definitions"));
+  }
+
+  /** Returns an empty {@link Document}. */
+  public static Document emptyDocument() {
+List includes = emptyList();
+List cppIncludes = emptyList();
+String defaultNamespace = null;
+Map namespaces = Collections.emptyMap();
+Header header = new Header(includes, cppIncludes, defaultNamespace, 
namespaces);
+List definitions = emptyList();
+return new Document(header, definitions);
+  }
+
+  public Document getDocument() {
+return this;
+  }
+
+  public Header getHeader() {
+return this.header;
+  }
+
+  public void setHeader(Header header) {
+this.header = header;
+  }
+
+  public List getDefinitions() {
+return definitions;
+  }
+
+  public void setDefinitions(List definitions) {
+this.definitions = definitions;
+  }
+
+  public void visit(final DocumentVisitor visitor) throws IOException {
+Preconditions.checkNotNull(visitor, "the visitor must not be null!");
+
+for (Definition definition : definitions) {
+  if (visitor.accept(definition)) {
+definition.visit(visitor);
+  }
+}
+  }
+
+  /** Gets Avro {@link Schema} for the object. */
+  public Schema getSchema() {
+return ReflectData.get().getSchema(Document.class);
+  }
+
+  /** Gets {@link Document} as a {@link GenericRecord}. */
+  public GenericRecord getAsGenericRecord() {
+GenericRecordBuilder genericRecordBuilder = new 
GenericRecordBuilder(this.getSchema());
+genericRecordBuilder.set("header", this.getHeader()).set("definitions", 
this.getDefinitions());
+
+return genericRecordBuilder.build();
+  }
+
+  /** Adds list of includes to {@link Document#header}. */
+  public void addIncludes(List includes) {
+checkNotNull(includes, "includes");
+List currentIncludes = new 
ArrayList<>(this.getHeader().getIncludes());
+currentIncludes.addAll(includes);
+

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=361081=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361081
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 17/Dec/19 19:39
Start Date: 17/Dec/19 19:39
Worklog Time Spent: 10m 
  Work Description: gsteelman commented on issue #10290: [BEAM-8561] Add 
ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-566717207
 
 
   > Thanks @gsteelman. We would love to have the sample Thrift schema and 
anything else we can utilize for testing!
   
   Who should I have it shared with? Also, https://thrift.apache.org/test/ and 
this schema 
https://raw.githubusercontent.com/apache/thrift/master/test/ThriftTest.thrift 
which might provide additional test cases. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361081)
Time Spent: 6h 50m  (was: 6h 40m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=360617=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360617
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 17/Dec/19 00:11
Start Date: 17/Dec/19 00:11
Worklog Time Spent: 10m 
  Work Description: gsteelman commented on issue #10290: [BEAM-8561] Add 
ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-566309485
 
 
   > @gsteelman I can split this PR into one for `Document` and one for 
`ThriftIO`. Since `ThriftIO` has a dependency on `Document` I think it would be 
best to review that one first, thoughts?
   > 
   > Thanks!
   
   Yes, I think that would make it a little easier for everyone to review. 
Thank you!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360617)
Time Spent: 6h 40m  (was: 6.5h)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=360616=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360616
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 17/Dec/19 00:10
Start Date: 17/Dec/19 00:10
Worklog Time Spent: 10m 
  Work Description: gsteelman commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r358531752
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/parser/model/IntegerEnum.java
 ##
 @@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift.parser.model;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import java.util.List;
+import java.util.Objects;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects;
 
 Review comment:
   I am not sure either. Typically wouldn't the version be specified as a 
variable in a maven pom file, for example?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360616)
Time Spent: 6.5h  (was: 6h 20m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=360606=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360606
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 17/Dec/19 00:09
Start Date: 17/Dec/19 00:09
Worklog Time Spent: 10m 
  Work Description: gsteelman commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r358531622
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/parser/model/Document.java
 ##
 @@ -0,0 +1,424 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift.parser.model;
+
+import static java.util.Collections.emptyList;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.GenericRecordBuilder;
+import org.apache.avro.reflect.ReflectData;
+import org.apache.beam.sdk.io.thrift.parser.visitor.DocumentVisitor;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList;
+
+/**
+ * The {@link Document} class holds the elements of a Thrift file.
+ *
+ * A {@link Document} is made up of:
+ *
+ * 
+ *   {@link Header} - Contains: includes, cppIncludes, namespaces, and 
defaultNamespace.
+ *   {@link Document#definitions} - Contains list of Thrift {@link 
Definition}.
+ * 
+ */
+public class Document implements Serializable {
+  private Header header;
+  private List definitions;
+
+  public Document(Header header, List definitions) {
+this.header = checkNotNull(header, "header");
+this.definitions = ImmutableList.copyOf(checkNotNull(definitions, 
"definitions"));
+  }
+
+  /** Returns an empty {@link Document}. */
+  public static Document emptyDocument() {
+List includes = emptyList();
+List cppIncludes = emptyList();
+String defaultNamespace = null;
+Map namespaces = Collections.emptyMap();
+Header header = new Header(includes, cppIncludes, defaultNamespace, 
namespaces);
+List definitions = emptyList();
+return new Document(header, definitions);
+  }
+
+  public Document getDocument() {
+return this;
+  }
+
+  public Header getHeader() {
+return this.header;
+  }
+
+  public void setHeader(Header header) {
+this.header = header;
+  }
+
+  public List getDefinitions() {
+return definitions;
+  }
+
+  public void setDefinitions(List definitions) {
+this.definitions = definitions;
+  }
+
+  public void visit(final DocumentVisitor visitor) throws IOException {
+Preconditions.checkNotNull(visitor, "the visitor must not be null!");
+
+for (Definition definition : definitions) {
+  if (visitor.accept(definition)) {
+definition.visit(visitor);
+  }
+}
+  }
+
+  /** Gets Avro {@link Schema} for the object. */
+  public Schema getSchema() {
+return ReflectData.get().getSchema(Document.class);
+  }
+
+  /** Gets {@link Document} as a {@link GenericRecord}. */
+  public GenericRecord getAsGenericRecord() {
+GenericRecordBuilder genericRecordBuilder = new 
GenericRecordBuilder(this.getSchema());
+genericRecordBuilder.set("header", this.getHeader()).set("definitions", 
this.getDefinitions());
+
+return genericRecordBuilder.build();
+  }
+
+  /** Adds list of includes to {@link Document#header}. */
+  public void addIncludes(List includes) {
+checkNotNull(includes, "includes");
+List currentIncludes = new 
ArrayList<>(this.getHeader().getIncludes());
+currentIncludes.addAll(includes);
+

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=360605=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360605
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 17/Dec/19 00:07
Start Date: 17/Dec/19 00:07
Worklog Time Spent: 10m 
  Work Description: gsteelman commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r358531004
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,708 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static java.lang.String.format;
+import static java.util.stream.Collectors.joining;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import java.nio.charset.Charset;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.io.Compression;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.io.thrift.parser.ThriftIdlParser;
+import org.apache.beam.sdk.io.thrift.parser.model.BaseType;
+import org.apache.beam.sdk.io.thrift.parser.model.Const;
+import org.apache.beam.sdk.io.thrift.parser.model.Definition;
+import org.apache.beam.sdk.io.thrift.parser.model.Document;
+import org.apache.beam.sdk.io.thrift.parser.model.Header;
+import org.apache.beam.sdk.io.thrift.parser.model.IdentifierType;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnumField;
+import org.apache.beam.sdk.io.thrift.parser.model.ListType;
+import org.apache.beam.sdk.io.thrift.parser.model.MapType;
+import org.apache.beam.sdk.io.thrift.parser.model.Service;
+import org.apache.beam.sdk.io.thrift.parser.model.StringEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.Struct;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftException;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftField;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftMethod;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftType;
+import org.apache.beam.sdk.io.thrift.parser.model.TypeAnnotation;
+import org.apache.beam.sdk.io.thrift.parser.model.Typedef;
+import org.apache.beam.sdk.io.thrift.parser.model.VoidType;
+import org.apache.beam.sdk.options.ValueProvider;
+import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteSource;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing Thrift files.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#read} with the desired file 
pattern to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection documents = 
pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }
+ *
+ * For more advanced use cases, like reading each file in a {@link 
PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern())
+ *   .apply(FileIO.readMatches());
+ *
+ * 

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=360603=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360603
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 17/Dec/19 00:04
Start Date: 17/Dec/19 00:04
Worklog Time Spent: 10m 
  Work Description: gsteelman commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r358530461
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,708 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static java.lang.String.format;
+import static java.util.stream.Collectors.joining;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import java.nio.charset.Charset;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.io.Compression;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.io.thrift.parser.ThriftIdlParser;
+import org.apache.beam.sdk.io.thrift.parser.model.BaseType;
+import org.apache.beam.sdk.io.thrift.parser.model.Const;
+import org.apache.beam.sdk.io.thrift.parser.model.Definition;
+import org.apache.beam.sdk.io.thrift.parser.model.Document;
+import org.apache.beam.sdk.io.thrift.parser.model.Header;
+import org.apache.beam.sdk.io.thrift.parser.model.IdentifierType;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnumField;
+import org.apache.beam.sdk.io.thrift.parser.model.ListType;
+import org.apache.beam.sdk.io.thrift.parser.model.MapType;
+import org.apache.beam.sdk.io.thrift.parser.model.Service;
+import org.apache.beam.sdk.io.thrift.parser.model.StringEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.Struct;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftException;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftField;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftMethod;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftType;
+import org.apache.beam.sdk.io.thrift.parser.model.TypeAnnotation;
+import org.apache.beam.sdk.io.thrift.parser.model.Typedef;
+import org.apache.beam.sdk.io.thrift.parser.model.VoidType;
+import org.apache.beam.sdk.options.ValueProvider;
+import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteSource;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing Thrift files.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#read} with the desired file 
pattern to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection documents = 
pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }
+ *
+ * For more advanced use cases, like reading each file in a {@link 
PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern())
+ *   .apply(FileIO.readMatches());
+ *
+ * 

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=360561=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360561
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 16/Dec/19 22:56
Start Date: 16/Dec/19 22:56
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on issue #10290: [BEAM-8561] Add 
ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-566285123
 
 
   @gsteelman I can split this PR into one for `Document` and one for 
`ThriftIO`. Since `ThriftIO` has a dependency on `Document` I think it would be 
best to review that one first, thoughts?
   
   Thanks! 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360561)
Time Spent: 5h 50m  (was: 5h 40m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=360547=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360547
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 16/Dec/19 22:47
Start Date: 16/Dec/19 22:47
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r358507069
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,708 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static java.lang.String.format;
+import static java.util.stream.Collectors.joining;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import java.nio.charset.Charset;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.io.Compression;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.io.thrift.parser.ThriftIdlParser;
+import org.apache.beam.sdk.io.thrift.parser.model.BaseType;
+import org.apache.beam.sdk.io.thrift.parser.model.Const;
+import org.apache.beam.sdk.io.thrift.parser.model.Definition;
+import org.apache.beam.sdk.io.thrift.parser.model.Document;
+import org.apache.beam.sdk.io.thrift.parser.model.Header;
+import org.apache.beam.sdk.io.thrift.parser.model.IdentifierType;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnumField;
+import org.apache.beam.sdk.io.thrift.parser.model.ListType;
+import org.apache.beam.sdk.io.thrift.parser.model.MapType;
+import org.apache.beam.sdk.io.thrift.parser.model.Service;
+import org.apache.beam.sdk.io.thrift.parser.model.StringEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.Struct;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftException;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftField;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftMethod;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftType;
+import org.apache.beam.sdk.io.thrift.parser.model.TypeAnnotation;
+import org.apache.beam.sdk.io.thrift.parser.model.Typedef;
+import org.apache.beam.sdk.io.thrift.parser.model.VoidType;
+import org.apache.beam.sdk.options.ValueProvider;
+import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteSource;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing Thrift files.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#read} with the desired file 
pattern to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection documents = 
pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }
+ *
+ * For more advanced use cases, like reading each file in a {@link 
PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern())
+ *   .apply(FileIO.readMatches());
+ *
+ * 

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=360530=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360530
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 16/Dec/19 22:28
Start Date: 16/Dec/19 22:28
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r358499952
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,708 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static java.lang.String.format;
+import static java.util.stream.Collectors.joining;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import java.nio.charset.Charset;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.io.Compression;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.io.thrift.parser.ThriftIdlParser;
+import org.apache.beam.sdk.io.thrift.parser.model.BaseType;
+import org.apache.beam.sdk.io.thrift.parser.model.Const;
+import org.apache.beam.sdk.io.thrift.parser.model.Definition;
+import org.apache.beam.sdk.io.thrift.parser.model.Document;
+import org.apache.beam.sdk.io.thrift.parser.model.Header;
+import org.apache.beam.sdk.io.thrift.parser.model.IdentifierType;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnumField;
+import org.apache.beam.sdk.io.thrift.parser.model.ListType;
+import org.apache.beam.sdk.io.thrift.parser.model.MapType;
+import org.apache.beam.sdk.io.thrift.parser.model.Service;
+import org.apache.beam.sdk.io.thrift.parser.model.StringEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.Struct;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftException;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftField;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftMethod;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftType;
+import org.apache.beam.sdk.io.thrift.parser.model.TypeAnnotation;
+import org.apache.beam.sdk.io.thrift.parser.model.Typedef;
+import org.apache.beam.sdk.io.thrift.parser.model.VoidType;
+import org.apache.beam.sdk.options.ValueProvider;
+import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteSource;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing Thrift files.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#read} with the desired file 
pattern to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection documents = 
pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }
+ *
+ * For more advanced use cases, like reading each file in a {@link 
PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern())
+ *   .apply(FileIO.readMatches());
+ *
+ * 

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=360519=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360519
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 16/Dec/19 22:17
Start Date: 16/Dec/19 22:17
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r358495630
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,708 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static java.lang.String.format;
+import static java.util.stream.Collectors.joining;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import java.nio.charset.Charset;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.io.Compression;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.io.thrift.parser.ThriftIdlParser;
+import org.apache.beam.sdk.io.thrift.parser.model.BaseType;
+import org.apache.beam.sdk.io.thrift.parser.model.Const;
+import org.apache.beam.sdk.io.thrift.parser.model.Definition;
+import org.apache.beam.sdk.io.thrift.parser.model.Document;
+import org.apache.beam.sdk.io.thrift.parser.model.Header;
+import org.apache.beam.sdk.io.thrift.parser.model.IdentifierType;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnumField;
+import org.apache.beam.sdk.io.thrift.parser.model.ListType;
+import org.apache.beam.sdk.io.thrift.parser.model.MapType;
+import org.apache.beam.sdk.io.thrift.parser.model.Service;
+import org.apache.beam.sdk.io.thrift.parser.model.StringEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.Struct;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftException;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftField;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftMethod;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftType;
+import org.apache.beam.sdk.io.thrift.parser.model.TypeAnnotation;
+import org.apache.beam.sdk.io.thrift.parser.model.Typedef;
+import org.apache.beam.sdk.io.thrift.parser.model.VoidType;
+import org.apache.beam.sdk.options.ValueProvider;
+import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteSource;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing Thrift files.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#read} with the desired file 
pattern to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection documents = 
pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }
+ *
+ * For more advanced use cases, like reading each file in a {@link 
PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern())
+ *   .apply(FileIO.readMatches());
+ *
+ * 

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=360511=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360511
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 16/Dec/19 21:59
Start Date: 16/Dec/19 21:59
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r358488154
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/parser/model/IntegerEnum.java
 ##
 @@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift.parser.model;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import java.util.List;
+import java.util.Objects;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects;
 
 Review comment:
   I think because Beam only allows for the repackaged version to be used, the 
version has to be specified (I could be wrong). 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360511)
Time Spent: 5h 10m  (was: 5h)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=360508=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360508
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 16/Dec/19 21:55
Start Date: 16/Dec/19 21:55
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r358486215
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,708 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static java.lang.String.format;
+import static java.util.stream.Collectors.joining;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import java.nio.charset.Charset;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.io.Compression;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.io.thrift.parser.ThriftIdlParser;
+import org.apache.beam.sdk.io.thrift.parser.model.BaseType;
+import org.apache.beam.sdk.io.thrift.parser.model.Const;
+import org.apache.beam.sdk.io.thrift.parser.model.Definition;
+import org.apache.beam.sdk.io.thrift.parser.model.Document;
+import org.apache.beam.sdk.io.thrift.parser.model.Header;
+import org.apache.beam.sdk.io.thrift.parser.model.IdentifierType;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnumField;
+import org.apache.beam.sdk.io.thrift.parser.model.ListType;
+import org.apache.beam.sdk.io.thrift.parser.model.MapType;
+import org.apache.beam.sdk.io.thrift.parser.model.Service;
+import org.apache.beam.sdk.io.thrift.parser.model.StringEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.Struct;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftException;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftField;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftMethod;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftType;
+import org.apache.beam.sdk.io.thrift.parser.model.TypeAnnotation;
+import org.apache.beam.sdk.io.thrift.parser.model.Typedef;
+import org.apache.beam.sdk.io.thrift.parser.model.VoidType;
+import org.apache.beam.sdk.options.ValueProvider;
+import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteSource;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing Thrift files.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#read} with the desired file 
pattern to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection documents = 
pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }
+ *
+ * For more advanced use cases, like reading each file in a {@link 
PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern())
+ *   .apply(FileIO.readMatches());
+ *
+ * 

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=360500=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360500
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 16/Dec/19 21:19
Start Date: 16/Dec/19 21:19
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r358470056
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/parser/model/ConstList.java
 ##
 @@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift.parser.model;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import java.util.ArrayList;
+import java.util.List;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList;
+
+public class ConstList extends ConstValue {
+  private final List values;
+
+  public ConstList(List values) {
+this.values = ImmutableList.copyOf(checkNotNull(values, "values"));
 
 Review comment:
   This creates an immutable wrapper for the list.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 360500)
Time Spent: 4h 50m  (was: 4h 40m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=360497=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360497
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 16/Dec/19 21:13
Start Date: 16/Dec/19 21:13
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r358467379
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,708 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static java.lang.String.format;
+import static java.util.stream.Collectors.joining;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import java.nio.charset.Charset;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.io.Compression;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.io.thrift.parser.ThriftIdlParser;
+import org.apache.beam.sdk.io.thrift.parser.model.BaseType;
+import org.apache.beam.sdk.io.thrift.parser.model.Const;
+import org.apache.beam.sdk.io.thrift.parser.model.Definition;
+import org.apache.beam.sdk.io.thrift.parser.model.Document;
+import org.apache.beam.sdk.io.thrift.parser.model.Header;
+import org.apache.beam.sdk.io.thrift.parser.model.IdentifierType;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnumField;
+import org.apache.beam.sdk.io.thrift.parser.model.ListType;
+import org.apache.beam.sdk.io.thrift.parser.model.MapType;
+import org.apache.beam.sdk.io.thrift.parser.model.Service;
+import org.apache.beam.sdk.io.thrift.parser.model.StringEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.Struct;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftException;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftField;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftMethod;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftType;
+import org.apache.beam.sdk.io.thrift.parser.model.TypeAnnotation;
+import org.apache.beam.sdk.io.thrift.parser.model.Typedef;
+import org.apache.beam.sdk.io.thrift.parser.model.VoidType;
+import org.apache.beam.sdk.options.ValueProvider;
+import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteSource;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing Thrift files.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#read} with the desired file 
pattern to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection documents = 
pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }
+ *
+ * For more advanced use cases, like reading each file in a {@link 
PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern())
+ *   .apply(FileIO.readMatches());
+ *
+ * 

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=360486=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360486
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 16/Dec/19 20:54
Start Date: 16/Dec/19 20:54
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r358458848
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,708 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static java.lang.String.format;
+import static java.util.stream.Collectors.joining;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import java.nio.charset.Charset;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.io.Compression;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.io.thrift.parser.ThriftIdlParser;
+import org.apache.beam.sdk.io.thrift.parser.model.BaseType;
+import org.apache.beam.sdk.io.thrift.parser.model.Const;
+import org.apache.beam.sdk.io.thrift.parser.model.Definition;
+import org.apache.beam.sdk.io.thrift.parser.model.Document;
+import org.apache.beam.sdk.io.thrift.parser.model.Header;
+import org.apache.beam.sdk.io.thrift.parser.model.IdentifierType;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnumField;
+import org.apache.beam.sdk.io.thrift.parser.model.ListType;
+import org.apache.beam.sdk.io.thrift.parser.model.MapType;
+import org.apache.beam.sdk.io.thrift.parser.model.Service;
+import org.apache.beam.sdk.io.thrift.parser.model.StringEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.Struct;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftException;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftField;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftMethod;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftType;
+import org.apache.beam.sdk.io.thrift.parser.model.TypeAnnotation;
+import org.apache.beam.sdk.io.thrift.parser.model.Typedef;
+import org.apache.beam.sdk.io.thrift.parser.model.VoidType;
+import org.apache.beam.sdk.options.ValueProvider;
+import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteSource;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing Thrift files.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#read} with the desired file 
pattern to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection documents = 
pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }
+ *
+ * For more advanced use cases, like reading each file in a {@link 
PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern())
+ *   .apply(FileIO.readMatches());
+ *
+ * 

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=360417=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360417
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 16/Dec/19 18:31
Start Date: 16/Dec/19 18:31
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r358394393
 
 

 ##
 File path: 
sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/DocumentTest.java
 ##
 @@ -0,0 +1,390 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import org.apache.beam.sdk.io.thrift.parser.model.BaseType;
+import org.apache.beam.sdk.io.thrift.parser.model.Const;
+import org.apache.beam.sdk.io.thrift.parser.model.ConstInteger;
+import org.apache.beam.sdk.io.thrift.parser.model.Document;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnumField;
+import org.apache.beam.sdk.io.thrift.parser.model.Service;
+import org.apache.beam.sdk.io.thrift.parser.model.StringEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.Struct;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftException;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftField;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftMethod;
+import org.apache.beam.sdk.io.thrift.parser.model.TypeAnnotation;
+import org.apache.beam.sdk.io.thrift.parser.model.Typedef;
+import org.apache.beam.sdk.io.thrift.parser.model.Union;
+import org.apache.beam.sdk.io.thrift.parser.model.VoidType;
+import org.junit.Assert;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+/** Tests for {@link Document} class. */
+@RunWith(JUnit4.class)
+public class DocumentTest {
+
+  /** Tests {@link Document#addIncludes(String)}. */
+  @Test
+  public void testAddIncludes() {
+Document document = Document.emptyDocument();
+List includesExpected = new ArrayList<>();
+includesExpected.add("simple_test.thrift");
+includesExpected.add("shared.thrift");
+document.addIncludes(includesExpected);
+
+List includesActual = document.getHeader().getIncludes();
+
+Assert.assertEquals(includesExpected, includesActual);
+  }
+
+  /** Tests {@link Document#addCppIncludes(List)}. */
+  @Test
+  public void testAddCppIncludes() {
+Document document = Document.emptyDocument();
+List cppIncludesExpected = new ArrayList<>();
+cppIncludesExpected.add("iostream");
+cppIncludesExpected.add("set");
+document.addCppIncludes(cppIncludesExpected);
+
+List cppIncludesActual = document.getHeader().getCppIncludes();
+
+Assert.assertEquals(cppIncludesExpected, cppIncludesActual);
+  }
+
+  /** Tests {@link Document#removeDefinition(String)}. */
+  @Test
+  public void testRemoveDefinition() {
+Document document = Document.emptyDocument();
+List emptyAnnotations = new ArrayList<>();
+String constName = "STRINGCONSTANT";
+document.addConstString(constName, emptyAnnotations, "test_string");
+Assert.assertEquals(1, document.getDefinitions().size());
+
+document.removeDefinition(constName);
+Assert.assertEquals(0, document.getDefinitions().size());
+  }
+
+  /** Tests {@link Document#addConst(String, ThriftType, ConstValue)}. */
+  @Test
+  public void testAddConst() {
+Document document = Document.emptyDocument();
+List emptyAnnotations = new ArrayList<>();
+String constName = "INT32CONSTANT";
+document.addConst(
+constName, new BaseType(BaseType.Type.I32, emptyAnnotations), new 
ConstInteger(252));
+
+String constNameActual = document.getDefinitions().get(0).getName();
+
+Assert.assertEquals(constName, constNameActual);
+Assert.assertTrue(document.getDefinitions().get(0) instanceof Const);
+  }
+
+  /** Tests {@link 

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=360411=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360411
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 16/Dec/19 18:26
Start Date: 16/Dec/19 18:26
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r358392005
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/parser/model/Document.java
 ##
 @@ -0,0 +1,424 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift.parser.model;
+
+import static java.util.Collections.emptyList;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.GenericRecordBuilder;
+import org.apache.avro.reflect.ReflectData;
+import org.apache.beam.sdk.io.thrift.parser.visitor.DocumentVisitor;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList;
+
+/**
+ * The {@link Document} class holds the elements of a Thrift file.
+ *
+ * A {@link Document} is made up of:
+ *
+ * 
+ *   {@link Header} - Contains: includes, cppIncludes, namespaces, and 
defaultNamespace.
+ *   {@link Document#definitions} - Contains list of Thrift {@link 
Definition}.
+ * 
+ */
+public class Document implements Serializable {
+  private Header header;
+  private List definitions;
+
+  public Document(Header header, List definitions) {
+this.header = checkNotNull(header, "header");
+this.definitions = ImmutableList.copyOf(checkNotNull(definitions, 
"definitions"));
+  }
+
+  /** Returns an empty {@link Document}. */
+  public static Document emptyDocument() {
+List includes = emptyList();
+List cppIncludes = emptyList();
+String defaultNamespace = null;
+Map namespaces = Collections.emptyMap();
+Header header = new Header(includes, cppIncludes, defaultNamespace, 
namespaces);
+List definitions = emptyList();
+return new Document(header, definitions);
+  }
+
+  public Document getDocument() {
+return this;
+  }
+
+  public Header getHeader() {
+return this.header;
+  }
+
+  public void setHeader(Header header) {
+this.header = header;
+  }
+
+  public List getDefinitions() {
+return definitions;
+  }
+
+  public void setDefinitions(List definitions) {
+this.definitions = definitions;
+  }
+
+  public void visit(final DocumentVisitor visitor) throws IOException {
+Preconditions.checkNotNull(visitor, "the visitor must not be null!");
+
+for (Definition definition : definitions) {
+  if (visitor.accept(definition)) {
+definition.visit(visitor);
+  }
+}
+  }
+
+  /** Gets Avro {@link Schema} for the object. */
+  public Schema getSchema() {
+return ReflectData.get().getSchema(Document.class);
+  }
+
+  /** Gets {@link Document} as a {@link GenericRecord}. */
+  public GenericRecord getAsGenericRecord() {
+GenericRecordBuilder genericRecordBuilder = new 
GenericRecordBuilder(this.getSchema());
+genericRecordBuilder.set("header", this.getHeader()).set("definitions", 
this.getDefinitions());
+
+return genericRecordBuilder.build();
+  }
+
+  /** Adds list of includes to {@link Document#header}. */
+  public void addIncludes(List includes) {
+checkNotNull(includes, "includes");
+List currentIncludes = new 
ArrayList<>(this.getHeader().getIncludes());
+currentIncludes.addAll(includes);
+

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359717=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359717
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 14/Dec/19 00:25
Start Date: 14/Dec/19 00:25
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357879617
 
 

 ##
 File path: sdks/java/io/thrift/src/main/antlr/DocumentGenerator.g
 ##
 @@ -0,0 +1,262 @@
+/*
+ * Copyright 2012 Facebook, Inc.
 
 Review comment:
   Yes the original file can be found 
[here](https://github.com/facebookarchive/swift/blob/master/swift-idl-parser/src/main/antlr3/com/facebook/swift/parser/antlr/DocumentGenerator.g).
 SpotlessApply seems to have removed it in some of the .java files.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 359717)
Time Spent: 4h  (was: 3h 50m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359716=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359716
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 14/Dec/19 00:25
Start Date: 14/Dec/19 00:25
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357879617
 
 

 ##
 File path: sdks/java/io/thrift/src/main/antlr/DocumentGenerator.g
 ##
 @@ -0,0 +1,262 @@
+/*
+ * Copyright 2012 Facebook, Inc.
 
 Review comment:
   Yes the original file can be found 
[here](https://github.com/facebookarchive/swift/blob/master/swift-idl-parser/src/main/antlr3/com/facebook/swift/parser/antlr/Thrift.g).
 SpotlessApply seems to have removed it in some of the .java files.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 359716)
Time Spent: 3h 50m  (was: 3h 40m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359715=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359715
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 14/Dec/19 00:24
Start Date: 14/Dec/19 00:24
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357879437
 
 

 ##
 File path: sdks/java/io/thrift/src/main/antlr/Thrift.g
 ##
 @@ -0,0 +1,290 @@
+/*
+ *  Copyright 2008 Martin Traverso
+ *  Copyright 2012 Facebook, Inc.
 
 Review comment:
   Yes the original file can be found 
[here](https://github.com/facebookarchive/swift/blob/master/swift-idl-parser/src/main/antlr3/com/facebook/swift/parser/antlr/Thrift.g).
 We've included it as part of the parser. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 359715)
Time Spent: 3h 40m  (was: 3.5h)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359712=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359712
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 14/Dec/19 00:04
Start Date: 14/Dec/19 00:04
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357876447
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/parser/model/ConstInteger.java
 ##
 @@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift.parser.model;
+
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects;
+
+public class ConstInteger extends ConstValue {
 
 Review comment:
   Yes this class is reused for `i64`
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 359712)
Time Spent: 3.5h  (was: 3h 20m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359711=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359711
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 14/Dec/19 00:01
Start Date: 14/Dec/19 00:01
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357875927
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,708 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static java.lang.String.format;
+import static java.util.stream.Collectors.joining;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import java.nio.charset.Charset;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.io.Compression;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.io.thrift.parser.ThriftIdlParser;
+import org.apache.beam.sdk.io.thrift.parser.model.BaseType;
+import org.apache.beam.sdk.io.thrift.parser.model.Const;
+import org.apache.beam.sdk.io.thrift.parser.model.Definition;
+import org.apache.beam.sdk.io.thrift.parser.model.Document;
+import org.apache.beam.sdk.io.thrift.parser.model.Header;
+import org.apache.beam.sdk.io.thrift.parser.model.IdentifierType;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnumField;
+import org.apache.beam.sdk.io.thrift.parser.model.ListType;
+import org.apache.beam.sdk.io.thrift.parser.model.MapType;
+import org.apache.beam.sdk.io.thrift.parser.model.Service;
+import org.apache.beam.sdk.io.thrift.parser.model.StringEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.Struct;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftException;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftField;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftMethod;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftType;
+import org.apache.beam.sdk.io.thrift.parser.model.TypeAnnotation;
+import org.apache.beam.sdk.io.thrift.parser.model.Typedef;
+import org.apache.beam.sdk.io.thrift.parser.model.VoidType;
+import org.apache.beam.sdk.options.ValueProvider;
+import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteSource;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing Thrift files.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#read} with the desired file 
pattern to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection documents = 
pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }
+ *
+ * For more advanced use cases, like reading each file in a {@link 
PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern())
+ *   .apply(FileIO.readMatches());
+ *
+ * 

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359709=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359709
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 14/Dec/19 00:00
Start Date: 14/Dec/19 00:00
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357875805
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,708 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static java.lang.String.format;
+import static java.util.stream.Collectors.joining;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import java.nio.charset.Charset;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.io.Compression;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.io.thrift.parser.ThriftIdlParser;
+import org.apache.beam.sdk.io.thrift.parser.model.BaseType;
+import org.apache.beam.sdk.io.thrift.parser.model.Const;
+import org.apache.beam.sdk.io.thrift.parser.model.Definition;
+import org.apache.beam.sdk.io.thrift.parser.model.Document;
+import org.apache.beam.sdk.io.thrift.parser.model.Header;
+import org.apache.beam.sdk.io.thrift.parser.model.IdentifierType;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnumField;
+import org.apache.beam.sdk.io.thrift.parser.model.ListType;
+import org.apache.beam.sdk.io.thrift.parser.model.MapType;
+import org.apache.beam.sdk.io.thrift.parser.model.Service;
+import org.apache.beam.sdk.io.thrift.parser.model.StringEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.Struct;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftException;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftField;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftMethod;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftType;
+import org.apache.beam.sdk.io.thrift.parser.model.TypeAnnotation;
+import org.apache.beam.sdk.io.thrift.parser.model.Typedef;
+import org.apache.beam.sdk.io.thrift.parser.model.VoidType;
+import org.apache.beam.sdk.options.ValueProvider;
+import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteSource;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing Thrift files.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#read} with the desired file 
pattern to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection documents = 
pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }
+ *
+ * For more advanced use cases, like reading each file in a {@link 
PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern())
+ *   .apply(FileIO.readMatches());
+ *
+ * 

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359710=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359710
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 14/Dec/19 00:00
Start Date: 14/Dec/19 00:00
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357875805
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,708 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static java.lang.String.format;
+import static java.util.stream.Collectors.joining;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import java.nio.charset.Charset;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.io.Compression;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.io.thrift.parser.ThriftIdlParser;
+import org.apache.beam.sdk.io.thrift.parser.model.BaseType;
+import org.apache.beam.sdk.io.thrift.parser.model.Const;
+import org.apache.beam.sdk.io.thrift.parser.model.Definition;
+import org.apache.beam.sdk.io.thrift.parser.model.Document;
+import org.apache.beam.sdk.io.thrift.parser.model.Header;
+import org.apache.beam.sdk.io.thrift.parser.model.IdentifierType;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnumField;
+import org.apache.beam.sdk.io.thrift.parser.model.ListType;
+import org.apache.beam.sdk.io.thrift.parser.model.MapType;
+import org.apache.beam.sdk.io.thrift.parser.model.Service;
+import org.apache.beam.sdk.io.thrift.parser.model.StringEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.Struct;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftException;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftField;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftMethod;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftType;
+import org.apache.beam.sdk.io.thrift.parser.model.TypeAnnotation;
+import org.apache.beam.sdk.io.thrift.parser.model.Typedef;
+import org.apache.beam.sdk.io.thrift.parser.model.VoidType;
+import org.apache.beam.sdk.options.ValueProvider;
+import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteSource;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing Thrift files.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#read} with the desired file 
pattern to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection documents = 
pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }
+ *
+ * For more advanced use cases, like reading each file in a {@link 
PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern())
+ *   .apply(FileIO.readMatches());
+ *
+ * 

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359707=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359707
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 13/Dec/19 23:54
Start Date: 13/Dec/19 23:54
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357874789
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,708 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static java.lang.String.format;
+import static java.util.stream.Collectors.joining;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import java.nio.charset.Charset;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.io.Compression;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.io.thrift.parser.ThriftIdlParser;
+import org.apache.beam.sdk.io.thrift.parser.model.BaseType;
+import org.apache.beam.sdk.io.thrift.parser.model.Const;
+import org.apache.beam.sdk.io.thrift.parser.model.Definition;
+import org.apache.beam.sdk.io.thrift.parser.model.Document;
+import org.apache.beam.sdk.io.thrift.parser.model.Header;
+import org.apache.beam.sdk.io.thrift.parser.model.IdentifierType;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnumField;
+import org.apache.beam.sdk.io.thrift.parser.model.ListType;
+import org.apache.beam.sdk.io.thrift.parser.model.MapType;
+import org.apache.beam.sdk.io.thrift.parser.model.Service;
+import org.apache.beam.sdk.io.thrift.parser.model.StringEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.Struct;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftException;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftField;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftMethod;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftType;
+import org.apache.beam.sdk.io.thrift.parser.model.TypeAnnotation;
+import org.apache.beam.sdk.io.thrift.parser.model.Typedef;
+import org.apache.beam.sdk.io.thrift.parser.model.VoidType;
+import org.apache.beam.sdk.options.ValueProvider;
+import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteSource;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing Thrift files.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#read} with the desired file 
pattern to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection documents = 
pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }
+ *
+ * For more advanced use cases, like reading each file in a {@link 
PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern())
+ *   .apply(FileIO.readMatches());
+ *
+ * 

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359704=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359704
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 13/Dec/19 23:36
Start Date: 13/Dec/19 23:36
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on issue #10290: [BEAM-8561] Add 
ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-565650003
 
 
   Thanks @gsteelman. We would love to have the sample Thrift schema and 
anything else we can utilize for testing!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 359704)
Time Spent: 2h 40m  (was: 2.5h)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359703=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359703
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 13/Dec/19 23:34
Start Date: 13/Dec/19 23:34
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357871234
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,708 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static java.lang.String.format;
+import static java.util.stream.Collectors.joining;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import java.nio.charset.Charset;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.io.Compression;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.io.thrift.parser.ThriftIdlParser;
+import org.apache.beam.sdk.io.thrift.parser.model.BaseType;
+import org.apache.beam.sdk.io.thrift.parser.model.Const;
+import org.apache.beam.sdk.io.thrift.parser.model.Definition;
+import org.apache.beam.sdk.io.thrift.parser.model.Document;
+import org.apache.beam.sdk.io.thrift.parser.model.Header;
+import org.apache.beam.sdk.io.thrift.parser.model.IdentifierType;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnumField;
+import org.apache.beam.sdk.io.thrift.parser.model.ListType;
+import org.apache.beam.sdk.io.thrift.parser.model.MapType;
+import org.apache.beam.sdk.io.thrift.parser.model.Service;
+import org.apache.beam.sdk.io.thrift.parser.model.StringEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.Struct;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftException;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftField;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftMethod;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftType;
+import org.apache.beam.sdk.io.thrift.parser.model.TypeAnnotation;
+import org.apache.beam.sdk.io.thrift.parser.model.Typedef;
+import org.apache.beam.sdk.io.thrift.parser.model.VoidType;
+import org.apache.beam.sdk.options.ValueProvider;
+import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteSource;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing Thrift files.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#read} with the desired file 
pattern to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection documents = 
pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }
+ *
+ * For more advanced use cases, like reading each file in a {@link 
PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern())
+ *   .apply(FileIO.readMatches());
+ *
+ * 

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359680=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359680
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 13/Dec/19 22:01
Start Date: 13/Dec/19 22:01
Worklog Time Spent: 10m 
  Work Description: gsteelman commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357798688
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,708 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static java.lang.String.format;
+import static java.util.stream.Collectors.joining;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import java.nio.charset.Charset;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.io.Compression;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.io.thrift.parser.ThriftIdlParser;
+import org.apache.beam.sdk.io.thrift.parser.model.BaseType;
+import org.apache.beam.sdk.io.thrift.parser.model.Const;
+import org.apache.beam.sdk.io.thrift.parser.model.Definition;
+import org.apache.beam.sdk.io.thrift.parser.model.Document;
+import org.apache.beam.sdk.io.thrift.parser.model.Header;
+import org.apache.beam.sdk.io.thrift.parser.model.IdentifierType;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnumField;
+import org.apache.beam.sdk.io.thrift.parser.model.ListType;
+import org.apache.beam.sdk.io.thrift.parser.model.MapType;
+import org.apache.beam.sdk.io.thrift.parser.model.Service;
+import org.apache.beam.sdk.io.thrift.parser.model.StringEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.Struct;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftException;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftField;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftMethod;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftType;
+import org.apache.beam.sdk.io.thrift.parser.model.TypeAnnotation;
+import org.apache.beam.sdk.io.thrift.parser.model.Typedef;
+import org.apache.beam.sdk.io.thrift.parser.model.VoidType;
+import org.apache.beam.sdk.options.ValueProvider;
+import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteSource;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing Thrift files.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#read} with the desired file 
pattern to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection documents = 
pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }
+ *
+ * For more advanced use cases, like reading each file in a {@link 
PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern())
+ *   .apply(FileIO.readMatches());
+ *
+ * 

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359676=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359676
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 13/Dec/19 22:01
Start Date: 13/Dec/19 22:01
Worklog Time Spent: 10m 
  Work Description: gsteelman commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357844088
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/parser/model/ConstList.java
 ##
 @@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift.parser.model;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import java.util.ArrayList;
+import java.util.List;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList;
+
+public class ConstList extends ConstValue {
+  private final List values;
+
+  public ConstList(List values) {
+this.values = ImmutableList.copyOf(checkNotNull(values, "values"));
 
 Review comment:
   Docs unclear. Does this create a deep copy or an immutable wrapper?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 359676)
Time Spent: 2h 10m  (was: 2h)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359671=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359671
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 13/Dec/19 22:01
Start Date: 13/Dec/19 22:01
Worklog Time Spent: 10m 
  Work Description: gsteelman commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357798355
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,708 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static java.lang.String.format;
+import static java.util.stream.Collectors.joining;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import java.nio.charset.Charset;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.io.Compression;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.io.thrift.parser.ThriftIdlParser;
+import org.apache.beam.sdk.io.thrift.parser.model.BaseType;
+import org.apache.beam.sdk.io.thrift.parser.model.Const;
+import org.apache.beam.sdk.io.thrift.parser.model.Definition;
+import org.apache.beam.sdk.io.thrift.parser.model.Document;
+import org.apache.beam.sdk.io.thrift.parser.model.Header;
+import org.apache.beam.sdk.io.thrift.parser.model.IdentifierType;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnumField;
+import org.apache.beam.sdk.io.thrift.parser.model.ListType;
+import org.apache.beam.sdk.io.thrift.parser.model.MapType;
+import org.apache.beam.sdk.io.thrift.parser.model.Service;
+import org.apache.beam.sdk.io.thrift.parser.model.StringEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.Struct;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftException;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftField;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftMethod;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftType;
+import org.apache.beam.sdk.io.thrift.parser.model.TypeAnnotation;
+import org.apache.beam.sdk.io.thrift.parser.model.Typedef;
+import org.apache.beam.sdk.io.thrift.parser.model.VoidType;
+import org.apache.beam.sdk.options.ValueProvider;
+import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteSource;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing Thrift files.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#read} with the desired file 
pattern to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection documents = 
pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }
+ *
+ * For more advanced use cases, like reading each file in a {@link 
PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern())
+ *   .apply(FileIO.readMatches());
+ *
+ * 

  1   2   >