[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-11-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=340059&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340059
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 07/Nov/19 17:53
Start Date: 07/Nov/19 17:53
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #9188: [BEAM-7886] Make 
row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#issuecomment-551191543
 
 
   :tada: Thanks everyone!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 340059)
Time Spent: 16h  (was: 15h 50m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 16h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-11-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=339641&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339641
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 06/Nov/19 23:39
Start Date: 06/Nov/19 23:39
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on pull request #9188: [BEAM-7886] 
Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 339641)
Time Spent: 15h 50m  (was: 15h 40m)



[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-11-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=339551&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339551
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 06/Nov/19 19:04
Start Date: 06/Nov/19 19:04
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #9188: [BEAM-7886] Make 
row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#issuecomment-550454652
 
 
   I can merge this once Python passes
 



Issue Time Tracking
---

Worklog Id: (was: 339551)
Time Spent: 15h 40m  (was: 15.5h)



[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-11-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=339531&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339531
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 06/Nov/19 18:26
Start Date: 06/Nov/19 18:26
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #9188: [BEAM-7886] Make 
row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#issuecomment-550439304
 
 
   @robertwb I squashed it down to reasonable commits, so I think this is ready 
to merge. I think the Python failures have been flakes; hopefully it will pass 
this time.
 



Issue Time Tracking
---

Worklog Id: (was: 339531)
Time Spent: 15.5h  (was: 15h 20m)



[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-11-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=339528&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339528
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 06/Nov/19 18:24
Start Date: 06/Nov/19 18:24
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #9188: [BEAM-7886] Make 
row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#issuecomment-550438800
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 339528)
Time Spent: 15h 20m  (was: 15h 10m)



[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=339120&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339120
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 06/Nov/19 01:56
Start Date: 06/Nov/19 01:56
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #9188: [BEAM-7886] Make 
row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#issuecomment-550106200
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 339120)
Time Spent: 15h 10m  (was: 15h)



[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=338261&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338261
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 04/Nov/19 18:12
Start Date: 04/Nov/19 18:12
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #9188: [BEAM-7886] Make row 
coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#issuecomment-549479136
 
 
   Yes, go ahead and squash into meaningful commits and merge. 
 



Issue Time Tracking
---

Worklog Id: (was: 338261)
Time Spent: 15h  (was: 14h 50m)



[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-11-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=337934&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337934
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 04/Nov/19 03:49
Start Date: 04/Nov/19 03:49
Worklog Time Spent: 10m 
  Work Description: tweise commented on issue #9188: [BEAM-7886] Make row 
coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#issuecomment-549217718
 
 
   Excited to see this happening. Perhaps a good time to squash the fixup 
commits?
 



Issue Time Tracking
---

Worklog Id: (was: 337934)
Time Spent: 14h 50m  (was: 14h 40m)



[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=334279&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334279
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 25/Oct/19 18:26
Start Date: 25/Oct/19 18:26
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #9188: [BEAM-7886] Make 
row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#issuecomment-546461618
 
 
   Finally got CI green! @robertwb is this ok to merge now?
 



Issue Time Tracking
---

Worklog Id: (was: 334279)
Time Spent: 14h 40m  (was: 14.5h)



[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-10-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=333800&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-333800
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 25/Oct/19 00:55
Start Date: 25/Oct/19 00:55
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #9188: [BEAM-7886] Make 
row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#issuecomment-546157164
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 333800)
Time Spent: 14.5h  (was: 14h 20m)



[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-10-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=333620&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-333620
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 24/Oct/19 17:57
Start Date: 24/Oct/19 17:57
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #9188: [BEAM-7886] Make 
row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#issuecomment-546032400
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 333620)
Time Spent: 14h 20m  (was: 14h 10m)



[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-10-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=332985&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332985
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 24/Oct/19 00:46
Start Date: 24/Oct/19 00:46
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #9188: [BEAM-7886] Make 
row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#issuecomment-545693677
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 332985)
Time Spent: 14h 10m  (was: 14h)



[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-10-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=332945&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332945
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 23/Oct/19 22:36
Start Date: 23/Oct/19 22:36
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #9188: [BEAM-7886] Make 
row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#issuecomment-545664933
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 332945)
Time Spent: 14h  (was: 13h 50m)



[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=330838&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330838
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 19/Oct/19 01:20
Start Date: 19/Oct/19 01:20
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r336716237
 
 

 ##
 File path: sdks/python/apache_beam/coders/standard_coders_test.py
 ##
 @@ -133,11 +171,17 @@ def parse_coder(self, spec):
  for c in spec.get('components', ())]
 context.coders.put_proto(coder_id, beam_runner_api_pb2.Coder(
 spec=beam_runner_api_pb2.FunctionSpec(
-urn=spec['urn'], payload=spec.get('payload')),
+urn=spec['urn'], payload=spec.get('payload', '').encode()),
 
 Review comment:
   Done
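
   For context, a minimal sketch of what the changed line in the diff above 
guards against, assuming the test spec has been loaded into a plain dict. The 
helper name `payload_bytes` and the two example specs are hypothetical and used 
only for illustration: `spec.get('payload')` yields `None` when the key is 
absent and a `str` when it is present, while the expression in the `+` line 
always yields `bytes`.

```python
def payload_bytes(spec):
    """Return the spec's payload as bytes, or b'' when no payload is given."""
    # Same expression as the '+' line in the diff: default to '' and encode.
    # str.encode() with no argument uses UTF-8.
    return spec.get('payload', '').encode()


# Hypothetical specs, for illustration only.
without_payload = {'urn': 'beam:coder:bytes:v1'}
with_payload = {'urn': 'beam:coder:row:v1', 'payload': 'abc'}

assert without_payload.get('payload') is None      # old expression: None
assert payload_bytes(without_payload) == b''       # new expression: empty bytes
assert payload_bytes(with_payload) == b'abc'
```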
 



Issue Time Tracking
---

Worklog Id: (was: 330838)
Time Spent: 13h 50m  (was: 13h 40m)



[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=330831&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330831
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 19/Oct/19 00:13
Start Date: 19/Oct/19 00:13
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r336712062
 
 

 ##
 File path: runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/CommonCoderTest.java
 ##
 @@ -278,41 +290,90 @@ private static Object convertValue(Object value, CommonCoder coderSpec, Coder co
       return WindowedValue.of(windowValue, timestamp, windows, paneInfo);
     } else if (s.equals(getUrn(StandardCoders.Enum.DOUBLE))) {
       return Double.parseDouble((String) value);
+    } else if (s.equals(getUrn(StandardCoders.Enum.ROW))) {
+      Schema schema;
+      try {
+        schema = SchemaTranslation.fromProto(SchemaApi.Schema.parseFrom(coderSpec.getPayload()));
+      } catch (InvalidProtocolBufferException e) {
+        throw new RuntimeException("Failed to parse schema payload for row coder", e);
+      }
+
+      return parseField(value, Schema.FieldType.row(schema));
     } else {
       throw new IllegalStateException("Unknown coder URN: " + coderSpec.getUrn());
     }
   }
 
+  private static Object parseField(Object value, Schema.FieldType fieldType) {
+    switch (fieldType.getTypeName()) {
+      case BYTE:
+        return ((Number) value).byteValue();
+      case INT16:
+        return ((Number) value).shortValue();
+      case INT32:
+        return ((Number) value).intValue();
+      case INT64:
+        return ((Number) value).longValue();
+      case FLOAT:
+        return Float.parseFloat((String) value);
+      case DOUBLE:
+        return Double.parseDouble((String) value);
 
 Review comment:
   Ok, I created [BEAM-8437](https://issues.apache.org/jira/browse/BEAM-8437); 
let's take this conversation over there.
 



Issue Time Tracking
---

Worklog Id: (was: 330831)
Time Spent: 13h 40m  (was: 13.5h)



[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-10-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=330293&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330293
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 18/Oct/19 05:09
Start Date: 18/Oct/19 05:09
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9188: [BEAM-7886] 
Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r336322166
 
 

 ##
 File path: sdks/python/apache_beam/typehints/native_type_compatibility.py
 ##
 @@ -50,6 +50,17 @@ def _get_compatible_args(typ):
   return None
 
 
+def _get_args(typ):
 
 Review comment:
   Ack.
 



Issue Time Tracking
---

Worklog Id: (was: 330293)
Time Spent: 13.5h  (was: 13h 20m)



[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-10-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=330292&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330292
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 18/Oct/19 05:08
Start Date: 18/Oct/19 05:08
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9188: [BEAM-7886] 
Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r336321995
 
 

 ##
 File path: sdks/python/apache_beam/coders/standard_coders_test.py
 ##
 @@ -133,11 +171,17 @@ def parse_coder(self, spec):
  for c in spec.get('components', ())]
 context.coders.put_proto(coder_id, beam_runner_api_pb2.Coder(
 spec=beam_runner_api_pb2.FunctionSpec(
-urn=spec['urn'], payload=spec.get('payload')),
+urn=spec['urn'], payload=spec.get('payload', '').encode()),
 
 Review comment:
   Yep. Latin1 is the identity on the bottom 256 code points of Unicode (and 
rejects the rest).
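
   The Latin-1 property mentioned in this comment can be checked directly. 
This is a small sketch using only Python built-ins; it is not taken from the 
PR. It also shows how the default UTF-8 encoding used by a bare `.encode()` 
differs for code points above U+007F.

```python
all_bytes = bytes(range(256))

# Decoding as Latin-1 maps byte value n to the code point with the same number.
text = all_bytes.decode('latin1')
assert [ord(ch) for ch in text] == list(range(256))

# Encoding maps each code point straight back, so the round trip is lossless.
assert text.encode('latin1') == all_bytes

# Code points above U+00FF are rejected rather than silently altered.
try:
    '\u0100'.encode('latin1')
    rejected = False
except UnicodeEncodeError:
    rejected = True
assert rejected

# By contrast, the default UTF-8 encoding expands U+00FF to two bytes.
assert '\xff'.encode() == b'\xc3\xbf'
assert '\xff'.encode('latin1') == b'\xff'
```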
 



Issue Time Tracking
---

Worklog Id: (was: 330292)
Time Spent: 13h 20m  (was: 13h 10m)



[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-10-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=330291&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330291
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 18/Oct/19 05:06
Start Date: 18/Oct/19 05:06
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9188: [BEAM-7886] 
Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r336321751
 
 

 ##
 File path: runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/CommonCoderTest.java
 ##
 @@ -278,41 +290,90 @@ private static Object convertValue(Object value, CommonCoder coderSpec, Coder co
       return WindowedValue.of(windowValue, timestamp, windows, paneInfo);
     } else if (s.equals(getUrn(StandardCoders.Enum.DOUBLE))) {
       return Double.parseDouble((String) value);
+    } else if (s.equals(getUrn(StandardCoders.Enum.ROW))) {
+      Schema schema;
+      try {
+        schema = SchemaTranslation.fromProto(SchemaApi.Schema.parseFrom(coderSpec.getPayload()));
+      } catch (InvalidProtocolBufferException e) {
+        throw new RuntimeException("Failed to parse schema payload for row coder", e);
+      }
+
+      return parseField(value, Schema.FieldType.row(schema));
     } else {
       throw new IllegalStateException("Unknown coder URN: " + coderSpec.getUrn());
     }
   }
 
+  private static Object parseField(Object value, Schema.FieldType fieldType) {
+    switch (fieldType.getTypeName()) {
+      case BYTE:
+        return ((Number) value).byteValue();
+      case INT16:
+        return ((Number) value).shortValue();
+      case INT32:
+        return ((Number) value).intValue();
+      case INT64:
+        return ((Number) value).longValue();
+      case FLOAT:
+        return Float.parseFloat((String) value);
+      case DOUBLE:
+        return Double.parseDouble((String) value);
 
 Review comment:
   YAML is a superset of JSON, which supports doubles (really, as its only 
numeric type). @lukecwik I'm curious, but again this is not a blocker. 
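
   To illustrate the point about numeric literals, here is a small sketch 
using Python's standard `json` module. It uses JSON rather than a YAML loader, 
and the keys `as_number` and `as_string` are made up for the example. A bare 
numeric literal is parsed into a float, whereas a quoted value stays a string 
and needs the kind of explicit parse step seen in the FLOAT and DOUBLE cases of 
the diff.

```python
import json

# Hypothetical document with the same value written both ways.
doc = json.loads('{"as_number": 0.1, "as_string": "0.1"}')

# A bare literal is decoded as a double-precision float.
assert isinstance(doc['as_number'], float)

# A quoted literal stays a string until it is parsed explicitly.
assert isinstance(doc['as_string'], str)
assert float(doc['as_string']) == doc['as_number']
```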
 



Issue Time Tracking
---

Worklog Id: (was: 330291)
Time Spent: 13h 10m  (was: 13h)



[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-10-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=330287&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330287
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 18/Oct/19 05:04
Start Date: 18/Oct/19 05:04
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9188: [BEAM-7886] 
Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r336321453
 
 

 ##
 File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/CoderTranslators.java
 ##
 @@ -118,6 +124,33 @@ public T fromComponents(List<Coder<?>> components) {
     };
   }
 
+  static CoderTranslator<RowCoder> row() {
+    return new CoderTranslator<RowCoder>() {
+      @Override
+      public List<? extends Coder<?>> getComponents(RowCoder from) {
+        return ImmutableList.of();
 
 Review comment:
   For coders, if one had a coder T one was likely to have KV<K, T> for various 
K, an Iterable<T>, WindowedValue<T> for possibly several window types, and 
various other permutations. Coupled with the fact that leaf coders were often 
huge serialized blobs, this made for some pretty significant savings. 
   
   Maybe this'll be less of an issue in the streaming world. I think it should 
not be a blocker, assuming we'll be able to update this in the (short-term) 
future. 
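
   The savings described in this comment can be sketched with a toy model. 
Everything here is hypothetical and is not Beam's actual data model: a coder is 
represented as a `(urn, payload, component_ids)` tuple, and the 10,000-byte 
payload stands in for a large serialized leaf coder. The sketch compares 
embedding the leaf in every wrapper (KV, Iterable, WindowedValue) against 
storing it once and referencing it by id.

```python
# Stand-in for a large serialized leaf coder (size chosen arbitrarily).
LEAF_PAYLOAD = b'x' * 10_000


def inlined_size(wrappers):
    """Total payload bytes if each wrapper embeds its own copy of the leaf."""
    return sum(len(LEAF_PAYLOAD) for _ in wrappers)


def shared_size(wrappers):
    """Total payload bytes if the leaf is stored once and referenced by id."""
    registry = {'leaf': ('example:leaf', LEAF_PAYLOAD, [])}
    for name in wrappers:
        # Each wrapper carries no payload of its own, only a component id.
        registry[name] = ('example:' + name, b'', ['leaf'])
    return sum(len(payload) for _, payload, _ in registry.values())


wrappers = ['kv', 'iterable', 'windowed_value']
assert inlined_size(wrappers) == 30_000
assert shared_size(wrappers) == 10_000
```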
 



Issue Time Tracking
---

Worklog Id: (was: 330287)
Time Spent: 13h  (was: 12h 50m)



[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-10-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=327398&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-327398
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 12/Oct/19 23:20
Start Date: 12/Oct/19 23:20
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #9188: [BEAM-7886] Make 
row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#issuecomment-541369070
 
 
   @robertwb could you take another look?
 



Issue Time Tracking
---

Worklog Id: (was: 327398)
Time Spent: 12h 50m  (was: 12h 40m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 12h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=325405&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325405
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 09/Oct/19 00:37
Start Date: 09/Oct/19 00:37
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r332789172
 
 

 ##
 File path: sdks/python/apache_beam/coders/row_coder.py
 ##
 @@ -0,0 +1,173 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import itertools
+from array import array
+
+from apache_beam.coders.coder_impl import StreamCoderImpl
+from apache_beam.coders.coders import BytesCoder
+from apache_beam.coders.coders import Coder
+from apache_beam.coders.coders import FastCoder
+from apache_beam.coders.coders import FloatCoder
+from apache_beam.coders.coders import IterableCoder
+from apache_beam.coders.coders import StrUtf8Coder
+from apache_beam.coders.coders import TupleCoder
+from apache_beam.coders.coders import VarIntCoder
+from apache_beam.portability import common_urns
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.schemas import named_tuple_from_schema
+from apache_beam.typehints.schemas import named_tuple_to_schema
+
+__all__ = ["RowCoder"]
+
+
+class RowCoder(FastCoder):
+  """ Coder for `typing.NamedTuple` instances.
+
+  Implements the beam:coder:row:v1 standard coder spec.
+  """
+
+  def __init__(self, schema):
+    """Initializes a :class:`RowCoder`.
+
+    Args:
+      schema (apache_beam.portability.api.schema_pb2.Schema): The protobuf
+        representation of the schema of the data that the RowCoder will be used
+        to encode/decode.
+    """
+    self.schema = schema
+    self.components = [
+        coder_from_type(field.type) for field in self.schema.fields
+    ]
+
+  def _create_impl(self):
+    return RowCoderImpl(self.schema, self.components)
+
+  def is_deterministic(self):
+    return all(c.is_deterministic() for c in self.components)
+
+  def to_type_hint(self):
+    return named_tuple_from_schema(self.schema)
+
+  def as_cloud_object(self, coders_context=None):
+    raise NotImplementedError("as_cloud_object not supported for RowCoder")
+
+  __hash__ = None
+
+  def __eq__(self, other):
+    return type(self) == type(other) and self.schema == other.schema
+
+  def to_runner_api_parameter(self, unused_context):
+    return (common_urns.coders.ROW.urn, self.schema, [])
+
+  @Coder.register_urn(common_urns.coders.ROW.urn, schema_pb2.Schema)
+  def from_runner_api_parameter(payload, components, unused_context):
+    return RowCoder(payload)
+
+  @staticmethod
+  def from_type_hint(named_tuple_type, registry):
+    return RowCoder(named_tuple_to_schema(named_tuple_type))
+
+
+def coder_from_type(type_):
 
 Review comment:
   Done.
 



Issue Time Tracking
---

Worklog Id: (was: 325405)
Time Spent: 12h 40m  (was: 12.5h)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 12h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=325403&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325403
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 09/Oct/19 00:36
Start Date: 09/Oct/19 00:36
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r332789120
 
 

 ##
 File path: sdks/python/apache_beam/coders/row_coder.py
 ##
 @@ -0,0 +1,173 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import itertools
+from array import array
+
+from apache_beam.coders.coder_impl import StreamCoderImpl
+from apache_beam.coders.coders import BytesCoder
+from apache_beam.coders.coders import Coder
+from apache_beam.coders.coders import FastCoder
+from apache_beam.coders.coders import FloatCoder
+from apache_beam.coders.coders import IterableCoder
+from apache_beam.coders.coders import StrUtf8Coder
+from apache_beam.coders.coders import TupleCoder
+from apache_beam.coders.coders import VarIntCoder
+from apache_beam.portability import common_urns
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.schemas import named_tuple_from_schema
+from apache_beam.typehints.schemas import named_tuple_to_schema
+
+__all__ = ["RowCoder"]
+
+
+class RowCoder(FastCoder):
+  """ Coder for `typing.NamedTuple` instances.
+
+  Implements the beam:coder:row:v1 standard coder spec.
+  """
+
+  def __init__(self, schema):
+    """Initializes a :class:`RowCoder`.
+
+    Args:
+      schema (apache_beam.portability.api.schema_pb2.Schema): The protobuf
+        representation of the schema of the data that the RowCoder will be used
+        to encode/decode.
+    """
+    self.schema = schema
+    self.components = [
+        coder_from_type(field.type) for field in self.schema.fields
+    ]
+
+  def _create_impl(self):
+    return RowCoderImpl(self.schema, self.components)
+
+  def is_deterministic(self):
+    return all(c.is_deterministic() for c in self.components)
+
+  def to_type_hint(self):
+    return named_tuple_from_schema(self.schema)
+
+  def as_cloud_object(self, coders_context=None):
+    raise NotImplementedError("as_cloud_object not supported for RowCoder")
+
+  __hash__ = None
+
+  def __eq__(self, other):
+    return type(self) == type(other) and self.schema == other.schema
+
+  def to_runner_api_parameter(self, unused_context):
+    return (common_urns.coders.ROW.urn, self.schema, [])
+
+  @Coder.register_urn(common_urns.coders.ROW.urn, schema_pb2.Schema)
+  def from_runner_api_parameter(payload, components, unused_context):
+    return RowCoder(payload)
+
+  @staticmethod
+  def from_type_hint(named_tuple_type, registry):
+    return RowCoder(named_tuple_to_schema(named_tuple_type))
+
+
+def coder_from_type(type_):
+  type_info = type_.WhichOneof("type_info")
+  if type_info == "atomic_type":
+    if type_.atomic_type in (schema_pb2.INT32,
+                             schema_pb2.INT64):
+      return VarIntCoder()
+    elif type_.atomic_type == schema_pb2.DOUBLE:
+      return FloatCoder()
+    elif type_.atomic_type == schema_pb2.STRING:
+      return StrUtf8Coder()
+  elif type_info == "array_type":
+    return IterableCoder(coder_from_type(type_.array_type.element_type))
+
+  # The Java SDK supports several more types, but the coders are not yet
+  # standard, and are not implemented in Python.
+  raise ValueError(
+      "Encountered a type that is not currently supported by RowCoder: %s" %
+      type_)
+
+
+class RowCoderImpl(StreamCoderImpl):
+  """For internal use only; no backwards-compatibility guarantees."""
+  SIZE_CODER = VarIntCoder().get_impl()
+  NULL_CODER = BytesCoder().get_impl()
 
 Review comment:
   Done.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For 

[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=325402&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325402
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 09/Oct/19 00:34
Start Date: 09/Oct/19 00:34
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r332788742
 
 

 ##
 File path: sdks/python/apache_beam/coders/standard_coders_test.py
 ##
 @@ -133,11 +171,17 @@ def parse_coder(self, spec):
  for c in spec.get('components', ())]
 context.coders.put_proto(coder_id, beam_runner_api_pb2.Coder(
 spec=beam_runner_api_pb2.FunctionSpec(
-urn=spec['urn'], payload=spec.get('payload')),
+urn=spec['urn'], payload=spec.get('payload', '').encode()),
 
 Review comment:
   Yeah, this was just a fix I slipped in to get the test working in Python 3 and then forgot about.
   
   It should probably specify an encoding. Do you think it should be `latin1`, like we do with the [expected encoded value](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/coders/standard_coders_test.py#L115)?
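
A minimal sketch of why the choice of encoding matters here. It assumes the payload string uses one character per byte value; the sample string is invented for illustration and is not taken from the test spec.

```python
# Hypothetical payload text in which each character stands for one byte (0-255).
payload_text = "\x0a\x03abc\xff"

latin1_bytes = payload_text.encode("latin1")
utf8_bytes = payload_text.encode()  # Python 3 defaults to UTF-8

# latin1 maps every code point below 256 to exactly one byte.
assert latin1_bytes == b"\x0a\x03abc\xff"

# UTF-8 expands code points >= 0x80 into two bytes, so the payload changes.
assert utf8_bytes == b"\x0a\x03abc\xc3\xbf"
print(len(latin1_bytes), len(utf8_bytes))
```

For pure-ASCII payloads the two encodings agree, so a bare `.encode()` only changes the payload once a byte value of 128 or above appears.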
 



Issue Time Tracking
---

Worklog Id: (was: 325402)
Time Spent: 12h 20m  (was: 12h 10m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 12h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-10-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=325398&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325398
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 09/Oct/19 00:12
Start Date: 09/Oct/19 00:12
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r332785140
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/CoderTranslators.java
 ##
 @@ -118,6 +124,33 @@ public T fromComponents(List<Coder<?>> components) {
      };
    }
  
+  static CoderTranslator<RowCoder> row() {
+    return new CoderTranslator<RowCoder>() {
+      @Override
+      public List<? extends Coder<?>> getComponents(RowCoder from) {
+        return ImmutableList.of();
 
 Review comment:
   Yes, right now there's just a fixed mapping from field type to coder. There's no bug filed for using components; I was thinking that we would just continue inlining everything. Do you think we should plan on using components instead? What does that get us?
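
A rough sketch of what a fixed field-type-to-coder mapping looks like, loosely modelled on `coder_from_type` in this PR. The type names and coder names are plain strings here, and `ATOMIC_TYPE_TO_CODER` and `coder_name_for` are hypothetical helpers, not Beam APIs.

```python
# Hypothetical stand-ins: plain strings instead of schema_pb2 enums and coders.
ATOMIC_TYPE_TO_CODER = {
    "INT32": "VarIntCoder",
    "INT64": "VarIntCoder",
    "DOUBLE": "FloatCoder",
    "STRING": "StrUtf8Coder",
}


def coder_name_for(field_type):
  """Maps an atomic type name, or an ("ARRAY", element_type) tuple, to a coder name."""
  if isinstance(field_type, tuple) and field_type[0] == "ARRAY":
    # Arrays recurse on the element type, as the PR does with IterableCoder.
    return "IterableCoder[%s]" % coder_name_for(field_type[1])
  if field_type not in ATOMIC_TYPE_TO_CODER:
    raise ValueError("Unsupported type: %r" % (field_type,))
  return ATOMIC_TYPE_TO_CODER[field_type]


print(coder_name_for(("ARRAY", "STRING")))
```

Because the mapping is fixed, the schema alone determines every field coder, which is why the row coder can report an empty component list.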
 



Issue Time Tracking
---

Worklog Id: (was: 325398)
Time Spent: 12h 10m  (was: 12h)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 12h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-10-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=324788&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-324788
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 08/Oct/19 00:43
Start Date: 08/Oct/19 00:43
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r332295459
 
 

 ##
 File path: sdks/python/apache_beam/coders/row_coder.py
 ##
 @@ -0,0 +1,173 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import itertools
+from array import array
+
+from apache_beam.coders.coder_impl import StreamCoderImpl
+from apache_beam.coders.coders import BytesCoder
+from apache_beam.coders.coders import Coder
+from apache_beam.coders.coders import FastCoder
+from apache_beam.coders.coders import FloatCoder
+from apache_beam.coders.coders import IterableCoder
+from apache_beam.coders.coders import StrUtf8Coder
+from apache_beam.coders.coders import TupleCoder
+from apache_beam.coders.coders import VarIntCoder
+from apache_beam.portability import common_urns
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.schemas import named_tuple_from_schema
+from apache_beam.typehints.schemas import named_tuple_to_schema
+
+__all__ = ["RowCoder"]
+
+
+class RowCoder(FastCoder):
+  """ Coder for `typing.NamedTuple` instances.
+
+  Implements the beam:coder:row:v1 standard coder spec.
+  """
+
+  def __init__(self, schema):
+    """Initializes a :class:`RowCoder`.
+
+    Args:
+      schema (apache_beam.portability.api.schema_pb2.Schema): The protobuf
+        representation of the schema of the data that the RowCoder will be used
+        to encode/decode.
+    """
+    self.schema = schema
+    self.components = [
+        coder_from_type(field.type) for field in self.schema.fields
+    ]
+
+  def _create_impl(self):
+    return RowCoderImpl(self.schema, self.components)
+
+  def is_deterministic(self):
+    return all(c.is_deterministic() for c in self.components)
+
+  def to_type_hint(self):
+    return named_tuple_from_schema(self.schema)
+
+  def as_cloud_object(self, coders_context=None):
+    raise NotImplementedError("as_cloud_object not supported for RowCoder")
+
+  __hash__ = None
+
+  def __eq__(self, other):
+    return type(self) == type(other) and self.schema == other.schema
+
+  def to_runner_api_parameter(self, unused_context):
+    return (common_urns.coders.ROW.urn, self.schema, [])
+
+  @Coder.register_urn(common_urns.coders.ROW.urn, schema_pb2.Schema)
+  def from_runner_api_parameter(payload, components, unused_context):
+    return RowCoder(payload)
+
+  @staticmethod
+  def from_type_hint(named_tuple_type, registry):
+    return RowCoder(named_tuple_to_schema(named_tuple_type))
+
+
+def coder_from_type(type_):
+  type_info = type_.WhichOneof("type_info")
+  if type_info == "atomic_type":
+    if type_.atomic_type in (schema_pb2.INT32,
+                             schema_pb2.INT64):
+      return VarIntCoder()
+    elif type_.atomic_type == schema_pb2.DOUBLE:
+      return FloatCoder()
+    elif type_.atomic_type == schema_pb2.STRING:
+      return StrUtf8Coder()
+  elif type_info == "array_type":
+    return IterableCoder(coder_from_type(type_.array_type.element_type))
+
+  # The Java SDK supports several more types, but the coders are not yet
+  # standard, and are not implemented in Python.
+  raise ValueError(
+      "Encountered a type that is not currently supported by RowCoder: %s" %
+      type_)
+
+
+class RowCoderImpl(StreamCoderImpl):
+  """For internal use only; no backwards-compatibility guarantees."""
+  SIZE_CODER = VarIntCoder().get_impl()
+  NULL_CODER = BytesCoder().get_impl()
+
+  def __init__(self, schema, components):
+    self.schema = schema
+    self.constructor = named_tuple_from_schema(schema)
+    self.components = list(c.get_impl() for c in components)
+    self.has_nullable_fields = any(
+        field.type.nullable for field in 

[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-10-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=324787&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-324787
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 08/Oct/19 00:38
Start Date: 08/Oct/19 00:38
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r332294439
 
 

 ##
 File path: sdks/python/apache_beam/typehints/native_type_compatibility.py
 ##
 @@ -50,6 +50,17 @@ def _get_compatible_args(typ):
   return None
 
 
+def _get_args(typ):
 
 Review comment:
   These are all functions that I needed over in typehints/schemas.py
 



Issue Time Tracking
---

Worklog Id: (was: 324787)
Time Spent: 11h 50m  (was: 11h 40m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 11h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-10-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=324784&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-324784
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 08/Oct/19 00:33
Start Date: 08/Oct/19 00:33
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r332293659
 
 

 ##
 File path: 
runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/CommonCoderTest.java
 ##
 @@ -278,41 +290,90 @@ private static Object convertValue(Object value, CommonCoder coderSpec, Coder co
        return WindowedValue.of(windowValue, timestamp, windows, paneInfo);
      } else if (s.equals(getUrn(StandardCoders.Enum.DOUBLE))) {
        return Double.parseDouble((String) value);
+    } else if (s.equals(getUrn(StandardCoders.Enum.ROW))) {
+      Schema schema;
+      try {
+        schema = SchemaTranslation.fromProto(SchemaApi.Schema.parseFrom(coderSpec.getPayload()));
+      } catch (InvalidProtocolBufferException e) {
+        throw new RuntimeException("Failed to parse schema payload for row coder", e);
+      }
+
+      return parseField(value, Schema.FieldType.row(schema));
      } else {
        throw new IllegalStateException("Unknown coder URN: " + coderSpec.getUrn());
      }
    }
  
+  private static Object parseField(Object value, Schema.FieldType fieldType) {
+    switch (fieldType.getTypeName()) {
+      case BYTE:
+        return ((Number) value).byteValue();
+      case INT16:
+        return ((Number) value).shortValue();
+      case INT32:
+        return ((Number) value).intValue();
+      case INT64:
+        return ((Number) value).longValue();
+      case FLOAT:
+        return Float.parseFloat((String) value);
+      case DOUBLE:
+        return Double.parseDouble((String) value);
 
 Review comment:
   I tried to be consistent with the YAML representations used for other coders in `convertValue`. We use strings for `DoubleCoder` [there](https://github.com/apache/beam/blob/master/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/CommonCoderTest.java#L280), presumably to avoid the possibility of running into precision errors? It looks like YAML does [support](https://yaml.org/type/float.html) floating point, but it explicitly states that it doesn't specify a required accuracy for implementations.
   
   @lukecwik added the `parseDouble` line I linked in https://github.com/apache/beam/pull/8205, maybe he can clarify?
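
A small sketch of the precision concern, using only the Python standard library. It assumes doubles are encoded as 8-byte big-endian IEEE 754 values and simulates a parser that narrows to 32 bits. Whether any particular YAML parser actually loses precision is not verified here.

```python
import struct

value = 0.1

# In Python 3, repr() gives the shortest string that parses back to the same double.
text = repr(value)
round_tripped = float(text)

# The 8-byte big-endian encoding is unchanged by the trip through text.
same_bytes = struct.pack(">d", round_tripped) == struct.pack(">d", value)

# A parser that narrowed to 32-bit floats would not preserve the value.
narrowed = struct.unpack(">f", struct.pack(">f", value))[0]
print(same_bytes, narrowed == value)
```

Keeping the value as a string in the spec leaves the parsing to `Double.parseDouble` (or `float()` here), so it does not depend on the accuracy of the YAML library.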
 



Issue Time Tracking
---

Worklog Id: (was: 324784)
Time Spent: 11h 40m  (was: 11.5h)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 11h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-10-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=324647&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-324647
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 07/Oct/19 21:57
Start Date: 07/Oct/19 21:57
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9188: [BEAM-7886] 
Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r332254689
 
 

 ##
 File path: sdks/python/apache_beam/coders/standard_coders_test.py
 ##
 @@ -133,11 +171,17 @@ def parse_coder(self, spec):
  for c in spec.get('components', ())]
 context.coders.put_proto(coder_id, beam_runner_api_pb2.Coder(
 spec=beam_runner_api_pb2.FunctionSpec(
-urn=spec['urn'], payload=spec.get('payload')),
+urn=spec['urn'], payload=spec.get('payload', '').encode()),
 
 Review comment:
   Why is the encode now needed? Is this a Python 3 change?
 



Issue Time Tracking
---

Worklog Id: (was: 324647)
Time Spent: 11h 10m  (was: 11h)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-10-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=324654&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-324654
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 07/Oct/19 21:57
Start Date: 07/Oct/19 21:57
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9188: [BEAM-7886] 
Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r332233044
 
 

 ##
 File path: 
runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/CommonCoderTest.java
 ##
 @@ -278,41 +290,90 @@ private static Object convertValue(Object value, CommonCoder coderSpec, Coder co
        return WindowedValue.of(windowValue, timestamp, windows, paneInfo);
      } else if (s.equals(getUrn(StandardCoders.Enum.DOUBLE))) {
        return Double.parseDouble((String) value);
+    } else if (s.equals(getUrn(StandardCoders.Enum.ROW))) {
+      Schema schema;
+      try {
+        schema = SchemaTranslation.fromProto(SchemaApi.Schema.parseFrom(coderSpec.getPayload()));
+      } catch (InvalidProtocolBufferException e) {
+        throw new RuntimeException("Failed to parse schema payload for row coder", e);
+      }
+
+      return parseField(value, Schema.FieldType.row(schema));
      } else {
        throw new IllegalStateException("Unknown coder URN: " + coderSpec.getUrn());
      }
    }
  
+  private static Object parseField(Object value, Schema.FieldType fieldType) {
+    switch (fieldType.getTypeName()) {
+      case BYTE:
+        return ((Number) value).byteValue();
+      case INT16:
+        return ((Number) value).shortValue();
+      case INT32:
+        return ((Number) value).intValue();
+      case INT64:
+        return ((Number) value).longValue();
+      case FLOAT:
+        return Float.parseFloat((String) value);
+      case DOUBLE:
+        return Double.parseDouble((String) value);
 
 Review comment:
   Why are these strings? 
 



Issue Time Tracking
---

Worklog Id: (was: 324654)
Time Spent: 11.5h  (was: 11h 20m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 11.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-10-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=324650&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-324650
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 07/Oct/19 21:57
Start Date: 07/Oct/19 21:57
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9188: [BEAM-7886] 
Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r332254984
 
 

 ##
 File path: sdks/python/apache_beam/typehints/native_type_compatibility.py
 ##
 @@ -50,6 +50,17 @@ def _get_compatible_args(typ):
   return None
 
 
+def _get_args(typ):
 
 Review comment:
   Should these changes be in a different PR? 
 



Issue Time Tracking
---

Worklog Id: (was: 324650)
Time Spent: 11h 20m  (was: 11h 10m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 11h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-10-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=324652&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-324652
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 07/Oct/19 21:57
Start Date: 07/Oct/19 21:57
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9188: [BEAM-7886] 
Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r332253534
 
 

 ##
 File path: sdks/python/apache_beam/coders/row_coder.py
 ##
 @@ -0,0 +1,173 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import itertools
+from array import array
+
+from apache_beam.coders.coder_impl import StreamCoderImpl
+from apache_beam.coders.coders import BytesCoder
+from apache_beam.coders.coders import Coder
+from apache_beam.coders.coders import FastCoder
+from apache_beam.coders.coders import FloatCoder
+from apache_beam.coders.coders import IterableCoder
+from apache_beam.coders.coders import StrUtf8Coder
+from apache_beam.coders.coders import TupleCoder
+from apache_beam.coders.coders import VarIntCoder
+from apache_beam.portability import common_urns
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.schemas import named_tuple_from_schema
+from apache_beam.typehints.schemas import named_tuple_to_schema
+
+__all__ = ["RowCoder"]
+
+
+class RowCoder(FastCoder):
+  """ Coder for `typing.NamedTuple` instances.
+
+  Implements the beam:coder:row:v1 standard coder spec.
+  """
+
+  def __init__(self, schema):
+    """Initializes a :class:`RowCoder`.
+
+    Args:
+      schema (apache_beam.portability.api.schema_pb2.Schema): The protobuf
+        representation of the schema of the data that the RowCoder will be used
+        to encode/decode.
+    """
+    self.schema = schema
+    self.components = [
+        coder_from_type(field.type) for field in self.schema.fields
+    ]
+
+  def _create_impl(self):
+    return RowCoderImpl(self.schema, self.components)
+
+  def is_deterministic(self):
+    return all(c.is_deterministic() for c in self.components)
+
+  def to_type_hint(self):
+    return named_tuple_from_schema(self.schema)
+
+  def as_cloud_object(self, coders_context=None):
+    raise NotImplementedError("as_cloud_object not supported for RowCoder")
+
+  __hash__ = None
+
+  def __eq__(self, other):
+    return type(self) == type(other) and self.schema == other.schema
+
+  def to_runner_api_parameter(self, unused_context):
+    return (common_urns.coders.ROW.urn, self.schema, [])
+
+  @Coder.register_urn(common_urns.coders.ROW.urn, schema_pb2.Schema)
+  def from_runner_api_parameter(payload, components, unused_context):
+    return RowCoder(payload)
+
+  @staticmethod
+  def from_type_hint(named_tuple_type, registry):
+    return RowCoder(named_tuple_to_schema(named_tuple_type))
+
+
+def coder_from_type(type_):
+  type_info = type_.WhichOneof("type_info")
+  if type_info == "atomic_type":
+    if type_.atomic_type in (schema_pb2.INT32,
+                             schema_pb2.INT64):
+      return VarIntCoder()
+    elif type_.atomic_type == schema_pb2.DOUBLE:
+      return FloatCoder()
+    elif type_.atomic_type == schema_pb2.STRING:
+      return StrUtf8Coder()
+  elif type_info == "array_type":
+    return IterableCoder(coder_from_type(type_.array_type.element_type))
+
+  # The Java SDK supports several more types, but the coders are not yet
+  # standard, and are not implemented in Python.
+  raise ValueError(
+      "Encountered a type that is not currently supported by RowCoder: %s" %
+      type_)
+
+
+class RowCoderImpl(StreamCoderImpl):
+  """For internal use only; no backwards-compatibility guarantees."""
+  SIZE_CODER = VarIntCoder().get_impl()
+  NULL_CODER = BytesCoder().get_impl()
+
+  def __init__(self, schema, components):
+    self.schema = schema
+    self.constructor = named_tuple_from_schema(schema)
+    self.components = list(c.get_impl() for c in components)
+    self.has_nullable_fields = any(
+        field.type.nullable for field in 

[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-10-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=324651&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-324651
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 07/Oct/19 21:57
Start Date: 07/Oct/19 21:57
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9188: [BEAM-7886] 
Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r332253829
 
 

 ##
 File path: sdks/python/apache_beam/coders/row_coder_test.py
 ##
 @@ -0,0 +1,129 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+from __future__ import absolute_import
+
+import logging
+import typing
+import unittest
+from itertools import chain
+
+import numpy as np
+from past.builtins import unicode
+
+from apache_beam.coders import RowCoder
+from apache_beam.coders.typecoders import registry as coders_registry
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.schemas import typing_to_runner_api
+
+Person = typing.NamedTuple("Person", [
+    ("name", unicode),
+    ("age", np.int32),
+    ("address", typing.Optional[unicode]),
+    ("aliases", typing.List[unicode]),
+])
+
+coders_registry.register_coder(Person, RowCoder)
 
 Review comment:
   +1
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 324651)
Time Spent: 11h 20m  (was: 11h 10m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 11h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-10-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=324653&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-324653
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 07/Oct/19 21:57
Start Date: 07/Oct/19 21:57
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9188: [BEAM-7886] 
Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r332233555
 
 

 ##
 File path: sdks/python/apache_beam/coders/row_coder.py
 ##
 @@ -0,0 +1,173 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import itertools
+from array import array
+
+from apache_beam.coders.coder_impl import StreamCoderImpl
+from apache_beam.coders.coders import BytesCoder
+from apache_beam.coders.coders import Coder
+from apache_beam.coders.coders import FastCoder
+from apache_beam.coders.coders import FloatCoder
+from apache_beam.coders.coders import IterableCoder
+from apache_beam.coders.coders import StrUtf8Coder
+from apache_beam.coders.coders import TupleCoder
+from apache_beam.coders.coders import VarIntCoder
+from apache_beam.portability import common_urns
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.schemas import named_tuple_from_schema
+from apache_beam.typehints.schemas import named_tuple_to_schema
+
+__all__ = ["RowCoder"]
+
+
+class RowCoder(FastCoder):
+  """ Coder for `typing.NamedTuple` instances.
+
+  Implements the beam:coder:row:v1 standard coder spec.
+  """
+
+  def __init__(self, schema):
+    """Initializes a :class:`RowCoder`.
+
+    Args:
+      schema (apache_beam.portability.api.schema_pb2.Schema): The protobuf
+        representation of the schema of the data that the RowCoder will be used
+        to encode/decode.
+    """
+    self.schema = schema
+    self.components = [
+        coder_from_type(field.type) for field in self.schema.fields
+    ]
+
+  def _create_impl(self):
+    return RowCoderImpl(self.schema, self.components)
+
+  def is_deterministic(self):
+    return all(c.is_deterministic() for c in self.components)
+
+  def to_type_hint(self):
+    return named_tuple_from_schema(self.schema)
+
+  def as_cloud_object(self, coders_context=None):
+    raise NotImplementedError("as_cloud_object not supported for RowCoder")
+
+  __hash__ = None
+
+  def __eq__(self, other):
+    return type(self) == type(other) and self.schema == other.schema
+
+  def to_runner_api_parameter(self, unused_context):
+    return (common_urns.coders.ROW.urn, self.schema, [])
+
+  @Coder.register_urn(common_urns.coders.ROW.urn, schema_pb2.Schema)
+  def from_runner_api_parameter(payload, components, unused_context):
+    return RowCoder(payload)
+
+  @staticmethod
+  def from_type_hint(named_tuple_type, registry):
+    return RowCoder(named_tuple_to_schema(named_tuple_type))
+
+
+def coder_from_type(type_):
 
 Review comment:
   Should this be a (static) method on RowCoder?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 324653)
Time Spent: 11.5h  (was: 11h 20m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 11.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-10-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=324649&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-324649
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 07/Oct/19 21:57
Start Date: 07/Oct/19 21:57
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9188: [BEAM-7886] 
Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r332234275
 
 

 ##
 File path: sdks/python/apache_beam/coders/row_coder.py
 ##
 @@ -0,0 +1,173 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import itertools
+from array import array
+
+from apache_beam.coders.coder_impl import StreamCoderImpl
+from apache_beam.coders.coders import BytesCoder
+from apache_beam.coders.coders import Coder
+from apache_beam.coders.coders import FastCoder
+from apache_beam.coders.coders import FloatCoder
+from apache_beam.coders.coders import IterableCoder
+from apache_beam.coders.coders import StrUtf8Coder
+from apache_beam.coders.coders import TupleCoder
+from apache_beam.coders.coders import VarIntCoder
+from apache_beam.portability import common_urns
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.schemas import named_tuple_from_schema
+from apache_beam.typehints.schemas import named_tuple_to_schema
+
+__all__ = ["RowCoder"]
+
+
+class RowCoder(FastCoder):
+  """ Coder for `typing.NamedTuple` instances.
+
+  Implements the beam:coder:row:v1 standard coder spec.
+  """
+
+  def __init__(self, schema):
+    """Initializes a :class:`RowCoder`.
+
+    Args:
+      schema (apache_beam.portability.api.schema_pb2.Schema): The protobuf
+        representation of the schema of the data that the RowCoder will be used
+        to encode/decode.
+    """
+    self.schema = schema
+    self.components = [
+        coder_from_type(field.type) for field in self.schema.fields
+    ]
+
+  def _create_impl(self):
+    return RowCoderImpl(self.schema, self.components)
+
+  def is_deterministic(self):
+    return all(c.is_deterministic() for c in self.components)
+
+  def to_type_hint(self):
+    return named_tuple_from_schema(self.schema)
+
+  def as_cloud_object(self, coders_context=None):
+    raise NotImplementedError("as_cloud_object not supported for RowCoder")
+
+  __hash__ = None
+
+  def __eq__(self, other):
+    return type(self) == type(other) and self.schema == other.schema
+
+  def to_runner_api_parameter(self, unused_context):
+    return (common_urns.coders.ROW.urn, self.schema, [])
+
+  @Coder.register_urn(common_urns.coders.ROW.urn, schema_pb2.Schema)
+  def from_runner_api_parameter(payload, components, unused_context):
+    return RowCoder(payload)
+
+  @staticmethod
+  def from_type_hint(named_tuple_type, registry):
+    return RowCoder(named_tuple_to_schema(named_tuple_type))
+
+
+def coder_from_type(type_):
+  type_info = type_.WhichOneof("type_info")
+  if type_info == "atomic_type":
+    if type_.atomic_type in (schema_pb2.INT32,
+                             schema_pb2.INT64):
+      return VarIntCoder()
+    elif type_.atomic_type == schema_pb2.DOUBLE:
+      return FloatCoder()
+    elif type_.atomic_type == schema_pb2.STRING:
+      return StrUtf8Coder()
+  elif type_info == "array_type":
+    return IterableCoder(coder_from_type(type_.array_type.element_type))
+
+  # The Java SDK supports several more types, but the coders are not yet
+  # standard, and are not implemented in Python.
+  raise ValueError(
+      "Encountered a type that is not currently supported by RowCoder: %s" %
+      type_)
+
+
+class RowCoderImpl(StreamCoderImpl):
+  """For internal use only; no backwards-compatibility guarantees."""
+  SIZE_CODER = VarIntCoder().get_impl()
+  NULL_CODER = BytesCoder().get_impl()
 
 Review comment:
   This doesn't encode nulls, maybe NULL_MARKER_CODER or similar? 
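   For context, a marker of this kind usually records which fields are null rather than encoding the null values themselves. A minimal sketch in plain Python, assuming a packed bit mask with one bit per field (least-significant bit first). The helper names `pack_null_mask` and `unpack_null_mask` are hypothetical, and the layout is only an illustration, not necessarily what `RowCoderImpl` or the `beam:coder:row:v1` spec uses:

```python
def pack_null_mask(values):
    """Return bytes in which bit i (LSB first) is set when values[i] is None.

    Hypothetical helper for illustration; not part of apache_beam.
    """
    mask = bytearray((len(values) + 7) // 8)
    for i, value in enumerate(values):
        if value is None:
            mask[i // 8] |= 1 << (i % 8)
    return bytes(mask)


def unpack_null_mask(mask, n_fields):
    """Return a list of booleans, True where the corresponding field is null."""
    mask = bytearray(mask)  # indexing yields ints on both Python 2 and 3
    return [bool((mask[i // 8] >> (i % 8)) & 1) for i in range(n_fields)]


row = ["Ada", None, 36]
marker = pack_null_mask(row)
null_flags = unpack_null_mask(marker, len(row))
```

   The marker would be written ahead of the field values, and only fields whose flag is unset would then be passed to their component coders.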
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use 

[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-10-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=324648&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-324648
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 07/Oct/19 21:57
Start Date: 07/Oct/19 21:57
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9188: [BEAM-7886] 
Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r332231580
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/CoderTranslators.java
 ##
 @@ -118,6 +124,33 @@ public T fromComponents(List> components) {
 };
   }
 
+  static CoderTranslator<RowCoder> row() {
+    return new CoderTranslator<RowCoder>() {
+      @Override
+      public List<? extends Coder<?>> getComponents(RowCoder from) {
+        return ImmutableList.of();
 
 Review comment:
   So for the time being, we're inlining everything rather than using 
components. Is there a bug tracking a better approach to this? 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 324648)
Time Spent: 11h 10m  (was: 11h)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=323965&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323965
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 05/Oct/19 18:13
Start Date: 05/Oct/19 18:13
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r331756295
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/PortableSchemaCoder.java
 ##
 @@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.schemas;
+
+import org.apache.beam.sdk.transforms.SerializableFunctions;
+import org.apache.beam.sdk.values.Row;
+
+/**
+ * A version of SchemaCoder that can only produce/consume Row instances.
+ *
+ * Implements the beam:coders:row:v1 standard coder while still satisfying the requirement that a
+ * PCollection is only considered to have a schema if its coder is an instance of SchemaCoder.
+ */
+public class PortableSchemaCoder extends SchemaCoder<Row> {
 
 Review comment:
   I updated this PR to use RowCoder as the Java implementation of 
`beam:coder:row:v1` and removed `PortableSchemaCoder` now that #9424 is in.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 323965)
Time Spent: 11h  (was: 10h 50m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 11h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-09-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=319240&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-319240
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 26/Sep/19 21:36
Start Date: 26/Sep/19 21:36
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #9188: [BEAM-7886] Make 
row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#issuecomment-535695414
 
 
   Apologies for letting this go stale. After 
[BEAM-8111](https://issues.apache.org/jira/browse/BEAM-8111) I wanted to make 
sure we had some better test coverage on the Java side.
   
   @udim, @aaltay, @robertwb, and/or @chadrik - would you mind taking another 
look at the Python changes now?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 319240)
Time Spent: 10h 50m  (was: 10h 40m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 10h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-09-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=319238&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-319238
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 26/Sep/19 21:33
Start Date: 26/Sep/19 21:33
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r328835963
 
 

 ##
 File path: sdks/python/apache_beam/coders/row_coder.py
 ##
 @@ -0,0 +1,162 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import itertools
+from array import array
+
+from apache_beam.coders.coder_impl import StreamCoderImpl
+from apache_beam.coders.coders import BytesCoder
+from apache_beam.coders.coders import Coder
+from apache_beam.coders.coders import FastCoder
+from apache_beam.coders.coders import FloatCoder
+from apache_beam.coders.coders import IterableCoder
+from apache_beam.coders.coders import StrUtf8Coder
+from apache_beam.coders.coders import TupleCoder
+from apache_beam.coders.coders import VarIntCoder
+from apache_beam.portability import common_urns
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.schemas import named_tuple_from_schema
+from apache_beam.typehints.schemas import named_tuple_to_schema
+
+__all__ = ["RowCoder"]
+
+
+class RowCoder(FastCoder):
+  """ Coder for `typing.NamedTuple` instances.
+
+  Implements the beam:coder:row:v1 standard coder spec.
+  """
+
+  def __init__(self, schema):
+    self.schema = schema
+    self.components = [
+        coder_from_type(field.type) for field in self.schema.fields
+    ]
+
+  def _create_impl(self):
+    return RowCoderImpl(self.schema, self.components)
+
+  def is_deterministic(self):
+    return all(c.is_deterministic() for c in self.components)
+
+  def to_type_hint(self):
+    return named_tuple_from_schema(self.schema)
+
+  def as_cloud_object(self, coders_context=None):
+    raise NotImplementedError("TODO")
+
+  def __eq__(self, other):
+    return type(self) == type(other) and self.schema == other.schema
+
+  def __hash__(self):
 
 Review comment:
   Did some reading on this and it looks like I was wrong. From 
[SO](https://stackoverflow.com/a/53519136):
   > A class that overrides `__eq__()` and does not define `__hash__()` will 
have its `__hash__()` implicitly set to None.
   
   I went ahead and removed `__hash__`
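   
   The behaviour quoted above can be checked with a small standalone sketch. The class names `WithEq` and `WithEqAndHash` are hypothetical and are not part of the PR. Note that the implicit `__hash__ = None` is Python 3 behaviour, so code that must also run on Python 2 may still want to set it explicitly:

```python
class WithEq(object):
    """Overrides __eq__ only; on Python 3 its __hash__ is implicitly None."""

    def __init__(self, schema):
        self.schema = schema

    def __eq__(self, other):
        return type(self) == type(other) and self.schema == other.schema


class WithEqAndHash(WithEq):
    """Defines __hash__ explicitly, so instances stay hashable."""

    def __hash__(self):
        return hash(self.schema)


def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


plain = WithEq("schema-a")
hashed = WithEqAndHash("schema-a")
```

   On Python 3, `is_hashable(plain)` is false while `is_hashable(hashed)` is true, so instances of the first class cannot be used as dict keys or set members.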
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 319238)
Time Spent: 10.5h  (was: 10h 20m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-09-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=319239&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-319239
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 26/Sep/19 21:33
Start Date: 26/Sep/19 21:33
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r328835963
 
 

 ##
 File path: sdks/python/apache_beam/coders/row_coder.py
 ##
 @@ -0,0 +1,162 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import itertools
+from array import array
+
+from apache_beam.coders.coder_impl import StreamCoderImpl
+from apache_beam.coders.coders import BytesCoder
+from apache_beam.coders.coders import Coder
+from apache_beam.coders.coders import FastCoder
+from apache_beam.coders.coders import FloatCoder
+from apache_beam.coders.coders import IterableCoder
+from apache_beam.coders.coders import StrUtf8Coder
+from apache_beam.coders.coders import TupleCoder
+from apache_beam.coders.coders import VarIntCoder
+from apache_beam.portability import common_urns
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.schemas import named_tuple_from_schema
+from apache_beam.typehints.schemas import named_tuple_to_schema
+
+__all__ = ["RowCoder"]
+
+
+class RowCoder(FastCoder):
+  """ Coder for `typing.NamedTuple` instances.
+
+  Implements the beam:coder:row:v1 standard coder spec.
+  """
+
+  def __init__(self, schema):
+    self.schema = schema
+    self.components = [
+        coder_from_type(field.type) for field in self.schema.fields
+    ]
+
+  def _create_impl(self):
+    return RowCoderImpl(self.schema, self.components)
+
+  def is_deterministic(self):
+    return all(c.is_deterministic() for c in self.components)
+
+  def to_type_hint(self):
+    return named_tuple_from_schema(self.schema)
+
+  def as_cloud_object(self, coders_context=None):
+    raise NotImplementedError("TODO")
+
+  def __eq__(self, other):
+    return type(self) == type(other) and self.schema == other.schema
+
+  def __hash__(self):
 
 Review comment:
   Did some reading on this and it looks like I was wrong. From 
[SO](https://stackoverflow.com/a/53519136):
   > A class that overrides `__eq__()` and does not define `__hash__()` will 
have its `__hash__()` implicitly set to None.
   
   I went ahead and removed `__hash__`
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 319239)
Time Spent: 10h 40m  (was: 10.5h)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 10h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-09-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=319237&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-319237
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 26/Sep/19 21:32
Start Date: 26/Sep/19 21:32
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r328835963
 
 

 ##
 File path: sdks/python/apache_beam/coders/row_coder.py
 ##
 @@ -0,0 +1,162 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import itertools
+from array import array
+
+from apache_beam.coders.coder_impl import StreamCoderImpl
+from apache_beam.coders.coders import BytesCoder
+from apache_beam.coders.coders import Coder
+from apache_beam.coders.coders import FastCoder
+from apache_beam.coders.coders import FloatCoder
+from apache_beam.coders.coders import IterableCoder
+from apache_beam.coders.coders import StrUtf8Coder
+from apache_beam.coders.coders import TupleCoder
+from apache_beam.coders.coders import VarIntCoder
+from apache_beam.portability import common_urns
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.schemas import named_tuple_from_schema
+from apache_beam.typehints.schemas import named_tuple_to_schema
+
+__all__ = ["RowCoder"]
+
+
+class RowCoder(FastCoder):
+  """ Coder for `typing.NamedTuple` instances.
+
+  Implements the beam:coder:row:v1 standard coder spec.
+  """
+
+  def __init__(self, schema):
+    self.schema = schema
+    self.components = [
+        coder_from_type(field.type) for field in self.schema.fields
+    ]
+
+  def _create_impl(self):
+    return RowCoderImpl(self.schema, self.components)
+
+  def is_deterministic(self):
+    return all(c.is_deterministic() for c in self.components)
+
+  def to_type_hint(self):
+    return named_tuple_from_schema(self.schema)
+
+  def as_cloud_object(self, coders_context=None):
+    raise NotImplementedError("TODO")
+
+  def __eq__(self, other):
+    return type(self) == type(other) and self.schema == other.schema
+
+  def __hash__(self):
 
 Review comment:
   Did some reading on this and it looks like I was wrong. From 
[SO](https://stackoverflow.com/a/53519136):
   > A class that overrides `__eq__()` and does not define `__hash__()` will 
have its `__hash__()` implicitly set to None.
   
   I went ahead and removed `__hash__`
   
 



Issue Time Tracking
---

Worklog Id: (was: 319237)
Time Spent: 10h 20m  (was: 10h 10m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=302235=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-302235
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 27/Aug/19 17:19
Start Date: 27/Aug/19 17:19
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r318200575
 
 

 ##
 File path: sdks/python/apache_beam/coders/row_coder.py
 ##
 @@ -0,0 +1,162 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import itertools
+from array import array
+
+from apache_beam.coders.coder_impl import StreamCoderImpl
+from apache_beam.coders.coders import BytesCoder
+from apache_beam.coders.coders import Coder
+from apache_beam.coders.coders import FastCoder
+from apache_beam.coders.coders import FloatCoder
+from apache_beam.coders.coders import IterableCoder
+from apache_beam.coders.coders import StrUtf8Coder
+from apache_beam.coders.coders import TupleCoder
+from apache_beam.coders.coders import VarIntCoder
+from apache_beam.portability import common_urns
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.schemas import named_tuple_from_schema
+from apache_beam.typehints.schemas import named_tuple_to_schema
+
+__all__ = ["RowCoder"]
+
+
+class RowCoder(FastCoder):
+  """ Coder for `typing.NamedTuple` instances.
+
+  Implements the beam:coder:row:v1 standard coder spec.
+  """
+
+  def __init__(self, schema):
+    self.schema = schema
 
 Review comment:
   Done
 



Issue Time Tracking
---

Worklog Id: (was: 302235)
Time Spent: 10h 10m  (was: 10h)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 10h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=300639=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300639
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 24/Aug/19 01:02
Start Date: 24/Aug/19 01:02
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r317338268
 
 

 ##
 File path: sdks/python/apache_beam/typehints/schemas.py
 ##
 @@ -0,0 +1,217 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+""" Support for mapping python types to proto Schemas and back again.
+
+Python          Schema
+np.int8     <-> BYTE
+np.int16    <-> INT16
+np.int32    <-> INT32
+np.int64    <-> INT64
+int         ---/
+np.float32  <-> FLOAT
+np.float64  <-> DOUBLE
+float       ---/
+bool        <-> BOOLEAN
+
+The mappings for STRING and BYTES are different between python 2 and python 3,
+because of the changes to str:
+py3:
+str/unicode <-> STRING
+bytes       <-> BYTES
+ByteString  ---/
+
+py2:
+unicode     <-> STRING
+str/bytes   ---/
+ByteString  <-> BYTES
+"""
+
+from __future__ import absolute_import
+
+import sys
+from typing import ByteString
+from typing import Mapping
+from typing import NamedTuple
+from typing import Optional
+from typing import Sequence
+from uuid import uuid4
+
+import numpy as np
+from past.builtins import unicode
+
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.native_type_compatibility import _get_args
+from apache_beam.typehints.native_type_compatibility import _match_is_exactly_mapping
+from apache_beam.typehints.native_type_compatibility import _match_is_named_tuple
+from apache_beam.typehints.native_type_compatibility import _match_is_optional
+from apache_beam.typehints.native_type_compatibility import _safe_issubclass
+from apache_beam.typehints.native_type_compatibility import extract_optional_type
+
+
+# Registry of typings for a schema by UUID
+class SchemaTypeRegistry(object):
+  def __init__(self):
+    self.by_id = {}
+    self.by_typing = {}
+
+  def add(self, typing, schema):
+    self.by_id[schema.id] = (typing, schema)
+
+  def get_typing_by_id(self, unique_id):
+    result = self.by_id.get(unique_id, None)
+    return result[0] if result is not None else None
+
+  def get_schema_by_id(self, unique_id):
+    result = self.by_id.get(unique_id, None)
+    return result[1] if result is not None else None
+
+
+SCHEMA_REGISTRY = SchemaTypeRegistry()
+
+
+# Bi-directional mappings
+_PRIMITIVES = (
+    (np.int8, schema_pb2.AtomicType.BYTE),
+    (np.int16, schema_pb2.AtomicType.INT16),
+    (np.int32, schema_pb2.AtomicType.INT32),
+    (np.int64, schema_pb2.AtomicType.INT64),
+    (np.float32, schema_pb2.AtomicType.FLOAT),
+    (np.float64, schema_pb2.AtomicType.DOUBLE),
+    (unicode, schema_pb2.AtomicType.STRING),
+    (bool, schema_pb2.AtomicType.BOOLEAN),
+    (bytes if sys.version_info.major >= 3 else ByteString,
+     schema_pb2.AtomicType.BYTES),
+)
+
+PRIMITIVE_TO_ATOMIC_TYPE = dict((typ, atomic) for typ, atomic in _PRIMITIVES)
+ATOMIC_TYPE_TO_PRIMITIVE = dict((atomic, typ) for typ, atomic in _PRIMITIVES)
+
+# One-way mappings
+PRIMITIVE_TO_ATOMIC_TYPE.update({
+    # In python 3, this is a no-op because str == unicode,
+    # but in python 2 it overrides the bytes -> BYTES mapping.
+    str: schema_pb2.AtomicType.STRING,
 
 Review comment:
   I think that makes sense. It doesn't seem too onerous to ask people to use 
unicode in python 2. And that's a good point that it's a backwards compatible 
change if we find out otherwise. I pushed a commit that does this: 
https://github.com/apache/beam/pull/9188/commits/ecaf73cd8eb82739c21f0a3551efa0d03cc842b6
 


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=300636=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300636
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 24/Aug/19 00:32
Start Date: 24/Aug/19 00:32
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #9188: [BEAM-7886] Make 
row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#issuecomment-524499580
 
 
   @reuvenlax I put up a separate PR for this at #9424, since it ended up being 
a pretty large diff.
   
   I removed `RowSize` and `RowSizeTest` since they don't seem to be used, and 
removing them let me drop some `RowCoder` functionality. The rest of the 
functionality (assigning IDs to schemas, `CODER_MAP`, `coderForFieldType`, 
etc.) is now in `SchemaCoder`.
   
   `RowCoder` is still there but just as a trivial sub-class of 
`SchemaCoder`.
 



Issue Time Tracking
---

Worklog Id: (was: 300636)
Time Spent: 9h 50m  (was: 9h 40m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=300635=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300635
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 24/Aug/19 00:32
Start Date: 24/Aug/19 00:32
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #9188: [BEAM-7886] Make 
row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#issuecomment-524499580
 
 
   @reuvenlax I put up a separate PR for this at #9424, since it ended up being 
a pretty large diff.
   
   I removed `RowSize` and `RowSizeTest` since they don't seem to be used, and 
removing them let me drop some `RowCoder` functionality. The rest of the 
functionality (assigning IDs to schemas, `CODER_MAP`, `coderForFieldType`, 
etc.) is now in `SchemaCoder`.
   
   `RowCoder` is still there but just as a trivial sub-class of 
`SchemaCoder`.
 



Issue Time Tracking
---

Worklog Id: (was: 300635)
Time Spent: 9h 40m  (was: 9.5h)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=300606=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300606
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 23/Aug/19 23:08
Start Date: 23/Aug/19 23:08
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #9188: [BEAM-7886] Make 
row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#issuecomment-524488668
 
 
   Let me see if I can put together a patch that does that. I think I could 
just get rid of the current `RowCoder` and move its logic to `SchemaCoder` and 
`RowCoderGenerator`.
 



Issue Time Tracking
---

Worklog Id: (was: 300606)
Time Spent: 9.5h  (was: 9h 20m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 9.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=300604=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300604
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 23/Aug/19 23:07
Start Date: 23/Aug/19 23:07
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9188: [BEAM-7886] 
Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r317327150
 
 

 ##
 File path: sdks/python/apache_beam/typehints/schemas.py
 ##
 @@ -0,0 +1,217 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+""" Support for mapping python types to proto Schemas and back again.
+
+Python          Schema
+np.int8     <-> BYTE
+np.int16    <-> INT16
+np.int32    <-> INT32
+np.int64    <-> INT64
+int         ---/
+np.float32  <-> FLOAT
+np.float64  <-> DOUBLE
+float       ---/
+bool        <-> BOOLEAN
+
+The mappings for STRING and BYTES are different between python 2 and python 3,
+because of the changes to str:
+py3:
+str/unicode <-> STRING
+bytes       <-> BYTES
+ByteString  ---/
+
+py2:
+unicode     <-> STRING
+str/bytes   ---/
+ByteString  <-> BYTES
+"""
+
+from __future__ import absolute_import
+
+import sys
+from typing import ByteString
+from typing import Mapping
+from typing import NamedTuple
+from typing import Optional
+from typing import Sequence
+from uuid import uuid4
+
+import numpy as np
+from past.builtins import unicode
+
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.native_type_compatibility import _get_args
+from apache_beam.typehints.native_type_compatibility import _match_is_exactly_mapping
+from apache_beam.typehints.native_type_compatibility import _match_is_named_tuple
+from apache_beam.typehints.native_type_compatibility import _match_is_optional
+from apache_beam.typehints.native_type_compatibility import _safe_issubclass
+from apache_beam.typehints.native_type_compatibility import extract_optional_type
+
+
+# Registry of typings for a schema by UUID
+class SchemaTypeRegistry(object):
+  def __init__(self):
+    self.by_id = {}
+    self.by_typing = {}
+
+  def add(self, typing, schema):
+    self.by_id[schema.id] = (typing, schema)
+
+  def get_typing_by_id(self, unique_id):
+    result = self.by_id.get(unique_id, None)
+    return result[0] if result is not None else None
+
+  def get_schema_by_id(self, unique_id):
+    result = self.by_id.get(unique_id, None)
+    return result[1] if result is not None else None
+
+
+SCHEMA_REGISTRY = SchemaTypeRegistry()
+
+
+# Bi-directional mappings
+_PRIMITIVES = (
+    (np.int8, schema_pb2.AtomicType.BYTE),
+    (np.int16, schema_pb2.AtomicType.INT16),
+    (np.int32, schema_pb2.AtomicType.INT32),
+    (np.int64, schema_pb2.AtomicType.INT64),
+    (np.float32, schema_pb2.AtomicType.FLOAT),
+    (np.float64, schema_pb2.AtomicType.DOUBLE),
+    (unicode, schema_pb2.AtomicType.STRING),
+    (bool, schema_pb2.AtomicType.BOOLEAN),
+    (bytes if sys.version_info.major >= 3 else ByteString,
+     schema_pb2.AtomicType.BYTES),
+)
+
+PRIMITIVE_TO_ATOMIC_TYPE = dict((typ, atomic) for typ, atomic in _PRIMITIVES)
+ATOMIC_TYPE_TO_PRIMITIVE = dict((atomic, typ) for typ, atomic in _PRIMITIVES)
+
+# One-way mappings
+PRIMITIVE_TO_ATOMIC_TYPE.update({
+    # In python 3, this is a no-op because str == unicode,
+    # but in python 2 it overrides the bytes -> BYTES mapping.
+    str: schema_pb2.AtomicType.STRING,
 
 Review comment:
   I think it'll be harder to consistently exclude support for Python 2 for all 
schema use. 
 



Issue Time Tracking
---

Worklog Id: (was: 300604)
Time Spent: 9h 20m  (was: 9h 10m)

> Make row coder a standard coder and implement 

[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=300594=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300594
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 23/Aug/19 23:02
Start Date: 23/Aug/19 23:02
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9188: [BEAM-7886] 
Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r317326390
 
 

 ##
 File path: sdks/python/apache_beam/typehints/schemas.py
 ##
 @@ -0,0 +1,217 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+""" Support for mapping python types to proto Schemas and back again.
+
+Python          Schema
+np.int8     <-> BYTE
+np.int16    <-> INT16
+np.int32    <-> INT32
+np.int64    <-> INT64
+int         ---/
+np.float32  <-> FLOAT
+np.float64  <-> DOUBLE
+float       ---/
+bool        <-> BOOLEAN
+
+The mappings for STRING and BYTES are different between python 2 and python 3,
+because of the changes to str:
+py3:
+str/unicode <-> STRING
+bytes       <-> BYTES
+ByteString  ---/
+
+py2:
+unicode     <-> STRING
+str/bytes   ---/
+ByteString  <-> BYTES
+"""
+
+from __future__ import absolute_import
+
+import sys
+from typing import ByteString
+from typing import Mapping
+from typing import NamedTuple
+from typing import Optional
+from typing import Sequence
+from uuid import uuid4
+
+import numpy as np
+from past.builtins import unicode
+
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.native_type_compatibility import _get_args
+from apache_beam.typehints.native_type_compatibility import _match_is_exactly_mapping
+from apache_beam.typehints.native_type_compatibility import _match_is_named_tuple
+from apache_beam.typehints.native_type_compatibility import _match_is_optional
+from apache_beam.typehints.native_type_compatibility import _safe_issubclass
+from apache_beam.typehints.native_type_compatibility import extract_optional_type
+
+
+# Registry of typings for a schema by UUID
+class SchemaTypeRegistry(object):
+  def __init__(self):
+    self.by_id = {}
+    self.by_typing = {}
+
+  def add(self, typing, schema):
+    self.by_id[schema.id] = (typing, schema)
+
+  def get_typing_by_id(self, unique_id):
+    result = self.by_id.get(unique_id, None)
+    return result[0] if result is not None else None
+
+  def get_schema_by_id(self, unique_id):
+    result = self.by_id.get(unique_id, None)
+    return result[1] if result is not None else None
+
+
+SCHEMA_REGISTRY = SchemaTypeRegistry()
+
+
+# Bi-directional mappings
+_PRIMITIVES = (
+    (np.int8, schema_pb2.AtomicType.BYTE),
+    (np.int16, schema_pb2.AtomicType.INT16),
+    (np.int32, schema_pb2.AtomicType.INT32),
+    (np.int64, schema_pb2.AtomicType.INT64),
+    (np.float32, schema_pb2.AtomicType.FLOAT),
+    (np.float64, schema_pb2.AtomicType.DOUBLE),
+    (unicode, schema_pb2.AtomicType.STRING),
+    (bool, schema_pb2.AtomicType.BOOLEAN),
+    (bytes if sys.version_info.major >= 3 else ByteString,
+     schema_pb2.AtomicType.BYTES),
+)
+
+PRIMITIVE_TO_ATOMIC_TYPE = dict((typ, atomic) for typ, atomic in _PRIMITIVES)
+ATOMIC_TYPE_TO_PRIMITIVE = dict((atomic, typ) for typ, atomic in _PRIMITIVES)
+
+# One-way mappings
+PRIMITIVE_TO_ATOMIC_TYPE.update({
+    # In python 3, this is a no-op because str == unicode,
+    # but in python 2 it overrides the bytes -> BYTES mapping.
+    str: schema_pb2.AtomicType.STRING,
 
 Review comment:
   Do we really need to support Python 2? If this is going to be a burden in 
general, I would rather not add support for it.
 



Issue Time Tracking
---

Worklog Id: (was: 300594)
Time Spent: 9h 10m  (was: 9h)

> Make row coder a 

[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=300542=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300542
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 23/Aug/19 22:13
Start Date: 23/Aug/19 22:13
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9188: [BEAM-7886] 
Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r317318110
 
 

 ##
 File path: sdks/python/apache_beam/typehints/schemas.py
 ##
 @@ -0,0 +1,217 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+""" Support for mapping python types to proto Schemas and back again.
+
+Python          Schema
+np.int8     <-> BYTE
+np.int16    <-> INT16
+np.int32    <-> INT32
+np.int64    <-> INT64
+int         ---/
+np.float32  <-> FLOAT
+np.float64  <-> DOUBLE
+float       ---/
+bool        <-> BOOLEAN
+
+The mappings for STRING and BYTES are different between python 2 and python 3,
+because of the changes to str:
+py3:
+str/unicode <-> STRING
+bytes       <-> BYTES
+ByteString  ---/
+
+py2:
+unicode     <-> STRING
+str/bytes   ---/
+ByteString  <-> BYTES
+"""
+
+from __future__ import absolute_import
+
+import sys
+from typing import ByteString
+from typing import Mapping
+from typing import NamedTuple
+from typing import Optional
+from typing import Sequence
+from uuid import uuid4
+
+import numpy as np
+from past.builtins import unicode
+
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.native_type_compatibility import _get_args
+from apache_beam.typehints.native_type_compatibility import _match_is_exactly_mapping
+from apache_beam.typehints.native_type_compatibility import _match_is_named_tuple
+from apache_beam.typehints.native_type_compatibility import _match_is_optional
+from apache_beam.typehints.native_type_compatibility import _safe_issubclass
+from apache_beam.typehints.native_type_compatibility import extract_optional_type
+
+
+# Registry of typings for a schema by UUID
+class SchemaTypeRegistry(object):
+  def __init__(self):
+    self.by_id = {}
+    self.by_typing = {}
+
+  def add(self, typing, schema):
+    self.by_id[schema.id] = (typing, schema)
+
+  def get_typing_by_id(self, unique_id):
+    result = self.by_id.get(unique_id, None)
+    return result[0] if result is not None else None
+
+  def get_schema_by_id(self, unique_id):
+    result = self.by_id.get(unique_id, None)
+    return result[1] if result is not None else None
+
+
+SCHEMA_REGISTRY = SchemaTypeRegistry()
+
+
+# Bi-directional mappings
+_PRIMITIVES = (
+    (np.int8, schema_pb2.AtomicType.BYTE),
+    (np.int16, schema_pb2.AtomicType.INT16),
+    (np.int32, schema_pb2.AtomicType.INT32),
+    (np.int64, schema_pb2.AtomicType.INT64),
+    (np.float32, schema_pb2.AtomicType.FLOAT),
+    (np.float64, schema_pb2.AtomicType.DOUBLE),
+    (unicode, schema_pb2.AtomicType.STRING),
+    (bool, schema_pb2.AtomicType.BOOLEAN),
+    (bytes if sys.version_info.major >= 3 else ByteString,
+     schema_pb2.AtomicType.BYTES),
+)
+
+PRIMITIVE_TO_ATOMIC_TYPE = dict((typ, atomic) for typ, atomic in _PRIMITIVES)
+ATOMIC_TYPE_TO_PRIMITIVE = dict((atomic, typ) for typ, atomic in _PRIMITIVES)
+
+# One-way mappings
+PRIMITIVE_TO_ATOMIC_TYPE.update({
+    # In python 3, this is a no-op because str == unicode,
+    # but in python 2 it overrides the bytes -> BYTES mapping.
+    str: schema_pb2.AtomicType.STRING,
 
 Review comment:
   Thinking about this some more, what about just rejecting `str` in Python 2 
(forcing the user to say `unicode`)? We can loosen things if this becomes too 
cumbersome in the future (but going the other way is backwards incompatible). 
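
The stricter option could look roughly like the sketch below. It is only an illustration under stated assumptions: `atomic_type_for`, its `py_major` parameter, and the string stand-ins for `schema_pb2.AtomicType` values are hypothetical and are not taken from the PR.

```python
import sys

# String stand-ins for schema_pb2.AtomicType values (an assumption, so
# that the sketch runs without Beam installed).
STRING = "STRING"
BYTES = "BYTES"

# The text type: `str` on Python 3, `unicode` on Python 2. The name
# `unicode` is only evaluated when running on Python 2.
TEXT_TYPE = str if sys.version_info.major >= 3 else unicode  # noqa: F821


def atomic_type_for(typ, py_major=sys.version_info.major):
  """Map a Python type to an atomic type name, rejecting ambiguous str.

  `py_major` is a hypothetical hook that lets the Python 2 branch be
  exercised from a Python 3 interpreter.
  """
  if py_major < 3 and typ is str:
    # Python 2 `str` could mean text or raw bytes, so refuse to guess.
    raise ValueError("str is ambiguous in Python 2; use unicode for text")
  if typ is TEXT_TYPE:
    return STRING
  if typ is bytes:
    return BYTES
  raise ValueError("unsupported type: %r" % (typ,))
```

Starting strict keeps the door open: accepting `str` later would not break anyone who already writes `unicode`, whereas starting permissive and then rejecting `str` would break existing pipelines.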
 



Issue Time Tracking

[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=300523=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300523
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 23/Aug/19 21:34
Start Date: 23/Aug/19 21:34
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #9188: [BEAM-7886] Make 
row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#issuecomment-524469704
 
 
   This is a subclass of schema coder.
   
   Ideally we should get rid of the current row coder (make it a utility
   class) and call this RowCoder.
   
   On Fri, Aug 23, 2019, 1:32 PM Ahmet Altay  wrote:
   
   > Can we just use SchemaCoder as the name? I agree that this does not matter
   > much. Would changing this in the future be difficult?
   >
   > —
   > You are receiving this because you were mentioned.
   > Reply to this email directly, view it on GitHub
   > 
,
   > or mute the thread
   > 

   > .
   >
   
 



Issue Time Tracking
---

Worklog Id: (was: 300523)
Time Spent: 8h 50m  (was: 8h 40m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=300494=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300494
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 23/Aug/19 20:32
Start Date: 23/Aug/19 20:32
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #9188: [BEAM-7886] Make row 
coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#issuecomment-524452566
 
 
   Can we just use SchemaCoder as the name? I agree that this does not matter much. 
Would changing this in the future be difficult?
 



Issue Time Tracking
---

Worklog Id: (was: 300494)
Time Spent: 8h 40m  (was: 8.5h)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=299826=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299826
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 23/Aug/19 00:13
Start Date: 23/Aug/19 00:13
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #9188: [BEAM-7886] Make 
row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#issuecomment-524124310
 
 
   Trying to think of a better name than PortableSchemaCoder, but I guess this 
is fine for now.
 



Issue Time Tracking
---

Worklog Id: (was: 299826)
Time Spent: 8.5h  (was: 8h 20m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=299742=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299742
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 22/Aug/19 20:44
Start Date: 22/Aug/19 20:44
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r316877069
 
 

 ##
 File path: sdks/python/apache_beam/coders/row_coder_test.py
 ##
 @@ -0,0 +1,129 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+from __future__ import absolute_import
+
+import logging
+import typing
+import unittest
+from itertools import chain
+
+import numpy as np
+from past.builtins import unicode
+
+from apache_beam.coders import RowCoder
+from apache_beam.coders.typecoders import registry as coders_registry
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.schemas import typing_to_runner_api
+
+Person = typing.NamedTuple("Person", [
+    ("name", unicode),
+    ("age", np.int32),
+    ("address", typing.Optional[unicode]),
+    ("aliases", typing.List[unicode]),
+])
+
+coders_registry.register_coder(Person, RowCoder)
+
+
+class CodersTest(unittest.TestCase):
+  TEST_CASES = [
+      Person("Jon Snow", 23, None, ["crow", "wildling"]),
+      Person("Daenerys Targaryen", 25, "Westeros", ["Mother of Dragons"]),
+      Person("Michael Bluth", 30, None, [])
+  ]
+
+  def test_create_row_coder_from_named_tuple(self):
+    expected_coder = RowCoder(typing_to_runner_api(Person).row_type.schema)
+    real_coder = coders_registry.get_coder(Person)
+
+    for test_case in self.TEST_CASES:
+      self.assertEqual(
+          expected_coder.encode(test_case), real_coder.encode(test_case))
+
+      self.assertEqual(test_case,
+                       real_coder.decode(real_coder.encode(test_case)))
+
+  def test_create_row_coder_from_schema(self):
+    schema = schema_pb2.Schema(
+        id="person",
+        fields=[
+            schema_pb2.Field(
+                name="name",
+                type=schema_pb2.FieldType(
+                    atomic_type=schema_pb2.AtomicType.STRING)),
+            schema_pb2.Field(
+                name="age",
+                type=schema_pb2.FieldType(
+                    atomic_type=schema_pb2.AtomicType.INT32)),
+            schema_pb2.Field(
+                name="address",
+                type=schema_pb2.FieldType(
+                    atomic_type=schema_pb2.AtomicType.STRING, nullable=True)),
+            schema_pb2.Field(
+                name="aliases",
+                type=schema_pb2.FieldType(
+                    array_type=schema_pb2.ArrayType(
+                        element_type=schema_pb2.FieldType(
+                            atomic_type=schema_pb2.AtomicType.STRING)))),
+        ])
+    coder = RowCoder(schema)
+
+    for test_case in self.TEST_CASES:
+      self.assertEqual(test_case, coder.decode(coder.encode(test_case)))
+
+  @unittest.skip(
+      "Need to decide whether to defer to the stream writer for these checks "
+      "or add explicit checks"
+  )
+  def test_overflows(self):
+    IntTester = typing.NamedTuple('IntTester', [
+        #('i8', typing.Optional[np.int8]),
 
 Review comment:
   Added. Also added a reference to 
[BEAM-8030](https://issues.apache.org/jira/browse/BEAM-8030) in the skip 
message.
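
For readers unfamiliar with the pattern, here is a small sketch of a skipped test whose reason carries the tracking issue. The class name `OverflowTest`, the helper `run_overflow_suite`, and the exact wording of the reason string are assumptions; the thread does not show the final skip message:

```python
import unittest


class OverflowTest(unittest.TestCase):
  # Putting the issue key first makes skipped tests easy to grep for later.
  @unittest.skip(
      "BEAM-8030: need to decide whether to defer to the stream writer for "
      "these checks or add explicit checks")
  def test_overflows(self):
    self.fail("never reached while the test is skipped")


def run_overflow_suite():
  """Runs the test case and returns the unittest.TestResult."""
  suite = unittest.defaultTestLoader.loadTestsFromTestCase(OverflowTest)
  result = unittest.TestResult()
  suite.run(result)
  return result
```

The reason string shows up in the test runner output, so the link back to the open issue survives even if the surrounding comments are removed.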
 



Issue Time Tracking
---

Worklog Id: (was: 299742)
Time Spent: 8h 10m  (was: 8h)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
>

[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=299744=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299744
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 22/Aug/19 20:44
Start Date: 22/Aug/19 20:44
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r316877261
 
 

 ##
 File path: sdks/python/apache_beam/coders/row_coder_test.py
 ##
 @@ -0,0 +1,126 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+from __future__ import absolute_import
+
+import logging
+import typing
+import unittest
+
+import numpy as np
+from itertools import chain
+from past.builtins import unicode
+
+from apache_beam.coders import RowCoder
+from apache_beam.coders.typecoders import registry as coders_registry
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.schemas import typing_to_runner_api
+
+Person = typing.NamedTuple("Person", [
+    ("name", unicode),
+    ("age", np.int32),
+    ("address", typing.Optional[unicode]),
+    ("aliases", typing.List[unicode]),
+])
+
+coders_registry.register_coder(Person, RowCoder)
+
+
+class CodersTest(unittest.TestCase):
+  TEST_CASES = [
+      Person("Jon Snow", 23, None, ["crow", "wildling"]),
+      Person("Daenerys Targaryen", 25, "Westeros", ["Mother of Dragons"]),
+      Person("Michael Bluth", 30, None, [])
+  ]
+
+  def test_create_row_coder_from_named_tuple(self):
+    expected_coder = RowCoder(typing_to_runner_api(Person).row_type.schema)
+    real_coder = coders_registry.get_coder(Person)
+
+    for test_case in self.TEST_CASES:
+      self.assertEqual(
+          expected_coder.encode(test_case), real_coder.encode(test_case))
+
+      self.assertEqual(test_case,
+                       real_coder.decode(real_coder.encode(test_case)))
+
+  def test_create_row_coder_from_schema(self):
+    schema = schema_pb2.Schema(
+        id="person",
+        fields=[
+            schema_pb2.Field(
+                name="name",
+                type=schema_pb2.FieldType(
+                    atomic_type=schema_pb2.AtomicType.STRING)),
+            schema_pb2.Field(
+                name="age",
+                type=schema_pb2.FieldType(
+                    atomic_type=schema_pb2.AtomicType.INT32)),
+            schema_pb2.Field(
+                name="address",
+                type=schema_pb2.FieldType(
+                    atomic_type=schema_pb2.AtomicType.STRING, nullable=True)),
+            schema_pb2.Field(
+                name="aliases",
+                type=schema_pb2.FieldType(
+                    array_type=schema_pb2.ArrayType(
+                        element_type=schema_pb2.FieldType(
+                            atomic_type=schema_pb2.AtomicType.STRING)))),
+        ])
+    coder = RowCoder(schema)
+
+    for test_case in self.TEST_CASES:
+      self.assertEqual(test_case, coder.decode(coder.encode(test_case)))
+
+  @unittest.skip("Need to decide whether to defer to the stream writer for these checks or add explicit checks")
 
 Review comment:
   Filed [BEAM-8030](https://issues.apache.org/jira/browse/BEAM-8030) to 
reconcile this.
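
Of the two options named in the skip message, the "explicit checks" one could look roughly like the sketch below. This is an illustration only: `check_signed_int` is a hypothetical helper, and the thread does not say which approach BEAM-8030 settled on.

```python
def check_signed_int(value, bits=64):
  """Returns value if it fits in a signed integer of the given width.

  Python ints have unlimited precision, so the range has to be checked by
  hand before the value is written as a fixed-width field.
  """
  lo = -(1 << (bits - 1))
  hi = (1 << (bits - 1)) - 1
  if not lo <= value <= hi:
    raise OverflowError(
        "%d does not fit in a signed %d-bit integer" % (value, bits))
  return value
```

The alternative, deferring to the stream writer, would avoid a second range check per field but ties the error behaviour to whatever the writer does on overflow.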
 



Issue Time Tracking
---

Worklog Id: (was: 299744)
Time Spent: 8h 20m  (was: 8h 10m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>

[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=299736=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299736
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 22/Aug/19 20:39
Start Date: 22/Aug/19 20:39
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r316875134
 
 

 ##
 File path: sdks/python/setup.py
 ##
 @@ -115,8 +115,7 @@ def get_version():
 'mock>=1.0.1,<3.0.0',
 'pymongo>=3.8.0,<4.0.0',
 'oauth2client>=2.0.1,<4',
-# grpcio 1.8.1 and above requires protobuf 3.5.0.post1.
-'protobuf>=3.5.0.post1,<4',
+'protobuf>=3.8.0.post1,<4',
 
 Review comment:
   Sounds good, I just moved the numpy dependency from test to required with 
the same range. Thanks!
 



Issue Time Tracking
---

Worklog Id: (was: 299736)
Time Spent: 8h  (was: 7h 50m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 8h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=299101=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299101
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 22/Aug/19 01:01
Start Date: 22/Aug/19 01:01
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9188: [BEAM-7886] 
Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r316460412
 
 

 ##
 File path: sdks/python/setup.py
 ##
 @@ -115,8 +115,7 @@ def get_version():
 'mock>=1.0.1,<3.0.0',
 'pymongo>=3.8.0,<4.0.0',
 'oauth2client>=2.0.1,<4',
-# grpcio 1.8.1 and above requires protobuf 3.5.0.post1.
-'protobuf>=3.5.0.post1,<4',
+'protobuf>=3.8.0.post1,<4',
 
 Review comment:
   Re: numpy
   
   That is ok. It really needs to be an explicit dependency at this point.
   
   You can remove the numpy dependency under test packages; please try to keep 
the same range.
 



Issue Time Tracking
---

Worklog Id: (was: 299101)
Time Spent: 7h 50m  (was: 7h 40m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=299093=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299093
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 22/Aug/19 00:15
Start Date: 22/Aug/19 00:15
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r316452464
 
 

 ##
 File path: sdks/python/setup.py
 ##
 @@ -115,8 +115,7 @@ def get_version():
 'mock>=1.0.1,<3.0.0',
 'pymongo>=3.8.0,<4.0.0',
 'oauth2client>=2.0.1,<4',
-# grpcio 1.8.1 and above requires protobuf 3.5.0.post1.
-'protobuf>=3.5.0.post1,<4',
+'protobuf>=3.8.0.post1,<4',
 
 Review comment:
   :+1: done
   
   I also added a dependency on numpy. It shouldn't change anything since we 
already had the dependency transitively through pyarrow (and perhaps others?)
   
   Is that ok?
 



Issue Time Tracking
---

Worklog Id: (was: 299093)
Time Spent: 7h 40m  (was: 7.5h)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=299088=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299088
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 22/Aug/19 00:05
Start Date: 22/Aug/19 00:05
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r316450487
 
 

 ##
 File path: sdks/python/apache_beam/coders/row_coder.py
 ##
 @@ -0,0 +1,162 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import itertools
+from array import array
+
+from apache_beam.coders.coder_impl import StreamCoderImpl
+from apache_beam.coders.coders import BytesCoder
+from apache_beam.coders.coders import Coder
+from apache_beam.coders.coders import FastCoder
+from apache_beam.coders.coders import FloatCoder
+from apache_beam.coders.coders import IterableCoder
+from apache_beam.coders.coders import StrUtf8Coder
+from apache_beam.coders.coders import TupleCoder
+from apache_beam.coders.coders import VarIntCoder
+from apache_beam.portability import common_urns
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.schemas import named_tuple_from_schema
+from apache_beam.typehints.schemas import named_tuple_to_schema
+
+__all__ = ["RowCoder"]
+
+
+class RowCoder(FastCoder):
+  """ Coder for `typing.NamedTuple` instances.
+
+  Implements the beam:coder:row:v1 standard coder spec.
+  """
+
+  def __init__(self, schema):
+    self.schema = schema
+    self.components = [
+        coder_from_type(field.type) for field in self.schema.fields
+    ]
+
+  def _create_impl(self):
+    return RowCoderImpl(self.schema, self.components)
+
+  def is_deterministic(self):
+    return all(c.is_deterministic() for c in self.components)
+
+  def to_type_hint(self):
+    return named_tuple_from_schema(self.schema)
+
+  def as_cloud_object(self, coders_context=None):
+    raise NotImplementedError("TODO")
+
+  def __eq__(self, other):
+    return type(self) == type(other) and self.schema == other.schema
+
+  def __hash__(self):
+    return hash((type(self), self.schema.SerializePartialToString()))
+
+  def to_runner_api_parameter(self, unused_context):
+    return (common_urns.coders.ROW.urn, self.schema, [])
+
+  @staticmethod
+  def from_type_hint(named_tuple_type, registry):
+    return RowCoder(named_tuple_to_schema(named_tuple_type))
+
+
+def coder_from_type(type_):
+  type_info = type_.WhichOneof("type_info")
+  if type_info == "atomic_type":
+    if type_.atomic_type in (schema_pb2.AtomicType.INT32,
+                             schema_pb2.AtomicType.INT64):
+      return VarIntCoder()
+    elif type_.atomic_type == schema_pb2.AtomicType.DOUBLE:
+      return FloatCoder()
+    elif type_.atomic_type == schema_pb2.AtomicType.STRING:
+      return StrUtf8Coder()
+  elif type_info == "array_type":
+    return IterableCoder(coder_from_type(type_.array_type.element_type))
+
+  # The Java SDK supports several more types, but the coders are not yet
+  # standard, and are not implemented in Python.
+  raise ValueError(
+      "Encountered a type that is not currently supported by RowCoder: %s" %
+      type_)
+
+
+# pylint: disable=unused-variable
+
+@Coder.register_urn(common_urns.coders.ROW.urn, schema_pb2.Schema)
+def from_runner_api_parameter(payload, components, unused_context):
 
 Review comment:
   Done
 



Issue Time Tracking
---

Worklog Id: (was: 299088)
Time Spent: 7.5h  (was: 7h 20m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
>  

[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=299087=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299087
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 22/Aug/19 00:04
Start Date: 22/Aug/19 00:04
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r316450339
 
 

 ##
 File path: sdks/python/apache_beam/typehints/schemas.py
 ##
 @@ -0,0 +1,217 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+""" Support for mapping python types to proto Schemas and back again.
+
+Python      Schema
+np.int8     <-> BYTE
+np.int16    <-> INT16
+np.int32    <-> INT32
+np.int64    <-> INT64
+int         ---/
+np.float32  <-> FLOAT
+np.float64  <-> DOUBLE
+float       ---/
+bool        <-> BOOLEAN
+
+The mappings for STRING and BYTES are different between python 2 and python 3,
+because of the changes to str:
+py3:
+str/unicode <-> STRING
+bytes       <-> BYTES
+ByteString  ---/
+
+py2:
+unicode     <-> STRING
+str/bytes   ---/
+ByteString  <-> BYTES
+"""
+
+from __future__ import absolute_import
+
+import sys
+from typing import ByteString
+from typing import Mapping
+from typing import NamedTuple
+from typing import Optional
+from typing import Sequence
+from uuid import uuid4
+
+import numpy as np
+from past.builtins import unicode
+
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.native_type_compatibility import _get_args
+from apache_beam.typehints.native_type_compatibility import _match_is_exactly_mapping
+from apache_beam.typehints.native_type_compatibility import _match_is_named_tuple
+from apache_beam.typehints.native_type_compatibility import _match_is_optional
+from apache_beam.typehints.native_type_compatibility import _safe_issubclass
+from apache_beam.typehints.native_type_compatibility import extract_optional_type
+
+
+# Registry of typings for a schema by UUID
+class SchemaTypeRegistry(object):
+  def __init__(self):
+    self.by_id = {}
+    self.by_typing = {}
+
+  def add(self, typing, schema):
+    self.by_id[schema.id] = (typing, schema)
+
+  def get_typing_by_id(self, unique_id):
+    result = self.by_id.get(unique_id, None)
+    return result[0] if result is not None else None
+
+  def get_schema_by_id(self, unique_id):
+    result = self.by_id.get(unique_id, None)
+    return result[1] if result is not None else None
+
+
+SCHEMA_REGISTRY = SchemaTypeRegistry()
+
+
+# Bi-directional mappings
+_PRIMITIVES = (
+    (np.int8, schema_pb2.AtomicType.BYTE),
+    (np.int16, schema_pb2.AtomicType.INT16),
+    (np.int32, schema_pb2.AtomicType.INT32),
+    (np.int64, schema_pb2.AtomicType.INT64),
+    (np.float32, schema_pb2.AtomicType.FLOAT),
+    (np.float64, schema_pb2.AtomicType.DOUBLE),
+    (unicode, schema_pb2.AtomicType.STRING),
+    (bool, schema_pb2.AtomicType.BOOLEAN),
+    (bytes if sys.version_info.major >= 3 else ByteString,
+     schema_pb2.AtomicType.BYTES),
+)
+
+PRIMITIVE_TO_ATOMIC_TYPE = dict((typ, atomic) for typ, atomic in _PRIMITIVES)
+ATOMIC_TYPE_TO_PRIMITIVE = dict((atomic, typ) for typ, atomic in _PRIMITIVES)
+
+# One-way mappings
+PRIMITIVE_TO_ATOMIC_TYPE.update({
+    # In python 3, this is a no-op because str == unicode,
+    # but in python 2 it overrides the bytes -> BYTES mapping.
+    str: schema_pb2.AtomicType.STRING,
+    # In python 2, this is a no-op because we define it as the bi-directional
+    # mapping above. This just ensures the one-way mapping is defined in python
+    # 3.
+    ByteString: schema_pb2.AtomicType.BYTES,
+    # Allow users to specify a native int, and use INT64 as the cross-language
+    # representation. Technically ints have unlimited precision, but RowCoder
+    # should throw an error if it sees one with a bit width > 64 when encoding.
+    int: schema_pb2.AtomicType.INT64,
+    float: schema_pb2.AtomicType.DOUBLE,
+})
+
+
+def typing_to_runner_api(type_):
+  if 

[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=299085=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299085
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 22/Aug/19 00:03
Start Date: 22/Aug/19 00:03
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r316450190
 
 

 ##
 File path: sdks/python/apache_beam/coders/row_coder.py
 ##
 @@ -0,0 +1,162 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import itertools
+from array import array
+
+from apache_beam.coders.coder_impl import StreamCoderImpl
+from apache_beam.coders.coders import BytesCoder
+from apache_beam.coders.coders import Coder
+from apache_beam.coders.coders import FastCoder
+from apache_beam.coders.coders import FloatCoder
+from apache_beam.coders.coders import IterableCoder
+from apache_beam.coders.coders import StrUtf8Coder
+from apache_beam.coders.coders import TupleCoder
+from apache_beam.coders.coders import VarIntCoder
+from apache_beam.portability import common_urns
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.schemas import named_tuple_from_schema
+from apache_beam.typehints.schemas import named_tuple_to_schema
+
+__all__ = ["RowCoder"]
+
+
+class RowCoder(FastCoder):
+  """ Coder for `typing.NamedTuple` instances.
+
+  Implements the beam:coder:row:v1 standard coder spec.
+  """
+
+  def __init__(self, schema):
+    self.schema = schema
+    self.components = [
+        coder_from_type(field.type) for field in self.schema.fields
+    ]
+
+  def _create_impl(self):
+    return RowCoderImpl(self.schema, self.components)
+
+  def is_deterministic(self):
+    return all(c.is_deterministic() for c in self.components)
+
+  def to_type_hint(self):
+    return named_tuple_from_schema(self.schema)
+
+  def as_cloud_object(self, coders_context=None):
+    raise NotImplementedError("TODO")
+
+  def __eq__(self, other):
+    return type(self) == type(other) and self.schema == other.schema
+
+  def __hash__(self):
+    return hash((type(self), self.schema.SerializePartialToString()))
+
+  def to_runner_api_parameter(self, unused_context):
+    return (common_urns.coders.ROW.urn, self.schema, [])
+
+  @staticmethod
+  def from_type_hint(named_tuple_type, registry):
+    return RowCoder(named_tuple_to_schema(named_tuple_type))
+
+
+def coder_from_type(type_):
+  type_info = type_.WhichOneof("type_info")
+  if type_info == "atomic_type":
+    if type_.atomic_type in (schema_pb2.AtomicType.INT32,
+                             schema_pb2.AtomicType.INT64):
+      return VarIntCoder()
+    elif type_.atomic_type == schema_pb2.AtomicType.DOUBLE:
+      return FloatCoder()
+    elif type_.atomic_type == schema_pb2.AtomicType.STRING:
+      return StrUtf8Coder()
+  elif type_info == "array_type":
+    return IterableCoder(coder_from_type(type_.array_type.element_type))
+
+  # The Java SDK supports several more types, but the coders are not yet
+  # standard, and are not implemented in Python.
+  raise ValueError(
+      "Encountered a type that is not currently supported by RowCoder: %s" %
+      type_)
+
+
+# pylint: disable=unused-variable
 
 Review comment:
   Whoops - this was a copy-paste error that I think was getting obscured by my 
code folding? Regardless it's fixed now, thanks
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 299085)
Time Spent: 7h 10m  (was: 7h)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: 

[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=299057=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299057
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 21/Aug/19 22:56
Start Date: 21/Aug/19 22:56
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r316435823
 
 

 ##
 File path: sdks/python/apache_beam/coders/row_coder.py
 ##
 @@ -0,0 +1,162 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import itertools
+from array import array
+
+from apache_beam.coders.coder_impl import StreamCoderImpl
+from apache_beam.coders.coders import BytesCoder
+from apache_beam.coders.coders import Coder
+from apache_beam.coders.coders import FastCoder
+from apache_beam.coders.coders import FloatCoder
+from apache_beam.coders.coders import IterableCoder
+from apache_beam.coders.coders import StrUtf8Coder
+from apache_beam.coders.coders import TupleCoder
+from apache_beam.coders.coders import VarIntCoder
+from apache_beam.portability import common_urns
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.schemas import named_tuple_from_schema
+from apache_beam.typehints.schemas import named_tuple_to_schema
+
+__all__ = ["RowCoder"]
+
+
+class RowCoder(FastCoder):
+  """ Coder for `typing.NamedTuple` instances.
+
+  Implements the beam:coder:row:v1 standard coder spec.
+  """
+
+  def __init__(self, schema):
+    self.schema = schema
+    self.components = [
+        coder_from_type(field.type) for field in self.schema.fields
+    ]
+
+  def _create_impl(self):
+    return RowCoderImpl(self.schema, self.components)
+
+  def is_deterministic(self):
+    return all(c.is_deterministic() for c in self.components)
+
+  def to_type_hint(self):
+    return named_tuple_from_schema(self.schema)
+
+  def as_cloud_object(self, coders_context=None):
+    raise NotImplementedError("TODO")
+
+  def __eq__(self, other):
+    return type(self) == type(other) and self.schema == other.schema
+
+  def __hash__(self):
 
 Review comment:
   Yeah, the reason is that self.schema isn't hashable. I tried to follow the 
model of the structured coders that hash their element coders, but I had to 
serialize the schema to make it work.
   
   It will be hashable whether or not this is defined, though, since `Coder` 
includes:
   
   ```
   def __hash__(self):
     return hash(type(self))
   ```
   
   I suppose it's fine to just let the collisions happen and rely on `__eq__` 
though? 
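
To make the trade-off in the comment above concrete, here is a minimal sketch of a class that hashes only by its type and relies on `__eq__` to tell instances apart. `TypeHashedCoder` and its list-valued `schema` are hypothetical stand-ins, not Beam classes; the list simply plays the role of an unhashable schema.

```python
class TypeHashedCoder(object):
  """Hypothetical stand-in for a coder that hashes only by its type."""

  def __init__(self, schema):
    # A list is unhashable, much like the schema proto in the discussion.
    self.schema = schema

  def __eq__(self, other):
    return type(self) == type(other) and self.schema == other.schema

  def __hash__(self):
    # Every instance gets the same hash. That always collides, but it stays
    # consistent with __eq__, which is all that dict and set require.
    return hash(type(self))


a = TypeHashedCoder(schema=["name", "age"])
b = TypeHashedCoder(schema=["name", "age"])
c = TypeHashedCoder(schema=["id"])

# The hashes of a and c collide, yet lookups are still resolved by __eq__.
cache = {a: "coder for (name, age)", c: "coder for (id)"}
```

Correctness is preserved because equal objects still hash equally; the cost is that lookups among many such coders degrade toward linear scans, which is probably acceptable when only a handful of coders share a container.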
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 299057)
Time Spent: 6h 50m  (was: 6h 40m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=299056=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299056
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 21/Aug/19 22:56
Start Date: 21/Aug/19 22:56
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r316435823
 
 

 ##
 File path: sdks/python/apache_beam/coders/row_coder.py
 ##
 @@ -0,0 +1,162 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import itertools
+from array import array
+
+from apache_beam.coders.coder_impl import StreamCoderImpl
+from apache_beam.coders.coders import BytesCoder
+from apache_beam.coders.coders import Coder
+from apache_beam.coders.coders import FastCoder
+from apache_beam.coders.coders import FloatCoder
+from apache_beam.coders.coders import IterableCoder
+from apache_beam.coders.coders import StrUtf8Coder
+from apache_beam.coders.coders import TupleCoder
+from apache_beam.coders.coders import VarIntCoder
+from apache_beam.portability import common_urns
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.schemas import named_tuple_from_schema
+from apache_beam.typehints.schemas import named_tuple_to_schema
+
+__all__ = ["RowCoder"]
+
+
+class RowCoder(FastCoder):
+  """ Coder for `typing.NamedTuple` instances.
+
+  Implements the beam:coder:row:v1 standard coder spec.
+  """
+
+  def __init__(self, schema):
+    self.schema = schema
+    self.components = [
+        coder_from_type(field.type) for field in self.schema.fields
+    ]
+
+  def _create_impl(self):
+    return RowCoderImpl(self.schema, self.components)
+
+  def is_deterministic(self):
+    return all(c.is_deterministic() for c in self.components)
+
+  def to_type_hint(self):
+    return named_tuple_from_schema(self.schema)
+
+  def as_cloud_object(self, coders_context=None):
+    raise NotImplementedError("TODO")
+
+  def __eq__(self, other):
+    return type(self) == type(other) and self.schema == other.schema
+
+  def __hash__(self):
 
 Review comment:
   Yeah, the reason is that self.schema isn't hashable. I tried to follow the 
model of the structured coders that hash their element coders, but I had to 
serialize the schema to make it work.
   
   It will be hashable whether or not this is defined, though, since `Coder` 
includes:
   
   ```
   def __hash__(self):
     return hash(type(self))
   ```
   
   I suppose it's fine to just let the collisions happen and rely on `__eq__` 
though? 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 299056)
Time Spent: 6h 40m  (was: 6.5h)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=299058=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299058
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 21/Aug/19 22:56
Start Date: 21/Aug/19 22:56
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r316435823
 
 

 ##
 File path: sdks/python/apache_beam/coders/row_coder.py
 ##
 @@ -0,0 +1,162 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import itertools
+from array import array
+
+from apache_beam.coders.coder_impl import StreamCoderImpl
+from apache_beam.coders.coders import BytesCoder
+from apache_beam.coders.coders import Coder
+from apache_beam.coders.coders import FastCoder
+from apache_beam.coders.coders import FloatCoder
+from apache_beam.coders.coders import IterableCoder
+from apache_beam.coders.coders import StrUtf8Coder
+from apache_beam.coders.coders import TupleCoder
+from apache_beam.coders.coders import VarIntCoder
+from apache_beam.portability import common_urns
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.schemas import named_tuple_from_schema
+from apache_beam.typehints.schemas import named_tuple_to_schema
+
+__all__ = ["RowCoder"]
+
+
+class RowCoder(FastCoder):
+  """ Coder for `typing.NamedTuple` instances.
+
+  Implements the beam:coder:row:v1 standard coder spec.
+  """
+
+  def __init__(self, schema):
+    self.schema = schema
+    self.components = [
+        coder_from_type(field.type) for field in self.schema.fields
+    ]
+
+  def _create_impl(self):
+    return RowCoderImpl(self.schema, self.components)
+
+  def is_deterministic(self):
+    return all(c.is_deterministic() for c in self.components)
+
+  def to_type_hint(self):
+    return named_tuple_from_schema(self.schema)
+
+  def as_cloud_object(self, coders_context=None):
+    raise NotImplementedError("TODO")
+
+  def __eq__(self, other):
+    return type(self) == type(other) and self.schema == other.schema
+
+  def __hash__(self):
 
 Review comment:
   Yeah, the reason is that self.schema isn't hashable. I tried to follow the 
model of the structured coders that hash their element coders, but I had to 
serialize the schema to make it work.
   
   It will be hashable whether or not this is defined, though, since `Coder` 
includes:
   
   ```python
   def __hash__(self):
     return hash(type(self))
   ```
   
   I suppose it's fine to just let the collisions happen and rely on `__eq__` 
though? 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 299058)
Time Spent: 7h  (was: 6h 50m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 7h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=299042=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299042
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 21/Aug/19 22:11
Start Date: 21/Aug/19 22:11
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r316424584
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/PortableSchemaCoder.java
 ##
 @@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.schemas;
+
+import org.apache.beam.sdk.transforms.SerializableFunctions;
+import org.apache.beam.sdk.values.Row;
+
+/**
+ * A version of SchemaCoder that can only produce/consume Row instances.
+ *
+ * Implements the beam:coders:row:v1 standard coder while still satisfying the requirement that a
+ * PCollection is only considered to have a schema if its coder is an instance of SchemaCoder.
+ */
+public class PortableSchemaCoder extends SchemaCoder {
 
 Review comment:
   Yeah, I need a unique class for the map in CoderTranslation. I originally 
tried to use `RowCoder` for this directly, but that didn't work since it needs 
to be a sub-class of SchemaCoder for PCollection.hasSchema. 
   
   This was the easiest solution I came up with to make it work, but I'm 
definitely open to other ideas. Maybe I could re-work the CoderTranslation 
logic so that `SchemaCoder` instances translate to `beam:coder:row:v1`?
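
The constraint described above can be sketched as follows, in Python for brevity even though the classes under discussion are Java. The sketch assumes the translation table is keyed by the exact coder class and that the schema check is an instance-of test. Both class bodies, `KNOWN_CODER_URNS`, `urn_for`, and `has_schema` are hypothetical illustrations, not Beam APIs.

```python
class SchemaCoder(object):
  """Hypothetical stand-in for the Java SchemaCoder."""


class PortableSchemaCoder(SchemaCoder):
  """Hypothetical stand-in: a SchemaCoder limited to Row instances."""


# Assumed shape of the translation table: keyed by the exact coder class.
KNOWN_CODER_URNS = {PortableSchemaCoder: "beam:coder:row:v1"}


def urn_for(coder):
  # Exact-class lookup: a plain SchemaCoder does not match the subclass entry.
  return KNOWN_CODER_URNS.get(type(coder))


def has_schema(coder):
  # Mirrors the stated requirement that the coder be a SchemaCoder instance.
  return isinstance(coder, SchemaCoder)
```

Under these assumptions, a dedicated subclass satisfies both checks at once: it has its own key in the table and still passes the instance-of test.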
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 299042)
Time Spent: 6.5h  (was: 6h 20m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=298998=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-298998
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 21/Aug/19 20:44
Start Date: 21/Aug/19 20:44
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on pull request #9188: [BEAM-7886] 
Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r316393831
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/PortableSchemaCoder.java
 ##
 @@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.schemas;
+
+import org.apache.beam.sdk.transforms.SerializableFunctions;
+import org.apache.beam.sdk.values.Row;
+
+/**
+ * A version of SchemaCoder that can only produce/consume Row instances.
+ *
+ * Implements the beam:coders:row:v1 standard coder while still satisfying the requirement that a
+ * PCollection is only considered to have a schema if its coder is an instance of SchemaCoder.
+ */
+public class PortableSchemaCoder extends SchemaCoder {
 
 Review comment:
   Is the reason you need a separate class here that you need a unique Class 
object for the map?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 298998)
Time Spent: 6h 20m  (was: 6h 10m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=298997=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-298997
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 21/Aug/19 20:42
Start Date: 21/Aug/19 20:42
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r316393193
 
 

 ##
 File path: sdks/python/apache_beam/typehints/schemas.py
 ##
 @@ -0,0 +1,217 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+""" Support for mapping python types to proto Schemas and back again.
+
+Python          Schema
+np.int8     <-> BYTE
+np.int16    <-> INT16
+np.int32    <-> INT32
+np.int64    <-> INT64
+int         ---/
+np.float32  <-> FLOAT
+np.float64  <-> DOUBLE
+float       ---/
+bool        <-> BOOLEAN
+
+The mappings for STRING and BYTES are different between python 2 and python 3,
+because of the changes to str:
+py3:
+str/unicode <-> STRING
+bytes       <-> BYTES
+ByteString  ---/
+
+py2:
+unicode     <-> STRING
+str/bytes   ---/
+ByteString  <-> BYTES
+"""
+
+from __future__ import absolute_import
+
+import sys
+from typing import ByteString
+from typing import Mapping
+from typing import NamedTuple
+from typing import Optional
+from typing import Sequence
+from uuid import uuid4
+
+import numpy as np
+from past.builtins import unicode
+
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.native_type_compatibility import _get_args
+from apache_beam.typehints.native_type_compatibility import _match_is_exactly_mapping
+from apache_beam.typehints.native_type_compatibility import _match_is_named_tuple
+from apache_beam.typehints.native_type_compatibility import _match_is_optional
+from apache_beam.typehints.native_type_compatibility import _safe_issubclass
+from apache_beam.typehints.native_type_compatibility import extract_optional_type
+
+
+# Registry of typings for a schema by UUID
+class SchemaTypeRegistry(object):
+  def __init__(self):
+    self.by_id = {}
+    self.by_typing = {}
+
+  def add(self, typing, schema):
+    self.by_id[schema.id] = (typing, schema)
+
+  def get_typing_by_id(self, unique_id):
+    result = self.by_id.get(unique_id, None)
+    return result[0] if result is not None else None
+
+  def get_schema_by_id(self, unique_id):
+    result = self.by_id.get(unique_id, None)
+    return result[1] if result is not None else None
+
+
+SCHEMA_REGISTRY = SchemaTypeRegistry()
+
+
+# Bi-directional mappings
+_PRIMITIVES = (
+    (np.int8, schema_pb2.AtomicType.BYTE),
+    (np.int16, schema_pb2.AtomicType.INT16),
+    (np.int32, schema_pb2.AtomicType.INT32),
+    (np.int64, schema_pb2.AtomicType.INT64),
+    (np.float32, schema_pb2.AtomicType.FLOAT),
+    (np.float64, schema_pb2.AtomicType.DOUBLE),
+    (unicode, schema_pb2.AtomicType.STRING),
+    (bool, schema_pb2.AtomicType.BOOLEAN),
+    (bytes if sys.version_info.major >= 3 else ByteString,
+     schema_pb2.AtomicType.BYTES),
+)
+
+PRIMITIVE_TO_ATOMIC_TYPE = dict((typ, atomic) for typ, atomic in _PRIMITIVES)
+ATOMIC_TYPE_TO_PRIMITIVE = dict((atomic, typ) for typ, atomic in _PRIMITIVES)
+
+# One-way mappings
+PRIMITIVE_TO_ATOMIC_TYPE.update({
+    # In python 3, this is a no-op because str == unicode,
+    # but in python 2 it overrides the bytes -> BYTES mapping.
+    str: schema_pb2.AtomicType.STRING,
 
 Review comment:
   So the runtime types don't necessarily line up exactly with the typing 
representation of the schema. For example even though a schema may have an 
attribute with `np.int*` type, we still actually produce and consume `int` 
instances, and never use `np.int*` instances at runtime.
   
   In this case, the typing might say `str`, but RowCoder uses StrUtf8Coder to 
produce/consume instances of `past.builtins.unicode` at runtime.
   
   I agree this could be a little confusing for users. We discussed it [on the 

[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=298991=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-298991
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 21/Aug/19 20:31
Start Date: 21/Aug/19 20:31
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9188: [BEAM-7886] 
Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r316388296
 
 

 ##
 File path: sdks/python/setup.py
 ##
 @@ -115,8 +115,7 @@ def get_version():
     'mock>=1.0.1,<3.0.0',
     'pymongo>=3.8.0,<4.0.0',
     'oauth2client>=2.0.1,<4',
-    # grpcio 1.8.1 and above requires protobuf 3.5.0.post1.
-    'protobuf>=3.5.0.post1,<4',
+    'protobuf>=3.8.0.post1,<4',
 
 Review comment:
   If it would not be a difficult fix, or would not cause future issues, it is 
better to keep the version range as is.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 298991)
Time Spent: 6h  (was: 5h 50m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 6h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=298987=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-298987
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 21/Aug/19 20:28
Start Date: 21/Aug/19 20:28
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r316387149
 
 

 ##
 File path: sdks/python/apache_beam/coders/row_coder_test.py
 ##
 @@ -0,0 +1,129 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+from __future__ import absolute_import
+
+import logging
+import typing
+import unittest
+from itertools import chain
+
+import numpy as np
+from past.builtins import unicode
+
+from apache_beam.coders import RowCoder
+from apache_beam.coders.typecoders import registry as coders_registry
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.schemas import typing_to_runner_api
+
+Person = typing.NamedTuple("Person", [
+("name", unicode),
+("age", np.int32),
+("address", typing.Optional[unicode]),
+("aliases", typing.List[unicode]),
+])
+
+coders_registry.register_coder(Person, RowCoder)
 
 Review comment:
   Yes it is right now. I was hesitant to make RowCoder the default coder for 
any NamedTuple sub-class instance, and I thought this was a good way to make it 
opt in.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 298987)
Time Spent: 5h 40m  (was: 5.5h)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=298988=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-298988
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 21/Aug/19 20:28
Start Date: 21/Aug/19 20:28
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r316387149
 
 

 ##
 File path: sdks/python/apache_beam/coders/row_coder_test.py
 ##
 @@ -0,0 +1,129 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+from __future__ import absolute_import
+
+import logging
+import typing
+import unittest
+from itertools import chain
+
+import numpy as np
+from past.builtins import unicode
+
+from apache_beam.coders import RowCoder
+from apache_beam.coders.typecoders import registry as coders_registry
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.schemas import typing_to_runner_api
+
+Person = typing.NamedTuple("Person", [
+("name", unicode),
+("age", np.int32),
+("address", typing.Optional[unicode]),
+("aliases", typing.List[unicode]),
+])
+
+coders_registry.register_coder(Person, RowCoder)
 
 Review comment:
   Yes it is right now. I was hesitant to make RowCoder the default coder for 
any NamedTuple sub-class, and I thought this was a good way to make it opt in.
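
The opt-in behaviour described above can be sketched with a tiny registry that only returns a specific coder for types that were explicitly registered. `OptInCoderRegistry`, the string coder names, and the `Point` type are hypothetical illustrations; the real call in the test file is `coders_registry.register_coder(Person, RowCoder)`.

```python
import collections


class OptInCoderRegistry(object):
  """Hypothetical registry: types get a fallback coder unless registered."""

  def __init__(self, fallback):
    self._coders = {}
    self._fallback = fallback

  def register_coder(self, typehint, coder_name):
    # Explicit opt-in: only registered types get a dedicated coder.
    self._coders[typehint] = coder_name

  def get_coder(self, typehint):
    return self._coders.get(typehint, self._fallback)


Person = collections.namedtuple("Person", ["name", "age"])
Point = collections.namedtuple("Point", ["x", "y"])

registry = OptInCoderRegistry(fallback="FallbackCoder")
registry.register_coder(Person, "RowCoder")
```

In this sketch `Person` resolves to the row coder while `Point`, another named tuple that was never registered, keeps the fallback, so existing pipelines that use named tuples would not change behaviour silently.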
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 298988)
Time Spent: 5h 50m  (was: 5h 40m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=298986=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-298986
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 21/Aug/19 20:25
Start Date: 21/Aug/19 20:25
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r316386018
 
 

 ##
 File path: sdks/python/setup.py
 ##
 @@ -115,8 +115,7 @@ def get_version():
 'mock>=1.0.1,<3.0.0',
 'pymongo>=3.8.0,<4.0.0',
 'oauth2client>=2.0.1,<4',
-# grpcio 1.8.1 and above requires protobuf 3.5.0.post1.
-'protobuf>=3.5.0.post1,<4',
+'protobuf>=3.8.0.post1,<4',
 
 Review comment:
   I bumped it because I used a 3.8.0 feature without realizing it (using 
`schema_pb2.AtomicType.STRING`, rather than `schema_pb2.STRING`). This is 
actually the bug I was referencing in 
https://lists.apache.org/thread.html/e174967bab6c224db40f66af74e7d07466812cc6b78189e55da88cb0@%3Cdev.beam.apache.org%3E
   
   If it's going to be a problem I can change it back and update my code; 
bumping it to 3.8.0 was just the lazy fix.
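
   For reference, one possible alternative to raising the lower bound is a 
small lookup helper that prefers the enum-class attribute and falls back to 
the module-level constant. This is only a sketch: `lookup_enum_value` is a 
hypothetical helper, and the two modules below are simulated with 
`types.SimpleNamespace` rather than real generated protobuf code.

```python
import types


def lookup_enum_value(module, enum_name, value_name):
  """Return module.<enum_name>.<value_name>, else module.<value_name>."""
  enum_type = getattr(module, enum_name, None)
  if enum_type is not None and hasattr(enum_type, value_name):
    return getattr(enum_type, value_name)
  # Fall back to the module-level constant.
  return getattr(module, value_name)


# Simulated generated modules. These are assumptions for illustration only,
# and the value 7 is an arbitrary placeholder.
older_module = types.SimpleNamespace(
    AtomicType=types.SimpleNamespace(), STRING=7)
newer_module = types.SimpleNamespace(
    AtomicType=types.SimpleNamespace(STRING=7), STRING=7)

older_value = lookup_enum_value(older_module, "AtomicType", "STRING")
newer_value = lookup_enum_value(newer_module, "AtomicType", "STRING")
```

   Simply writing `schema_pb2.STRING` everywhere would be less code; the helper 
only matters if both spellings need to keep working.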

 



Issue Time Tracking
---

Worklog Id: (was: 298986)
Time Spent: 5.5h  (was: 5h 20m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=298970=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-298970
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 21/Aug/19 20:08
Start Date: 21/Aug/19 20:08
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9188: [BEAM-7886] 
Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r316378981
 
 

 ##
 File path: sdks/python/setup.py
 ##
 @@ -115,8 +115,7 @@ def get_version():
 'mock>=1.0.1,<3.0.0',
 'pymongo>=3.8.0,<4.0.0',
 'oauth2client>=2.0.1,<4',
-# grpcio 1.8.1 and above requires protobuf 3.5.0.post1.
-'protobuf>=3.5.0.post1,<4',
+'protobuf>=3.8.0.post1,<4',
 
 Review comment:
   > @aaltay do you think this might be an issue?
   
   Upping the lower bound for protobuf should be ok. @TheNeuralBit what is the 
reason for this change? Does something require this newer version? 
 



Issue Time Tracking
---

Worklog Id: (was: 298970)
Time Spent: 5h 10m  (was: 5h)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=298971=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-298971
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 21/Aug/19 20:08
Start Date: 21/Aug/19 20:08
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9188: [BEAM-7886] 
Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r316378981
 
 

 ##
 File path: sdks/python/setup.py
 ##
 @@ -115,8 +115,7 @@ def get_version():
 'mock>=1.0.1,<3.0.0',
 'pymongo>=3.8.0,<4.0.0',
 'oauth2client>=2.0.1,<4',
-# grpcio 1.8.1 and above requires protobuf 3.5.0.post1.
-'protobuf>=3.5.0.post1,<4',
+'protobuf>=3.8.0.post1,<4',
 
 Review comment:
   > @aaltay do you think this might be an issue?
   
   Upping the lower bound for protobuf should be ok. @TheNeuralBit what is the 
reason for this change? Does something require this newer version? 
 



Issue Time Tracking
---

Worklog Id: (was: 298971)
Time Spent: 5h 20m  (was: 5h 10m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=298928=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-298928
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 21/Aug/19 19:23
Start Date: 21/Aug/19 19:23
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9188: [BEAM-7886] Make 
row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r315958314
 
 

 ##
 File path: sdks/python/apache_beam/coders/row_coder_test.py
 ##
 @@ -0,0 +1,129 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+from __future__ import absolute_import
+
+import logging
+import typing
+import unittest
+from itertools import chain
+
+import numpy as np
+from past.builtins import unicode
+
+from apache_beam.coders import RowCoder
+from apache_beam.coders.typecoders import registry as coders_registry
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.schemas import typing_to_runner_api
+
+Person = typing.NamedTuple("Person", [
+("name", unicode),
+("age", np.int32),
+("address", typing.Optional[unicode]),
+("aliases", typing.List[unicode]),
+])
+
+coders_registry.register_coder(Person, RowCoder)
+
+
+class CodersTest(unittest.TestCase):
+  TEST_CASES = [
+  Person("Jon Snow", 23, None, ["crow", "wildling"]),
+  Person("Daenerys Targaryen", 25, "Westeros", ["Mother of Dragons"]),
+  Person("Michael Bluth", 30, None, [])
+  ]
+
+  def test_create_row_coder_from_named_tuple(self):
+expected_coder = RowCoder(typing_to_runner_api(Person).row_type.schema)
+real_coder = coders_registry.get_coder(Person)
+
+for test_case in self.TEST_CASES:
+  self.assertEqual(
+  expected_coder.encode(test_case), real_coder.encode(test_case))
+
+  self.assertEqual(test_case,
+   real_coder.decode(real_coder.encode(test_case)))
+
+  def test_create_row_coder_from_schema(self):
+schema = schema_pb2.Schema(
+id="person",
+fields=[
+schema_pb2.Field(
+name="name",
+type=schema_pb2.FieldType(
+atomic_type=schema_pb2.AtomicType.STRING)),
+schema_pb2.Field(
+name="age",
+type=schema_pb2.FieldType(
+atomic_type=schema_pb2.AtomicType.INT32)),
+schema_pb2.Field(
+name="address",
+type=schema_pb2.FieldType(
+atomic_type=schema_pb2.AtomicType.STRING, nullable=True)),
+schema_pb2.Field(
+name="aliases",
+type=schema_pb2.FieldType(
+array_type=schema_pb2.ArrayType(
+element_type=schema_pb2.FieldType(
+atomic_type=schema_pb2.AtomicType.STRING)))),
+])
+coder = RowCoder(schema)
+
+for test_case in self.TEST_CASES:
+  self.assertEqual(test_case, coder.decode(coder.encode(test_case)))
+
+  @unittest.skip(
+  "Need to decide whether to defer to the stream writer for these checks "
+  "or add explicit checks"
+  )
+  def test_overflows(self):
+IntTester = typing.NamedTuple('IntTester', [
+#('i8', typing.Optional[np.int8]),
 
 Review comment:
   Should these have a TODO(BEAM-7996) prefix?
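
   For context on the skipped test, this is roughly what the "explicit checks" 
option could look like. It is a sketch only: `check_signed_int_range` is a 
hypothetical helper, not something in this PR, and deferring to the stream 
writer remains the other option.

```python
def check_signed_int_range(value, bit_width):
  """Raise OverflowError if value does not fit a signed bit_width integer."""
  lower = -(1 << (bit_width - 1))
  upper = (1 << (bit_width - 1)) - 1
  if not lower <= value <= upper:
    raise OverflowError(
        "%d does not fit in a signed %d-bit integer" % (value, bit_width))
  return value


fits = check_signed_int_range(127, 8)  # largest signed 8-bit value

try:
  check_signed_int_range(128, 8)  # one past the largest signed 8-bit value
  overflowed = False
except OverflowError:
  overflowed = True
```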
 



Issue Time Tracking
---

Worklog Id: (was: 298928)
Time Spent: 4h 40m  (was: 4.5h)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: 

[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=298932=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-298932
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 21/Aug/19 19:23
Start Date: 21/Aug/19 19:23
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9188: [BEAM-7886] Make 
row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r316354443
 
 

 ##
 File path: sdks/python/apache_beam/coders/row_coder.py
 ##
 @@ -0,0 +1,162 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import itertools
+from array import array
+
+from apache_beam.coders.coder_impl import StreamCoderImpl
+from apache_beam.coders.coders import BytesCoder
+from apache_beam.coders.coders import Coder
+from apache_beam.coders.coders import FastCoder
+from apache_beam.coders.coders import FloatCoder
+from apache_beam.coders.coders import IterableCoder
+from apache_beam.coders.coders import StrUtf8Coder
+from apache_beam.coders.coders import TupleCoder
+from apache_beam.coders.coders import VarIntCoder
+from apache_beam.portability import common_urns
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.schemas import named_tuple_from_schema
+from apache_beam.typehints.schemas import named_tuple_to_schema
+
+__all__ = ["RowCoder"]
+
+
+class RowCoder(FastCoder):
+  """ Coder for `typing.NamedTuple` instances.
+
+  Implements the beam:coder:row:v1 standard coder spec.
+  """
+
+  def __init__(self, schema):
+self.schema = schema
+self.components = [
+coder_from_type(field.type) for field in self.schema.fields
+]
+
+  def _create_impl(self):
+return RowCoderImpl(self.schema, self.components)
+
+  def is_deterministic(self):
+return all(c.is_deterministic() for c in self.components)
+
+  def to_type_hint(self):
+return named_tuple_from_schema(self.schema)
+
+  def as_cloud_object(self, coders_context=None):
+raise NotImplementedError("TODO")
+
+  def __eq__(self, other):
+return type(self) == type(other) and self.schema == other.schema
+
+  def __hash__(self):
+return hash((type(self), self.schema.SerializePartialToString()))
+
+  def to_runner_api_parameter(self, unused_context):
+return (common_urns.coders.ROW.urn, self.schema, [])
+
+  @staticmethod
+  def from_type_hint(named_tuple_type, registry):
+return RowCoder(named_tuple_to_schema(named_tuple_type))
+
+
+def coder_from_type(type_):
+  type_info = type_.WhichOneof("type_info")
+  if type_info == "atomic_type":
+if type_.atomic_type in (schema_pb2.AtomicType.INT32,
+ schema_pb2.AtomicType.INT64):
+  return VarIntCoder()
+elif type_.atomic_type == schema_pb2.AtomicType.DOUBLE:
+  return FloatCoder()
+elif type_.atomic_type == schema_pb2.AtomicType.STRING:
+  return StrUtf8Coder()
+  elif type_info == "array_type":
+return IterableCoder(coder_from_type(type_.array_type.element_type))
+
+  # The Java SDK supports several more types, but the coders are not yet
+  # standard, and are not implemented in Python.
+  raise ValueError(
+  "Encountered a type that is not currently supported by RowCoder: %s" %
+  type_)
+
+
+# pylint: disable=unused-variable
+
+@Coder.register_urn(common_urns.coders.ROW.urn, schema_pb2.Schema)
+def from_runner_api_parameter(payload, components, unused_context):
+  return RowCoder(payload)
+
+
+class RowCoderImpl(StreamCoderImpl):
+  """For internal use only; no backwards-compatibility guarantees."""
+  SIZE_CODER = VarIntCoder().get_impl()
+  NULL_CODER = BytesCoder().get_impl()
+
+  def __init__(self, schema, components):
+self.schema = schema
+self.constructor = named_tuple_from_schema(schema)
+self.components = list(c.get_impl() for c in components)
+self.has_nullable_fields = any(
+field.type.nullable for field in self.schema.fields)
+
+  def encode_to_stream(self, value, out, nested):
+nvals = len(self.schema.fields)
+

[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=298926=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-298926
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 21/Aug/19 19:23
Start Date: 21/Aug/19 19:23
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9188: [BEAM-7886] Make 
row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r315957349
 
 

 ##
 File path: sdks/python/apache_beam/typehints/schemas.py
 ##
 @@ -0,0 +1,217 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+""" Support for mapping python types to proto Schemas and back again.
+
+Python      Schema
+np.int8     <-> BYTE
+np.int16    <-> INT16
+np.int32    <-> INT32
+np.int64    <-> INT64
+int         ---/
+np.float32  <-> FLOAT
+np.float64  <-> DOUBLE
+float       ---/
+bool        <-> BOOLEAN
+
+The mappings for STRING and BYTES are different between python 2 and python 3,
+because of the changes to str:
+py3:
+str/unicode <-> STRING
+bytes       <-> BYTES
+ByteString  ---/
+
+py2:
+unicode     <-> STRING
+str/bytes   ---/
+ByteString  <-> BYTES
+"""
+
+from __future__ import absolute_import
+
+import sys
+from typing import ByteString
+from typing import Mapping
+from typing import NamedTuple
+from typing import Optional
+from typing import Sequence
+from uuid import uuid4
+
+import numpy as np
+from past.builtins import unicode
+
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.native_type_compatibility import _get_args
+from apache_beam.typehints.native_type_compatibility import _match_is_exactly_mapping
+from apache_beam.typehints.native_type_compatibility import _match_is_named_tuple
+from apache_beam.typehints.native_type_compatibility import _match_is_optional
+from apache_beam.typehints.native_type_compatibility import _safe_issubclass
+from apache_beam.typehints.native_type_compatibility import extract_optional_type
+
+
+# Registry of typings for a schema by UUID
+class SchemaTypeRegistry(object):
+  def __init__(self):
+self.by_id = {}
+self.by_typing = {}
+
+  def add(self, typing, schema):
+self.by_id[schema.id] = (typing, schema)
+
+  def get_typing_by_id(self, unique_id):
+result = self.by_id.get(unique_id, None)
+return result[0] if result is not None else None
+
+  def get_schema_by_id(self, unique_id):
+result = self.by_id.get(unique_id, None)
+return result[1] if result is not None else None
+
+
+SCHEMA_REGISTRY = SchemaTypeRegistry()
+
+
+# Bi-directional mappings
+_PRIMITIVES = (
+(np.int8, schema_pb2.AtomicType.BYTE),
+(np.int16, schema_pb2.AtomicType.INT16),
+(np.int32, schema_pb2.AtomicType.INT32),
+(np.int64, schema_pb2.AtomicType.INT64),
+(np.float32, schema_pb2.AtomicType.FLOAT),
+(np.float64, schema_pb2.AtomicType.DOUBLE),
+(unicode, schema_pb2.AtomicType.STRING),
+(bool, schema_pb2.AtomicType.BOOLEAN),
+(bytes if sys.version_info.major >= 3 else ByteString,
+ schema_pb2.AtomicType.BYTES),
+)
+
+PRIMITIVE_TO_ATOMIC_TYPE = dict((typ, atomic) for typ, atomic in _PRIMITIVES)
+ATOMIC_TYPE_TO_PRIMITIVE = dict((atomic, typ) for typ, atomic in _PRIMITIVES)
+
+# One-way mappings
+PRIMITIVE_TO_ATOMIC_TYPE.update({
+# In python 3, this is a no-op because str == unicode,
+# but in python 2 it overrides the bytes -> BYTES mapping.
+str: schema_pb2.AtomicType.STRING,
+# In python 2, this is a no-op because we define it as the bi-directional
+# mapping above. This just ensures the one-way mapping is defined in python
+# 3.
+ByteString: schema_pb2.AtomicType.BYTES,
+# Allow users to specify a native int, and use INT64 as the cross-language
+# representation. Technically ints have unlimited precision, but RowCoder
+# should throw an error if it sees one with a bit width > 64 when encoding.
+int: schema_pb2.AtomicType.INT64,
+float: schema_pb2.AtomicType.DOUBLE,
+})
+
+
+def typing_to_runner_api(type_):
+  if 

[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=298934=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-298934
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 21/Aug/19 19:23
Start Date: 21/Aug/19 19:23
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9188: [BEAM-7886] Make 
row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r316303919
 
 

 ##
 File path: sdks/python/setup.py
 ##
 @@ -115,8 +115,7 @@ def get_version():
 'mock>=1.0.1,<3.0.0',
 'pymongo>=3.8.0,<4.0.0',
 'oauth2client>=2.0.1,<4',
-# grpcio 1.8.1 and above requires protobuf 3.5.0.post1.
-'protobuf>=3.5.0.post1,<4',
+'protobuf>=3.8.0.post1,<4',
 
 Review comment:
   @aaltay do you think this might be an issue?
   
   @TheNeuralBit this should be `3.8.0`, not `3.8.0.post1`.
 



Issue Time Tracking
---

Worklog Id: (was: 298934)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=298931=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-298931
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 21/Aug/19 19:23
Start Date: 21/Aug/19 19:23
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9188: [BEAM-7886] Make 
row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r316359453
 
 

 ##
 File path: sdks/python/apache_beam/coders/row_coder_test.py
 ##
 @@ -0,0 +1,129 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+from __future__ import absolute_import
+
+import logging
+import typing
+import unittest
+from itertools import chain
+
+import numpy as np
+from past.builtins import unicode
+
+from apache_beam.coders import RowCoder
+from apache_beam.coders.typecoders import registry as coders_registry
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.schemas import typing_to_runner_api
+
+Person = typing.NamedTuple("Person", [
+("name", unicode),
+("age", np.int32),
+("address", typing.Optional[unicode]),
+("aliases", typing.List[unicode]),
+])
+
+coders_registry.register_coder(Person, RowCoder)
 
 Review comment:
   Is this required for regular use? 
 



Issue Time Tracking
---

Worklog Id: (was: 298931)
Time Spent: 5h  (was: 4h 50m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=298933=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-298933
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 21/Aug/19 19:23
Start Date: 21/Aug/19 19:23
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9188: [BEAM-7886] Make 
row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r316315621
 
 

 ##
 File path: sdks/python/apache_beam/typehints/schemas.py
 ##
 @@ -0,0 +1,217 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+""" Support for mapping python types to proto Schemas and back again.
+
+Python      Schema
+np.int8     <-> BYTE
+np.int16    <-> INT16
+np.int32    <-> INT32
+np.int64    <-> INT64
+int         ---/
+np.float32  <-> FLOAT
+np.float64  <-> DOUBLE
+float       ---/
+bool        <-> BOOLEAN
+
+The mappings for STRING and BYTES are different between python 2 and python 3,
+because of the changes to str:
+py3:
+str/unicode <-> STRING
+bytes       <-> BYTES
+ByteString  ---/
+
+py2:
+unicode     <-> STRING
+str/bytes   ---/
+ByteString  <-> BYTES
+"""
+
+from __future__ import absolute_import
+
+import sys
+from typing import ByteString
+from typing import Mapping
+from typing import NamedTuple
+from typing import Optional
+from typing import Sequence
+from uuid import uuid4
+
+import numpy as np
+from past.builtins import unicode
+
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.native_type_compatibility import _get_args
+from apache_beam.typehints.native_type_compatibility import _match_is_exactly_mapping
+from apache_beam.typehints.native_type_compatibility import _match_is_named_tuple
+from apache_beam.typehints.native_type_compatibility import _match_is_optional
+from apache_beam.typehints.native_type_compatibility import _safe_issubclass
+from apache_beam.typehints.native_type_compatibility import extract_optional_type
+
+
+# Registry of typings for a schema by UUID
+class SchemaTypeRegistry(object):
+  def __init__(self):
+self.by_id = {}
+self.by_typing = {}
+
+  def add(self, typing, schema):
+self.by_id[schema.id] = (typing, schema)
+
+  def get_typing_by_id(self, unique_id):
+result = self.by_id.get(unique_id, None)
+return result[0] if result is not None else None
+
+  def get_schema_by_id(self, unique_id):
+result = self.by_id.get(unique_id, None)
+return result[1] if result is not None else None
+
+
+SCHEMA_REGISTRY = SchemaTypeRegistry()
+
+
+# Bi-directional mappings
+_PRIMITIVES = (
+(np.int8, schema_pb2.AtomicType.BYTE),
+(np.int16, schema_pb2.AtomicType.INT16),
+(np.int32, schema_pb2.AtomicType.INT32),
+(np.int64, schema_pb2.AtomicType.INT64),
+(np.float32, schema_pb2.AtomicType.FLOAT),
+(np.float64, schema_pb2.AtomicType.DOUBLE),
+(unicode, schema_pb2.AtomicType.STRING),
+(bool, schema_pb2.AtomicType.BOOLEAN),
+(bytes if sys.version_info.major >= 3 else ByteString,
+ schema_pb2.AtomicType.BYTES),
+)
+
+PRIMITIVE_TO_ATOMIC_TYPE = dict((typ, atomic) for typ, atomic in _PRIMITIVES)
+ATOMIC_TYPE_TO_PRIMITIVE = dict((atomic, typ) for typ, atomic in _PRIMITIVES)
+
+# One-way mappings
+PRIMITIVE_TO_ATOMIC_TYPE.update({
+# In python 3, this is a no-op because str == unicode,
+# but in python 2 it overrides the bytes -> BYTES mapping.
+str: schema_pb2.AtomicType.STRING,
 
 Review comment:
   Is there a test that does conversion to/from these types?
   I suspect that in Python 2 a str such as `'\xff'` will fail because it's not 
valid UTF-8.
   In other words, `str` should be invalid or converted to AtomicType.BYTES in 
Python 2.
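
   To make the concern concrete, here is a small stand-alone check (a sketch, 
independent of the PR's code; `is_valid_utf8` is a hypothetical helper). It 
shows that the single byte `\xff` cannot be decoded as UTF-8, which is why a 
byte string mapped to STRING could fail when encoded as text.

```python
def is_valid_utf8(raw_bytes):
  """Return True if raw_bytes decodes cleanly as UTF-8."""
  try:
    raw_bytes.decode("utf-8")
  except UnicodeDecodeError:
    return False
  return True


ascii_ok = is_valid_utf8(b"crow")  # plain ASCII is valid UTF-8
ff_ok = is_valid_utf8(b"\xff")  # 0xff can never appear in valid UTF-8
```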
 



Issue Time 

[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=298929=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-298929
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 21/Aug/19 19:23
Start Date: 21/Aug/19 19:23
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9188: [BEAM-7886] Make 
row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r316299259
 
 

 ##
 File path: sdks/python/apache_beam/coders/row_coder.py
 ##
 @@ -0,0 +1,162 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import itertools
+from array import array
+
+from apache_beam.coders.coder_impl import StreamCoderImpl
+from apache_beam.coders.coders import BytesCoder
+from apache_beam.coders.coders import Coder
+from apache_beam.coders.coders import FastCoder
+from apache_beam.coders.coders import FloatCoder
+from apache_beam.coders.coders import IterableCoder
+from apache_beam.coders.coders import StrUtf8Coder
+from apache_beam.coders.coders import TupleCoder
+from apache_beam.coders.coders import VarIntCoder
+from apache_beam.portability import common_urns
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.schemas import named_tuple_from_schema
+from apache_beam.typehints.schemas import named_tuple_to_schema
+
+__all__ = ["RowCoder"]
+
+
+class RowCoder(FastCoder):
+  """ Coder for `typing.NamedTuple` instances.
+
+  Implements the beam:coder:row:v1 standard coder spec.
+  """
+
+  def __init__(self, schema):
+    self.schema = schema
+    self.components = [
+        coder_from_type(field.type) for field in self.schema.fields
+    ]
+
+  def _create_impl(self):
+    return RowCoderImpl(self.schema, self.components)
+
+  def is_deterministic(self):
+    return all(c.is_deterministic() for c in self.components)
+
+  def to_type_hint(self):
+    return named_tuple_from_schema(self.schema)
+
+  def as_cloud_object(self, coders_context=None):
+    raise NotImplementedError("TODO")
+
+  def __eq__(self, other):
+    return type(self) == type(other) and self.schema == other.schema
+
+  def __hash__(self):
+    return hash((type(self), self.schema.SerializePartialToString()))
+
+  def to_runner_api_parameter(self, unused_context):
+    return (common_urns.coders.ROW.urn, self.schema, [])
+
+  @staticmethod
+  def from_type_hint(named_tuple_type, registry):
+    return RowCoder(named_tuple_to_schema(named_tuple_type))
+
+
+def coder_from_type(type_):
+  type_info = type_.WhichOneof("type_info")
+  if type_info == "atomic_type":
+    if type_.atomic_type in (schema_pb2.AtomicType.INT32,
+                             schema_pb2.AtomicType.INT64):
+      return VarIntCoder()
+    elif type_.atomic_type == schema_pb2.AtomicType.DOUBLE:
+      return FloatCoder()
+    elif type_.atomic_type == schema_pb2.AtomicType.STRING:
+      return StrUtf8Coder()
+  elif type_info == "array_type":
+    return IterableCoder(coder_from_type(type_.array_type.element_type))
+
+  # The Java SDK supports several more types, but the coders are not yet
+  # standard, and are not implemented in Python.
+  raise ValueError(
+      "Encountered a type that is not currently supported by RowCoder: %s" %
+      type_)
+
+
+# pylint: disable=unused-variable
+
+@Coder.register_urn(common_urns.coders.ROW.urn, schema_pb2.Schema)
+def from_runner_api_parameter(payload, components, unused_context):
 
 Review comment:
   This is typically implemented as a method of the coder (looking at other 
coders in Beam). The decorator will make it a staticmethod.
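
A minimal sketch of the pattern this comment describes, using a toy `Coder` base class rather than Beam's actual implementation. The registry dict, the `from_urn` helper, and the plain `dict` payload are assumptions made for illustration; only the URN string and the shape of `from_runner_api_parameter` are taken from the diff above.

```python
class Coder(object):
    """Toy stand-in for a coder base class with a URN registry (hypothetical)."""

    _known_urns = {}

    @classmethod
    def register_urn(cls, urn, parameter_type):
        """Returns a decorator that records `fn` as the constructor for `urn`."""
        def register(fn):
            cls._known_urns[urn] = (parameter_type, fn)
            # Wrapping in staticmethod lets the function sit in a class body
            # without receiving `self` when called through the class.
            return staticmethod(fn)
        return register

    @classmethod
    def from_urn(cls, urn, payload):
        """Hypothetical lookup helper: builds a coder from a registered URN."""
        unused_parameter_type, constructor = cls._known_urns[urn]
        return constructor(payload, [], None)


class RowCoder(Coder):
    def __init__(self, schema):
        self.schema = schema

    # Declared as a method of the coder; the decorator registers it and
    # turns it into a staticmethod.
    @Coder.register_urn("beam:coder:row:v1", dict)
    def from_runner_api_parameter(payload, unused_components, unused_context):
        return RowCoder(payload)


coder = Coder.from_urn("beam:coder:row:v1", {"fields": ["id"]})
```

Keeping the constructor on the class groups it with `to_runner_api_parameter`, and the `unused_` prefixes on the parameters avoid the need for a pylint suppression.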
 



Issue Time Tracking
---

Worklog Id: (was: 298929)
Time Spent: 4h 50m  (was: 4h 40m)

> Make row coder a standard coder and 

[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=298924&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-298924
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 21/Aug/19 19:23
Start Date: 21/Aug/19 19:23
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9188: [BEAM-7886] Make 
row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r315955065
 
 

 ##
 File path: sdks/python/apache_beam/coders/row_coder.py
 ##
 @@ -0,0 +1,162 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import itertools
+from array import array
+
+from apache_beam.coders.coder_impl import StreamCoderImpl
+from apache_beam.coders.coders import BytesCoder
+from apache_beam.coders.coders import Coder
+from apache_beam.coders.coders import FastCoder
+from apache_beam.coders.coders import FloatCoder
+from apache_beam.coders.coders import IterableCoder
+from apache_beam.coders.coders import StrUtf8Coder
+from apache_beam.coders.coders import TupleCoder
+from apache_beam.coders.coders import VarIntCoder
+from apache_beam.portability import common_urns
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.schemas import named_tuple_from_schema
+from apache_beam.typehints.schemas import named_tuple_to_schema
+
+__all__ = ["RowCoder"]
+
+
+class RowCoder(FastCoder):
+  """ Coder for `typing.NamedTuple` instances.
+
+  Implements the beam:coder:row:v1 standard coder spec.
+  """
+
+  def __init__(self, schema):
+    self.schema = schema
 
 Review comment:
   Please document the type of argument schema.
 



Issue Time Tracking
---

Worklog Id: (was: 298924)
Time Spent: 4h 20m  (was: 4h 10m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=298930&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-298930
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 21/Aug/19 19:23
Start Date: 21/Aug/19 19:23
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9188: [BEAM-7886] Make 
row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r316283745
 
 

 ##
 File path: sdks/python/apache_beam/coders/row_coder.py
 ##
 @@ -0,0 +1,162 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import itertools
+from array import array
+
+from apache_beam.coders.coder_impl import StreamCoderImpl
+from apache_beam.coders.coders import BytesCoder
+from apache_beam.coders.coders import Coder
+from apache_beam.coders.coders import FastCoder
+from apache_beam.coders.coders import FloatCoder
+from apache_beam.coders.coders import IterableCoder
+from apache_beam.coders.coders import StrUtf8Coder
+from apache_beam.coders.coders import TupleCoder
+from apache_beam.coders.coders import VarIntCoder
+from apache_beam.portability import common_urns
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.schemas import named_tuple_from_schema
+from apache_beam.typehints.schemas import named_tuple_to_schema
+
+__all__ = ["RowCoder"]
+
+
+class RowCoder(FastCoder):
+  """ Coder for `typing.NamedTuple` instances.
+
+  Implements the beam:coder:row:v1 standard coder spec.
+  """
+
+  def __init__(self, schema):
+    self.schema = schema
+    self.components = [
+        coder_from_type(field.type) for field in self.schema.fields
+    ]
+
+  def _create_impl(self):
+    return RowCoderImpl(self.schema, self.components)
+
+  def is_deterministic(self):
+    return all(c.is_deterministic() for c in self.components)
+
+  def to_type_hint(self):
+    return named_tuple_from_schema(self.schema)
+
+  def as_cloud_object(self, coders_context=None):
+    raise NotImplementedError("TODO")
+
+  def __eq__(self, other):
+    return type(self) == type(other) and self.schema == other.schema
+
+  def __hash__(self):
+    return hash((type(self), self.schema.SerializePartialToString()))
+
+  def to_runner_api_parameter(self, unused_context):
+    return (common_urns.coders.ROW.urn, self.schema, [])
+
+  @staticmethod
+  def from_type_hint(named_tuple_type, registry):
+    return RowCoder(named_tuple_to_schema(named_tuple_type))
+
+
+def coder_from_type(type_):
+  type_info = type_.WhichOneof("type_info")
+  if type_info == "atomic_type":
+    if type_.atomic_type in (schema_pb2.AtomicType.INT32,
+                             schema_pb2.AtomicType.INT64):
+      return VarIntCoder()
+    elif type_.atomic_type == schema_pb2.AtomicType.DOUBLE:
+      return FloatCoder()
+    elif type_.atomic_type == schema_pb2.AtomicType.STRING:
+      return StrUtf8Coder()
+  elif type_info == "array_type":
+    return IterableCoder(coder_from_type(type_.array_type.element_type))
+
+  # The Java SDK supports several more types, but the coders are not yet
+  # standard, and are not implemented in Python.
+  raise ValueError(
+      "Encountered a type that is not currently supported by RowCoder: %s" %
+      type_)
+
+
+# pylint: disable=unused-variable
+
+@Coder.register_urn(common_urns.coders.ROW.urn, schema_pb2.Schema)
+def from_runner_api_parameter(payload, components, unused_context):
+  return RowCoder(payload)
+
+
+class RowCoderImpl(StreamCoderImpl):
+  """For internal use only; no backwards-compatibility guarantees."""
+  SIZE_CODER = VarIntCoder().get_impl()
+  NULL_CODER = BytesCoder().get_impl()
+
+  def __init__(self, schema, components):
+    self.schema = schema
+    self.constructor = named_tuple_from_schema(schema)
+    self.components = list(c.get_impl() for c in components)
+    self.has_nullable_fields = any(
+        field.type.nullable for field in self.schema.fields)
+
+  def encode_to_stream(self, value, out, nested):
+    nvals = len(self.schema.fields)
+

[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=298925&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-298925
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 21/Aug/19 19:23
Start Date: 21/Aug/19 19:23
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9188: [BEAM-7886] Make 
row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r315956283
 
 

 ##
 File path: sdks/python/apache_beam/coders/row_coder.py
 ##
 @@ -0,0 +1,162 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import itertools
+from array import array
+
+from apache_beam.coders.coder_impl import StreamCoderImpl
+from apache_beam.coders.coders import BytesCoder
+from apache_beam.coders.coders import Coder
+from apache_beam.coders.coders import FastCoder
+from apache_beam.coders.coders import FloatCoder
+from apache_beam.coders.coders import IterableCoder
+from apache_beam.coders.coders import StrUtf8Coder
+from apache_beam.coders.coders import TupleCoder
+from apache_beam.coders.coders import VarIntCoder
+from apache_beam.portability import common_urns
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.schemas import named_tuple_from_schema
+from apache_beam.typehints.schemas import named_tuple_to_schema
+
+__all__ = ["RowCoder"]
+
+
+class RowCoder(FastCoder):
+  """ Coder for `typing.NamedTuple` instances.
+
+  Implements the beam:coder:row:v1 standard coder spec.
+  """
+
+  def __init__(self, schema):
+    self.schema = schema
+    self.components = [
+        coder_from_type(field.type) for field in self.schema.fields
+    ]
+
+  def _create_impl(self):
+    return RowCoderImpl(self.schema, self.components)
+
+  def is_deterministic(self):
+    return all(c.is_deterministic() for c in self.components)
+
+  def to_type_hint(self):
+    return named_tuple_from_schema(self.schema)
+
+  def as_cloud_object(self, coders_context=None):
+    raise NotImplementedError("TODO")
+
+  def __eq__(self, other):
+    return type(self) == type(other) and self.schema == other.schema
+
+  def __hash__(self):
 
 Review comment:
   AFAICT, RowCoder doesn't need to be hashable, so this can be set to None 
instead of having an implementation. The reason is that self.schema is not 
hashable.
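
To illustrate the suggestion, here is a small stand-alone sketch. The `SchemaHolder` class and the dict used as a stand-in schema are hypothetical and are not part of the PR. Assigning `__hash__ = None` in the class body marks instances as unhashable, so `hash()` raises `TypeError` while `__eq__` keeps working:

```python
class SchemaHolder(object):
    """Hypothetical class that compares by schema but is not hashable."""

    def __init__(self, schema):
        self.schema = schema

    def __eq__(self, other):
        return type(self) == type(other) and self.schema == other.schema

    # Explicitly opt out of hashing. Python 3 does this implicitly when
    # __eq__ is overridden; spelling it out keeps Python 2 consistent.
    __hash__ = None


a = SchemaHolder({"id": "INT64"})
b = SchemaHolder({"id": "INT64"})

try:
    hash(a)
    hash_failed = False
except TypeError:
    # Unhashable instances cannot be used as dict keys or set members.
    hash_failed = True
```

The trade-off is that such objects cannot be used as dictionary keys or placed in sets, which is acceptable only if nothing relies on that.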
 



Issue Time Tracking
---

Worklog Id: (was: 298925)
Time Spent: 4h 20m  (was: 4h 10m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=298927&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-298927
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 21/Aug/19 19:23
Start Date: 21/Aug/19 19:23
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9188: [BEAM-7886] Make 
row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r316283393
 
 

 ##
 File path: sdks/python/apache_beam/coders/row_coder.py
 ##
 @@ -0,0 +1,162 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import itertools
+from array import array
+
+from apache_beam.coders.coder_impl import StreamCoderImpl
+from apache_beam.coders.coders import BytesCoder
+from apache_beam.coders.coders import Coder
+from apache_beam.coders.coders import FastCoder
+from apache_beam.coders.coders import FloatCoder
+from apache_beam.coders.coders import IterableCoder
+from apache_beam.coders.coders import StrUtf8Coder
+from apache_beam.coders.coders import TupleCoder
+from apache_beam.coders.coders import VarIntCoder
+from apache_beam.portability import common_urns
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.schemas import named_tuple_from_schema
+from apache_beam.typehints.schemas import named_tuple_to_schema
+
+__all__ = ["RowCoder"]
+
+
+class RowCoder(FastCoder):
+  """ Coder for `typing.NamedTuple` instances.
+
+  Implements the beam:coder:row:v1 standard coder spec.
+  """
+
+  def __init__(self, schema):
+    self.schema = schema
+    self.components = [
+        coder_from_type(field.type) for field in self.schema.fields
+    ]
+
+  def _create_impl(self):
+    return RowCoderImpl(self.schema, self.components)
+
+  def is_deterministic(self):
+    return all(c.is_deterministic() for c in self.components)
+
+  def to_type_hint(self):
+    return named_tuple_from_schema(self.schema)
+
+  def as_cloud_object(self, coders_context=None):
+    raise NotImplementedError("TODO")
+
+  def __eq__(self, other):
+    return type(self) == type(other) and self.schema == other.schema
+
+  def __hash__(self):
+    return hash((type(self), self.schema.SerializePartialToString()))
+
+  def to_runner_api_parameter(self, unused_context):
+    return (common_urns.coders.ROW.urn, self.schema, [])
+
+  @staticmethod
+  def from_type_hint(named_tuple_type, registry):
+    return RowCoder(named_tuple_to_schema(named_tuple_type))
+
+
+def coder_from_type(type_):
+  type_info = type_.WhichOneof("type_info")
+  if type_info == "atomic_type":
+    if type_.atomic_type in (schema_pb2.AtomicType.INT32,
+                             schema_pb2.AtomicType.INT64):
+      return VarIntCoder()
+    elif type_.atomic_type == schema_pb2.AtomicType.DOUBLE:
+      return FloatCoder()
+    elif type_.atomic_type == schema_pb2.AtomicType.STRING:
+      return StrUtf8Coder()
+  elif type_info == "array_type":
+    return IterableCoder(coder_from_type(type_.array_type.element_type))
+
+  # The Java SDK supports several more types, but the coders are not yet
+  # standard, and are not implemented in Python.
+  raise ValueError(
+      "Encountered a type that is not currently supported by RowCoder: %s" %
+      type_)
+
+
+# pylint: disable=unused-variable
 
 Review comment:
   Please try to limit the scope of these disable statements to a block or a 
single line instead of half the file.
   You can alternatively prefix a variable name with `unused_`.
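
A short sketch of the two alternatives mentioned here. The function bodies are hypothetical, and whether the `unused_` prefix is honored depends on the project's pylint configuration (pylint's default patterns for ignored argument and dummy variable names include it, as far as I know):

```python
def from_runner_api_parameter(payload, unused_components, unused_context):
    # Option 1: the `unused_` prefix signals intent, so no suppression
    # comment is needed for these parameters.
    return payload


def first_of_pair(pair):
    # Option 2: a trailing comment limits the suppression to this one line
    # instead of disabling the check for the rest of the module.
    first, second = pair  # pylint: disable=unused-variable
    return first
```

A disable comment placed on its own line applies from that point to the end of the enclosing block, which is why a module-level one ends up covering everything below it.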
 



Issue Time Tracking
---

Worklog Id: (was: 298927)
Time Spent: 4h 40m  (was: 4.5h)

> Make row coder a standard coder and implement in python
> ---
>
> 

[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=298841&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-298841
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 21/Aug/19 16:38
Start Date: 21/Aug/19 16:38
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #9188: [BEAM-7886] Make 
row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#issuecomment-523542557
 
 
   Thank you both! Let me know if you have any questions
 



Issue Time Tracking
---

Worklog Id: (was: 298841)
Time Spent: 4h 10m  (was: 4h)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=298825&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-298825
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 21/Aug/19 16:21
Start Date: 21/Aug/19 16:21
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #9188: [BEAM-7886] Make row 
coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#issuecomment-523534183
 
 
   I'm reviewing the Python changes
 



Issue Time Tracking
---

Worklog Id: (was: 298825)
Time Spent: 4h  (was: 3h 50m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 4h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=296678&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296678
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 17/Aug/19 00:15
Start Date: 17/Aug/19 00:15
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r314924560
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -642,6 +642,45 @@ message StandardCoders {
     // Components: Coder for a single element.
     // Experimental.
     STATE_BACKED_ITERABLE = 9 [(beam_urn) = "beam:coder:state_backed_iterable:v1"];
+
+    // Additional Standard Coders
+    // --
+    // The following coders are not required to be implemented for an SDK or
+    // runner to support the Beam model, but can enable additional
+    // functionality.
+
+    // Encodes a "row", an element with a known schema, defined by an
+    // instance of Schema from schema.proto.
+    //
+    // A row is encoded as the concatenation of:
+    //   - The number of attributes in the schema, encoded with
+    //     beam:coder:varint:v1. This is useful for detecting supported schema
+    //     changes (column additions/deletions).
 
 Review comment:
   Is the new explanation in the [latest commit](https://github.com/apache/beam/pull/9188/commits/25dcc50a8de9c607069a8efc80a6da67a6e8b0ca) better?
   
   My intention was to indicate that "column additions/deletions" are the 
supported schema changes
 



Issue Time Tracking
---

Worklog Id: (was: 296678)
Time Spent: 3h 40m  (was: 3.5h)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=296648&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296648
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 16/Aug/19 23:18
Start Date: 16/Aug/19 23:18
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r314917601
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -642,6 +642,45 @@ message StandardCoders {
     // Components: Coder for a single element.
     // Experimental.
     STATE_BACKED_ITERABLE = 9 [(beam_urn) = "beam:coder:state_backed_iterable:v1"];
+
+    // Additional Standard Coders
+    // --
+    // The following coders are not required to be implemented for an SDK or
+    // runner to support the Beam model, but can enable additional
+    // functionality.
+
+    // Encodes a "row", an element with a known schema, defined by an
 
 Review comment:
   done
 



Issue Time Tracking
---

Worklog Id: (was: 296648)
Time Spent: 3.5h  (was: 3h 20m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=296645&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296645
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 16/Aug/19 23:17
Start Date: 16/Aug/19 23:17
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r314917511
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -642,6 +642,45 @@ message StandardCoders {
     // Components: Coder for a single element.
     // Experimental.
     STATE_BACKED_ITERABLE = 9 [(beam_urn) = "beam:coder:state_backed_iterable:v1"];
+
+    // Additional Standard Coders
+    // --
+    // The following coders are not required to be implemented for an SDK or
+    // runner to support the Beam model, but can enable additional
+    // functionality.
+
+    // Encodes a "row", an element with a known schema, defined by an
+    // instance of Schema from schema.proto.
+    //
+    // A row is encoded as the concatenation of:
+    //   - The number of attributes in the schema, encoded with
+    //     beam:coder:varint:v1. This is useful for detecting supported schema
+    //     changes (column additions/deletions).
+    //   - A packed bitset indicating null fields (a 1 indicating a null)
 
 Review comment:
   Good suggestion, done.
 



Issue Time Tracking
---

Worklog Id: (was: 296645)
Time Spent: 3h 10m  (was: 3h)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=296646&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296646
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 16/Aug/19 23:17
Start Date: 16/Aug/19 23:17
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r314917564
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -642,6 +642,45 @@ message StandardCoders {
     // Components: Coder for a single element.
     // Experimental.
     STATE_BACKED_ITERABLE = 9 [(beam_urn) = "beam:coder:state_backed_iterable:v1"];
+
+    // Additional Standard Coders
+    // --
+    // The following coders are not required to be implemented for an SDK or
+    // runner to support the Beam model, but can enable additional
+    // functionality.
+
+    // Encodes a "row", an element with a known schema, defined by an
+    // instance of Schema from schema.proto.
+    //
+    // A row is encoded as the concatenation of:
+    //   - The number of attributes in the schema, encoded with
+    //     beam:coder:varint:v1. This is useful for detecting supported schema
+    //     changes (column additions/deletions).
+    //   - A packed bitset indicating null fields (a 1 indicating a null)
+    //     encoded with beam:coder:bytes:v1. If there are no nulls an empty byte
+    //     array is encoded.
+    //   - An encoding for each non-null field, concatenated together.
+    //
+    // Schema types are mapped to coders as follows:
+    //   AtomicType:
+    //     BYTE:  not yet a standard coder
 
 Review comment:
   done
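
The row layout quoted in the hunk above can be sketched in plain Python. This is an illustration under stated assumptions, not the PR's implementation: the helper names are hypothetical, the varint is a base-128 encoding for non-negative integers only, the bytes encoding is assumed to be a varint length prefix followed by the raw bytes, and the bitset is assumed to be packed least-significant bit first.

```python
def encode_varint(n):
    """Base-128 varint, least-significant group first (non-negative ints)."""
    out = bytearray()
    while True:
        bits = n & 0x7F
        n >>= 7
        if n:
            out.append(bits | 0x80)  # more groups follow
        else:
            out.append(bits)
            return bytes(out)


def encode_bytes(b):
    """Length-prefixed byte string (assumed nested bytes encoding)."""
    return encode_varint(len(b)) + b


def pack_nulls(values):
    """Packed bitset with a 1 for each null field; empty if no nulls."""
    if all(v is not None for v in values):
        return b""
    bits = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v is None:
            bits[i // 8] |= 1 << (i % 8)  # assumption: LSB-first bit order
    return bytes(bits)


def encode_row(values, field_encoders):
    """Field count, then null bitset, then each non-null field in order."""
    out = encode_varint(len(values))
    out += encode_bytes(pack_nulls(values))
    for value, encode in zip(values, field_encoders):
        if value is not None:
            out += encode(value)
    return out


def encode_str(s):
    return encode_bytes(s.encode("utf-8"))


row = encode_row([1, None, "ab"], [encode_varint, encode_varint, encode_str])
```

A decoder would read the count first, which is what lets it notice that a row was written with more or fewer columns than the schema it currently holds.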
 



Issue Time Tracking
---

Worklog Id: (was: 296646)
Time Spent: 3h 20m  (was: 3h 10m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=294313&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294313
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 14/Aug/19 01:11
Start Date: 14/Aug/19 01:11
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r313671666
 
 

 ##
 File path: 
runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/CoderTranslationTest.java
 ##
 @@ -74,6 +78,14 @@
   FullWindowedValueCoder.of(
   IterableCoder.of(VarLongCoder.of()), IntervalWindowCoder.of()))
   .add(DoubleCoder.of())
+  .add(
 
 Review comment:
   Oh wow, good catch! I've changed the coder translation to create 
instances of `PortableSchemaCoder`, which shouldn't have that problem. I also 
reverted all the changes to `RowCoder`, so it's back to being a `CustomCoder`.
 



Issue Time Tracking
---

Worklog Id: (was: 294313)
Time Spent: 2h 50m  (was: 2h 40m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=294314=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294314
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 14/Aug/19 01:11
Start Date: 14/Aug/19 01:11
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r313671690
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/CoderTranslators.java
 ##
 @@ -118,6 +122,32 @@ public T fromComponents(List> components) {
 };
   }
 
+  static CoderTranslator<RowCoder> row() {
+    return new CoderTranslator<RowCoder>() {
+      @Override
+      public List<? extends Coder<?>> getComponents(RowCoder from) {
+        return ImmutableList.of();
+      }
+
+      @Override
+      public byte[] getPayload(RowCoder from) {
+        return SchemaTranslation.toProto(from.getSchema()).toByteArray();
+      }
+
+      @Override
+      public RowCoder fromComponents(List<Coder<?>> components, byte[] payload) {
+        // Assert that components are empty?
 
 Review comment:
   Done
 



Issue Time Tracking
---

Worklog Id: (was: 294314)
Time Spent: 3h  (was: 2h 50m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=294307=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294307
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 14/Aug/19 00:42
Start Date: 14/Aug/19 00:42
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #9188: [BEAM-7886] Make 
row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#issuecomment-521060327
 
 
   R: @reuvenlax would you mind taking a look at the model changes and the Java 
changes? I can separate out the relevant commits in their own PR if that helps.
 



Issue Time Tracking
---

Worklog Id: (was: 294307)
Time Spent: 2h 40m  (was: 2.5h)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=292374=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-292374
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 10/Aug/19 00:39
Start Date: 10/Aug/19 00:39
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r312682478
 
 

 ##
 File path: sdks/python/apache_beam/coders/row_coder_test.py
 ##
 @@ -0,0 +1,126 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+from __future__ import absolute_import
+
+import logging
+import typing
+import unittest
+
+import numpy as np
+from itertools import chain
+from past.builtins import unicode
+
+from apache_beam.coders import RowCoder
+from apache_beam.coders.typecoders import registry as coders_registry
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.schemas import typing_to_runner_api
+
+Person = typing.NamedTuple("Person", [
+    ("name", unicode),
+    ("age", np.int32),
+    ("address", typing.Optional[unicode]),
+    ("aliases", typing.List[unicode]),
+])
+
+coders_registry.register_coder(Person, RowCoder)
+
+
+class CodersTest(unittest.TestCase):
+  TEST_CASES = [
+      Person("Jon Snow", 23, None, ["crow", "wildling"]),
+      Person("Daenerys Targaryen", 25, "Westeros", ["Mother of Dragons"]),
+      Person("Michael Bluth", 30, None, [])
+  ]
+
+  def test_create_row_coder_from_named_tuple(self):
+    expected_coder = RowCoder(typing_to_runner_api(Person).row_type.schema)
+    real_coder = coders_registry.get_coder(Person)
+
+    for test_case in self.TEST_CASES:
+      self.assertEqual(
+          expected_coder.encode(test_case), real_coder.encode(test_case))
+
+      self.assertEqual(test_case,
+                       real_coder.decode(real_coder.encode(test_case)))
+
+  def test_create_row_coder_from_schema(self):
+    schema = schema_pb2.Schema(
+        id="person",
+        fields=[
+            schema_pb2.Field(
+                name="name",
+                type=schema_pb2.FieldType(
+                    atomic_type=schema_pb2.AtomicType.STRING)),
+            schema_pb2.Field(
+                name="age",
+                type=schema_pb2.FieldType(
+                    atomic_type=schema_pb2.AtomicType.INT32)),
+            schema_pb2.Field(
+                name="address",
+                type=schema_pb2.FieldType(
+                    atomic_type=schema_pb2.AtomicType.STRING, nullable=True)),
+            schema_pb2.Field(
+                name="aliases",
+                type=schema_pb2.FieldType(
+                    array_type=schema_pb2.ArrayType(
+                        element_type=schema_pb2.FieldType(
+                            atomic_type=schema_pb2.AtomicType.STRING)))),
+        ])
+    coder = RowCoder(schema)
+
+    for test_case in self.TEST_CASES:
+      self.assertEqual(test_case, coder.decode(coder.encode(test_case)))
+
+  @unittest.skip("Need to decide whether to defer to the stream writer for these checks or add explicit checks")
 
 Review comment:
   I've skipped this test for now because I'm not sure where the 
OverflowError should be raised. Currently the fast version of 
`OutputStream.write_var_int64` (and thus `VarIntCoder.encode`) raises 
OverflowError for ints that don't fit in 64 bits, but the slow version does not.
   
   Would it make sense to add a range check to the slow version so that it 
matches the fast behavior? If I do that, I could also add a `write_var_int32` 
that works the same way.
   
   The other option would be to add explicit checks somewhere in 
row_coder.py.
   
   cc @robertwb 
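   
   For illustration, a minimal pure-Python sketch of what such a range check 
could look like. The helper names `check_int_range` and `encode_var_int64` are 
hypothetical and are not part of Beam's `OutputStream`; the sketch only shows 
the idea of raising OverflowError before any bytes are written.

```python
def check_int_range(value, bits):
  """Raise OverflowError unless value fits in a signed integer of `bits` bits."""
  lo = -(1 << (bits - 1))
  hi = (1 << (bits - 1)) - 1
  if not lo <= value <= hi:
    raise OverflowError(
        "%d does not fit in a signed %d-bit integer" % (value, bits))
  return value


def encode_var_int64(value):
  """Encode a signed 64-bit int as a base-128 varint (hypothetical helper)."""
  check_int_range(value, 64)
  # Treat negative values as their unsigned 64-bit two's-complement form.
  value &= (1 << 64) - 1
  out = bytearray()
  while True:
    bits = value & 0x7F
    value >>= 7
    if value:
      # More groups follow, so set the continuation bit.
      out.append(bits | 0x80)
    else:
      out.append(bits)
      return bytes(out)
```

   A 32-bit variant would only need to call `check_int_range(value, 32)` 
before delegating to the same loop.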
 



Issue Time Tracking

[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=292372=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-292372
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 10/Aug/19 00:33
Start Date: 10/Aug/19 00:33
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r312682110
 
 

 ##
 File path: 
runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/CommonCoderTest.java
 ##
 @@ -278,41 +290,85 @@ private static Object convertValue(Object value, CommonCoder coderSpec, Coder co
       return WindowedValue.of(windowValue, timestamp, windows, paneInfo);
     } else if (s.equals(getUrn(StandardCoders.Enum.DOUBLE))) {
       return Double.parseDouble((String) value);
+    } else if (s.equals(getUrn(StandardCoders.Enum.ROW))) {
+      Schema schema;
+      try {
+        schema = SchemaTranslation.fromProto(SchemaApi.Schema.parseFrom(coderSpec.getPayload()));
+      } catch (InvalidProtocolBufferException e) {
+        throw new RuntimeException("Failed to parse schema payload for row coder", e);
+      }
+
+      return parseField(value, Schema.FieldType.row(schema));
     } else {
       throw new IllegalStateException("Unknown coder URN: " + coderSpec.getUrn());
     }
   }
 
+  private static Object parseField(Object value, Schema.FieldType fieldType) {
+    switch (fieldType.getTypeName()) {
+      case BYTE:
+        return ((Number) value).byteValue();
+      case INT16:
+        return ((Number) value).shortValue();
+      case INT32:
+        return ((Number) value).intValue();
+      case INT64:
+        return ((Number) value).longValue();
+      case FLOAT:
+        return Float.parseFloat((String) value);
+      case DOUBLE:
+        return Double.parseDouble((String) value);
+      case STRING:
+        return (String) value;
+      case BOOLEAN:
+        return (Boolean) value;
+      case BYTES:
+        // extract String as byte[]
+        return ((String) value).getBytes(StandardCharsets.ISO_8859_1);
+      case ARRAY:
+        return ((List) value)
+            .stream()
+            .map((element) -> parseField(element, fieldType.getCollectionElementType()))
+            .collect(toImmutableList());
+      case MAP:
+        Map kvMap = (Map) value;
+        return kvMap.entrySet().stream()
+            .collect(
+                toImmutableMap(
+                    (pair) -> parseField(pair.getKey(), fieldType.getMapKeyType()),
+                    (pair) -> parseField(pair.getValue(), fieldType.getMapValueType())));
+      case ROW:
+        Map rowMap = (Map) value;
+        Schema schema = fieldType.getRowSchema();
+        Row.Builder row = Row.withSchema(schema);
+        for (Schema.Field field : schema.getFields()) {
+          Object element = rowMap.remove(field.getName());
+          if (element != null) {
+            element = parseField(element, field.getType());
+          }
+          row.addValue(element);
+        }
+
+        if (!rowMap.isEmpty()) {
+          throw new IllegalArgumentException(
+              "Value contains keys that are not in the schema: " + rowMap.keySet());
+        }
+
+        return row.build();
+      default: // DECIMAL, DATETIME, LOGICAL_TYPE
+        throw new IllegalArgumentException("Unsupported type name: " + fieldType.getTypeName());
+    }
+  }
+
   private static Coder instantiateCoder(CommonCoder coder) {
     List<Coder<?>> components = new ArrayList<>();
     for (CommonCoder innerCoder : coder.getComponents()) {
       components.add(instantiateCoder(innerCoder));
     }
-    String s = coder.getUrn();
-    if (s.equals(getUrn(StandardCoders.Enum.BYTES))) {
-      return ByteArrayCoder.of();
-    } else if (s.equals(getUrn(StandardCoders.Enum.STRING_UTF8))) {
-      return StringUtf8Coder.of();
-    } else if (s.equals(getUrn(StandardCoders.Enum.KV))) {
-      return KvCoder.of(components.get(0), components.get(1));
-    } else if (s.equals(getUrn(StandardCoders.Enum.VARINT))) {
-      return VarLongCoder.of();
-    } else if (s.equals(getUrn(StandardCoders.Enum.INTERVAL_WINDOW))) {
-      return IntervalWindowCoder.of();
-    } else if (s.equals(getUrn(StandardCoders.Enum.ITERABLE))) {
-      return IterableCoder.of(components.get(0));
-    } else if (s.equals(getUrn(StandardCoders.Enum.TIMER))) {
-      return Timer.Coder.of(components.get(0));
-    } else if (s.equals(getUrn(StandardCoders.Enum.GLOBAL_WINDOW))) {
-      return GlobalWindow.Coder.INSTANCE;
-    } else if (s.equals(getUrn(StandardCoders.Enum.WINDOWED_VALUE))) {
-      return WindowedValue.FullWindowedValueCoder.of(
-          components.get(0), (Coder) components.get(1));
-    } else if 

[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=292370=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-292370
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 10/Aug/19 00:31
Start Date: 10/Aug/19 00:31
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r312682034
 
 

 ##
 File path: sdks/python/apache_beam/coders/row_coder_test.py
 ##
 @@ -0,0 +1,89 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+from __future__ import absolute_import
+
+import logging
+import typing
+import unittest
+
+import numpy as np
+
+from apache_beam.coders import RowCoder
+from apache_beam.coders.typecoders import registry as coders_registry
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.schemas import typing_to_runner_api
+
+Person = typing.NamedTuple("Person", [
+    ("name", np.unicode),
+    ("age", np.int32),
+    ("address", typing.Optional[np.unicode]),
+    ("aliases", typing.List[np.unicode]),
 
 Review comment:
   Good point, I've updated all the string/bytes mappings.
 



Issue Time Tracking
---

Worklog Id: (was: 292370)
Time Spent: 2h  (was: 1h 50m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=292369=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-292369
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 10/Aug/19 00:31
Start Date: 10/Aug/19 00:31
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r312681991
 
 

 ##
 File path: sdks/python/apache_beam/typehints/schemas.py
 ##
 @@ -0,0 +1,163 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+from typing import List
+from typing import Mapping
+from typing import NamedTuple
+from typing import Optional
+from uuid import uuid4
+
+import numpy as np
+
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.native_type_compatibility import _get_args
+from apache_beam.typehints.native_type_compatibility import _match_is_exactly_mapping
+from apache_beam.typehints.native_type_compatibility import _match_is_named_tuple
+from apache_beam.typehints.native_type_compatibility import _match_is_optional
+from apache_beam.typehints.native_type_compatibility import _safe_issubclass
+from apache_beam.typehints.native_type_compatibility import extract_optional_type
+
+
+# Registry of typings for a schema by UUID
+class SchemaTypeRegistry(object):
+  def __init__(self):
+    self.by_id = {}
+    self.by_typing = {}
+
+  def add(self, typing, schema):
+    self.by_id[schema.id] = (typing, schema)
+
+  def get_typing_by_id(self, unique_id):
+    result = self.by_id.get(unique_id, None)
+    return result[0] if result is not None else None
+
+  def get_schema_by_id(self, unique_id):
+    result = self.by_id.get(unique_id, None)
+    return result[1] if result is not None else None
+
+
+SCHEMA_REGISTRY = SchemaTypeRegistry()
+
+
+_PRIMITIVES = (
+    (np.int8, schema_pb2.AtomicType.BYTE),
+    (np.int16, schema_pb2.AtomicType.INT16),
+    (np.int32, schema_pb2.AtomicType.INT32),
+    (np.int64, schema_pb2.AtomicType.INT64),
+    (np.float32, schema_pb2.AtomicType.FLOAT),
+    (np.float64, schema_pb2.AtomicType.DOUBLE),
+    (np.unicode, schema_pb2.AtomicType.STRING),
+    (np.bool, schema_pb2.AtomicType.BOOLEAN),
+    (np.bytes_, schema_pb2.AtomicType.BYTES),
 
 Review comment:
   This works in Python 3, but in Python 2 it would make a str map to BYTES. 
I've updated this mapping as @robertwb suggested on the mailing list:
   - In Python 2, str/bytes maps to STRING, and typing.ByteString is the only 
way to specify BYTES.
   - In Python 3, str/unicode maps to STRING, and bytes and typing.ByteString 
both map to BYTES.
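   
   A rough sketch of how that version-dependent mapping could be expressed. 
The `STRING`/`BYTES` constants are plain-string stand-ins for the 
`schema_pb2.AtomicType` values, and `string_like_mappings` is a hypothetical 
helper, not the function in schemas.py.

```python
import sys
import typing

# Plain-string stand-ins for schema_pb2.AtomicType values (assumption).
STRING = "STRING"
BYTES = "BYTES"

# typing.ByteString is deprecated in newer Python versions, so look it up
# defensively instead of assuming it exists.
_BYTE_STRING = getattr(typing, "ByteString", None)


def string_like_mappings(major_version=sys.version_info[0]):
  """Return (python type, atomic type) pairs for string-like types."""
  mappings = []
  if _BYTE_STRING is not None:
    # ByteString always means BYTES, on either major version.
    mappings.append((_BYTE_STRING, BYTES))
  if major_version == 2:
    # On Python 2, str and bytes are the same type and map to STRING.
    mappings.append((str, STRING))
  else:
    # On Python 3, str is text and bytes is binary.
    mappings.append((str, STRING))
    mappings.append((bytes, BYTES))
  return mappings
```

   Passing the major version explicitly makes both branches easy to exercise 
from a single interpreter.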
 



Issue Time Tracking
---

Worklog Id: (was: 292369)
Time Spent: 1h 50m  (was: 1h 40m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=292371=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-292371
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 10/Aug/19 00:31
Start Date: 10/Aug/19 00:31
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r312682036
 
 

 ##
 File path: sdks/python/apache_beam/typehints/schemas.py
 ##
 @@ -0,0 +1,163 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+from typing import List
+from typing import Mapping
+from typing import NamedTuple
+from typing import Optional
+from uuid import uuid4
+
+import numpy as np
+
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.native_type_compatibility import _get_args
+from apache_beam.typehints.native_type_compatibility import _match_is_exactly_mapping
+from apache_beam.typehints.native_type_compatibility import _match_is_named_tuple
+from apache_beam.typehints.native_type_compatibility import _match_is_optional
+from apache_beam.typehints.native_type_compatibility import _safe_issubclass
+from apache_beam.typehints.native_type_compatibility import extract_optional_type
+
+
+# Registry of typings for a schema by UUID
+class SchemaTypeRegistry(object):
+  def __init__(self):
+    self.by_id = {}
+    self.by_typing = {}
+
+  def add(self, typing, schema):
+    self.by_id[schema.id] = (typing, schema)
+
+  def get_typing_by_id(self, unique_id):
+    result = self.by_id.get(unique_id, None)
+    return result[0] if result is not None else None
+
+  def get_schema_by_id(self, unique_id):
+    result = self.by_id.get(unique_id, None)
+    return result[1] if result is not None else None
+
+
+SCHEMA_REGISTRY = SchemaTypeRegistry()
+
+
+_PRIMITIVES = (
+    (np.int8, schema_pb2.AtomicType.BYTE),
+    (np.int16, schema_pb2.AtomicType.INT16),
+    (np.int32, schema_pb2.AtomicType.INT32),
+    (np.int64, schema_pb2.AtomicType.INT64),
+    (np.float32, schema_pb2.AtomicType.FLOAT),
+    (np.float64, schema_pb2.AtomicType.DOUBLE),
+    (np.unicode, schema_pb2.AtomicType.STRING),
 
 Review comment:
   Good point, I've updated all the string/bytes mappings.
 



Issue Time Tracking
---

Worklog Id: (was: 292371)
Time Spent: 2h 10m  (was: 2h)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=292366=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-292366
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 10/Aug/19 00:23
Start Date: 10/Aug/19 00:23
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r312681424
 
 

 ##
 File path: sdks/python/apache_beam/typehints/schemas.py
 ##
 @@ -0,0 +1,163 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+from typing import List
+from typing import Mapping
+from typing import NamedTuple
+from typing import Optional
+from uuid import uuid4
+
+import numpy as np
+
+from apache_beam.portability.api import schema_pb2
+from apache_beam.typehints.native_type_compatibility import _get_args
+from apache_beam.typehints.native_type_compatibility import _match_is_exactly_mapping
+from apache_beam.typehints.native_type_compatibility import _match_is_named_tuple
+from apache_beam.typehints.native_type_compatibility import _match_is_optional
+from apache_beam.typehints.native_type_compatibility import _safe_issubclass
+from apache_beam.typehints.native_type_compatibility import extract_optional_type
+
+
+# Registry of typings for a schema by UUID
+class SchemaTypeRegistry(object):
+  def __init__(self):
+    self.by_id = {}
+    self.by_typing = {}
+
+  def add(self, typing, schema):
+    self.by_id[schema.id] = (typing, schema)
+
+  def get_typing_by_id(self, unique_id):
+    result = self.by_id.get(unique_id, None)
+    return result[0] if result is not None else None
+
+  def get_schema_by_id(self, unique_id):
+    result = self.by_id.get(unique_id, None)
+    return result[1] if result is not None else None
+
+
+SCHEMA_REGISTRY = SchemaTypeRegistry()
+
+
+_PRIMITIVES = (
+    (np.int8, schema_pb2.AtomicType.BYTE),
+    (np.int16, schema_pb2.AtomicType.INT16),
+    (np.int32, schema_pb2.AtomicType.INT32),
+    (np.int64, schema_pb2.AtomicType.INT64),
+    (np.float32, schema_pb2.AtomicType.FLOAT),
+    (np.float64, schema_pb2.AtomicType.DOUBLE),
+    (np.unicode, schema_pb2.AtomicType.STRING),
+    (np.bool, schema_pb2.AtomicType.BOOLEAN),
 
 Review comment:
   Fair point. I've changed it to bool.
 



Issue Time Tracking
---

Worklog Id: (was: 292366)
Time Spent: 1h 40m  (was: 1.5h)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=290895=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290895
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 08/Aug/19 00:09
Start Date: 08/Aug/19 00:09
Worklog Time Spent: 10m 
  Work Description: robinyqiu commented on pull request #9188: [BEAM-7886] 
Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r311179927
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -642,6 +642,45 @@ message StandardCoders {
 // Components: Coder for a single element.
 // Experimental.
 STATE_BACKED_ITERABLE = 9 [(beam_urn) = "beam:coder:state_backed_iterable:v1"];
+
+// Additional Standard Coders
+// --
+// The following coders are not required to be implemented for an SDK or
+// runner to support the Beam model, but can enable additional
+// functionality.
+
+// Encodes a "row", an element with a known schema, defined by an
+// instance of Schema from schema.proto.
+//
+// A row is encoded as the concatenation of:
+//   - The number of attributes in the schema, encoded with
+// beam:coder:varint:v1. This is useful for detecting supported schema
+// changes (column additions/deletions).
+//   - A packed bitset indicating null fields (a 1 indicating a null)
 
 Review comment:
   Can we be a bit more specific here? Something like "A byte array 
representing a packed bitset ...". Could we also mention that the padding bits 
are 0s?
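   
   To make the question concrete, here is a small sketch of one way such a 
bitset could be packed. Two things are assumptions that the quoted comment does 
not state: the bit order (field i stored in byte i // 8 at bit i % 8, least 
significant bit first) and the dropping of trailing all-zero bytes. 
`pack_null_bitset` is a hypothetical helper, not part of the PR.

```python
def pack_null_bitset(values):
  """Pack one bit per field (1 = null) into bytes, padding with 0 bits."""
  packed = bytearray((len(values) + 7) // 8)
  for i, value in enumerate(values):
    if value is None:
      # Assumed layout: field i lives in byte i // 8, bit i % 8 (LSB first).
      packed[i // 8] |= 1 << (i % 8)
  # Assumption: trailing all-zero bytes are dropped, so a row without nulls
  # yields an empty byte array, as the comment under review describes.
  while packed and packed[-1] == 0:
    packed.pop()
  return bytes(packed)
```

   With this layout, a four-field row whose third field is null packs to the 
single byte 0x04, and the five unused high bits of that byte are the zero 
padding.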
 



Issue Time Tracking
---

Worklog Id: (was: 290895)
Time Spent: 1.5h  (was: 1h 20m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=290894=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290894
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 08/Aug/19 00:09
Start Date: 08/Aug/19 00:09
Worklog Time Spent: 10m 
  Work Description: robinyqiu commented on pull request #9188: [BEAM-7886] 
Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r311292446
 
 

 ##
 File path: 
runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/CoderTranslationTest.java
 ##
 @@ -74,6 +78,14 @@
   FullWindowedValueCoder.of(
   IterableCoder.of(VarLongCoder.of()), 
IntervalWindowCoder.of()))
   .add(DoubleCoder.of())
+  .add(
 
 Review comment:
   It seems that the tests here won't be able to compare `RowCoder`'s equality 
properly. The root of the problem is not in this file though. It is because 
`RowCoder` currently does not define a correct `equals()` method (any two 
`RowCoder` instances are considered equal now).
   
   Details: `RowCoder` currently extends `StructuredCoder`'s `equals()` 
definition, which checks if the `component`s in the coder are the same, but the 
schema associated with a `RowCoder` is not its `component`.
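
   The effect can be illustrated with a small Python analogue. This is a sketch only: the classes below are hypothetical stand-ins, not the Java `StructuredCoder` or `RowCoder`, and the schema-aware comparison is one possible fix rather than the one adopted in the PR.

```python
class StructuredCoderSketch:
    """Hypothetical base class: equality looks only at component coders."""

    def components(self):
        return []

    def __eq__(self, other):
        return type(self) is type(other) and self.components() == other.components()

    def __hash__(self):
        return hash((type(self), tuple(self.components())))


class BrokenRowCoderSketch(StructuredCoderSketch):
    def __init__(self, schema):
        # The schema is not a component, so the inherited __eq__ never sees it.
        self.schema = schema


class FixedRowCoderSketch(StructuredCoderSketch):
    def __init__(self, schema):
        self.schema = schema

    def __eq__(self, other):
        # Compare the schema explicitly instead of relying on components.
        return type(self) is type(other) and self.schema == other.schema

    def __hash__(self):
        return hash((type(self), self.schema))


schema_a = (("id", "int64"),)
schema_b = (("name", "string"),)
broken_equal = BrokenRowCoderSketch(schema_a) == BrokenRowCoderSketch(schema_b)
fixed_equal = FixedRowCoderSketch(schema_a) == FixedRowCoderSketch(schema_b)
```

   Here `broken_equal` is true even though the schemas differ, which is why an equality-based round-trip test would pass for any pair of such coders.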
 



Issue Time Tracking
---

Worklog Id: (was: 290894)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=290890=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290890
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 08/Aug/19 00:09
Start Date: 08/Aug/19 00:09
Worklog Time Spent: 10m 
  Work Description: robinyqiu commented on pull request #9188: [BEAM-7886] 
Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r311179276
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -642,6 +642,45 @@ message StandardCoders {
 // Components: Coder for a single element.
 // Experimental.
 STATE_BACKED_ITERABLE = 9 [(beam_urn) = 
"beam:coder:state_backed_iterable:v1"];
+
+// Additional Standard Coders
+// --
+// The following coders are not required to be implemented for an SDK or
+// runner to support the Beam model, but can enable additional
+// functionality.
+
+// Encodes a "row", an element with a known schema, defined by an
+// instance of Schema from schema.proto.
+//
+// A row is encoded as the concatenation of:
+//   - The number of attributes in the schema, encoded with
+// beam:coder:varint:v1. This is useful for detecting supported schema
+// changes (column additions/deletions).
 
 Review comment:
   Can we explain in more detail what "supported schema changes" means here? 
Is it only useful for the streaming pipeline update case?
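
   For context on the mechanics, a rough sketch of how a leading field count could be read back and compared against the reader's schema. It assumes the count is a base-128 varint of a non-negative integer; `field_count_delta` is a hypothetical helper, and what a decoder should do when the counts differ is exactly the open question above.

```python
def encode_varint(n):
    """Encode a non-negative int as a base-128 varint (7 bits per byte)."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def decode_varint(data, pos=0):
    """Return (value, next_position) for a varint starting at data[pos]."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def field_count_delta(encoded_row, schema_field_count):
    """Hypothetical check: fields in the reader's schema minus fields encoded."""
    encoded_count, _ = decode_varint(encoded_row)
    return schema_field_count - encoded_count
```

   A positive delta would mean the reader's schema has more columns than the encoded row, and a negative delta would mean it has fewer.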
 



Issue Time Tracking
---

Worklog Id: (was: 290890)
Time Spent: 1h  (was: 50m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>





