[jira] [Work logged] (BEAM-5720) Default coder breaks with large ints on Python 3

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5720?focusedWorklogId=153980=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153980
 ]

ASF GitHub Bot logged work on BEAM-5720:


Author: ASF GitHub Bot
Created on: 12/Oct/18 18:29
Start Date: 12/Oct/18 18:29
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on a change in pull request #6659: 
[BEAM-5720] Fix encoding of large python ints in Python 3.
URL: https://github.com/apache/beam/pull/6659#discussion_r224876803
 
 

 ##
 File path: sdks/python/apache_beam/coders/coder_impl.py
 ##
 @@ -293,8 +299,21 @@ def encode_to_stream(self, value, stream, nested):
 if value is None:
   stream.write_byte(NONE_TYPE)
 elif t is int:
-  stream.write_byte(INT_TYPE)
-  stream.write_var_int64(value)
+  # In Python 3, an int may be larger than 64 bits.
+  # Note that an OverflowError on stream.write_var_int64 would happen
+  # *after* the marker byte is written, so we must check earlier.
+  try:
+# This may throw an overflow error when compiled.
+int_value = value
+# Otherwise, we must check ourselves.
+if not is_compiled:
+  if not fits_in_64_bits(value):
 
 Review comment:
   Thanks. Consider the following wording for comments: 
   

   
   ```  
   
 # In Python 3, an int may be larger than 64 bits.  
   
 # We need to check whether value fits into a 64 bit integer before 
   
 # writing the marker byte. 
   
 try:   
   
   # In Cython-compiled code this will throw an overflow error  
   
   # when value does not fit into int64.
   
   int_value = value
   
   # If Cython is not used, we must do a (slower) check ourselves.  
   
   if not is_compiled:  
   
  ...   
   
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153980)
Time Spent: 1h 10m  (was: 1h)

> Default coder breaks with large ints on Python 3
> 
>
> Key: BEAM-5720
> URL: https://issues.apache.org/jira/browse/BEAM-5720
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The test for `int` includes greater than 64-bit values, which causes an 
> overflow error later in the code. We need to only use that coding scheme for 
> machine-sized ints. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5720) Default coder breaks with large ints on Python 3

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5720?focusedWorklogId=153979=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153979
 ]

ASF GitHub Bot logged work on BEAM-5720:


Author: ASF GitHub Bot
Created on: 12/Oct/18 18:29
Start Date: 12/Oct/18 18:29
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on a change in pull request #6659: 
[BEAM-5720] Fix encoding of large python ints in Python 3.
URL: https://github.com/apache/beam/pull/6659#discussion_r224870551
 
 

 ##
 File path: sdks/python/apache_beam/coders/coder_impl.pxd
 ##
 @@ -69,8 +69,9 @@ cdef class DeterministicFastPrimitivesCoderImpl(CoderImpl):
 
 
 cdef object NoneType
-cdef char UNKNOWN_TYPE, NONE_TYPE, INT_TYPE, FLOAT_TYPE, BOOL_TYPE
-cdef char BYTES_TYPE, UNICODE_TYPE, LIST_TYPE, TUPLE_TYPE, DICT_TYPE, SET_TYPE
+cdef unsigned char UNKNOWN_TYPE, NONE_TYPE, INT_TYPE, FLOAT_TYPE, BOOL_TYPE
 
 Review comment:
   Curious, what was a motivation for this?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153979)
Time Spent: 1h  (was: 50m)

> Default coder breaks with large ints on Python 3
> 
>
> Key: BEAM-5720
> URL: https://issues.apache.org/jira/browse/BEAM-5720
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The test for `int` includes greater than 64-bit values, which causes an 
> overflow error later in the code. We need to only use that coding scheme for 
> machine-sized ints. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5720) Default coder breaks with large ints on Python 3

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5720?focusedWorklogId=153847=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153847
 ]

ASF GitHub Bot logged work on BEAM-5720:


Author: ASF GitHub Bot
Created on: 12/Oct/18 09:40
Start Date: 12/Oct/18 09:40
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #6659: 
[BEAM-5720] Fix encoding of large python ints in Python 3.
URL: https://github.com/apache/beam/pull/6659#discussion_r224728870
 
 

 ##
 File path: sdks/python/apache_beam/coders/coder_impl.py
 ##
 @@ -293,8 +299,21 @@ def encode_to_stream(self, value, stream, nested):
 if value is None:
   stream.write_byte(NONE_TYPE)
 elif t is int:
-  stream.write_byte(INT_TYPE)
-  stream.write_var_int64(value)
+  # In Python 3, an int may be larger than 64 bits.
+  # Note that an OverflowError on stream.write_var_int64 would happen
+  # *after* the marker byte is written, so we must check earlier.
+  try:
+# This may throw an overflow error when compiled.
+int_value = value
+# Otherwise, we must check ourselves.
+if not is_compiled:
+  if not fits_in_64_bits(value):
 
 Review comment:
   Yep. Current code:
   ```
   small_int, FastPrimitivesCoder, 1000 element(s): per element median time 
cost: 1.1301e-07 sec, relative std: 19.00%
   ```
   Removing the is_compiled check
   ```
   small_int, FastPrimitivesCoder, 1000 element(s): per element median time 
cost: 1.88589e-07 sec, relative std: 18.08%
   ```
   
   That's over a 60% increase. I tried inlining it and using constants rather 
than computing the bounds each time, which helps some but the check is entirely 
redundant on compiled code. 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153847)
Time Spent: 50m  (was: 40m)

> Default coder breaks with large ints on Python 3
> 
>
> Key: BEAM-5720
> URL: https://issues.apache.org/jira/browse/BEAM-5720
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Valentyn Tymofieiev
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The test for `int` includes greater than 64-bit values, which causes an 
> overflow error later in the code. We need to only use that coding scheme for 
> machine-sized ints. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5720) Default coder breaks with large ints on Python 3

2018-10-12 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5720?focusedWorklogId=153844=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153844
 ]

ASF GitHub Bot logged work on BEAM-5720:


Author: ASF GitHub Bot
Created on: 12/Oct/18 09:35
Start Date: 12/Oct/18 09:35
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #6659: [BEAM-5720] Fix 
encoding of large python ints in Python 3.
URL: https://github.com/apache/beam/pull/6659#issuecomment-429266297
 
 
   Run Python PreCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153844)
Time Spent: 40m  (was: 0.5h)

> Default coder breaks with large ints on Python 3
> 
>
> Key: BEAM-5720
> URL: https://issues.apache.org/jira/browse/BEAM-5720
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Valentyn Tymofieiev
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The test for `int` includes greater than 64-bit values, which causes an 
> overflow error later in the code. We need to only use that coding scheme for 
> machine-sized ints. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5720) Default coder breaks with large ints on Python 3

2018-10-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5720?focusedWorklogId=153694=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153694
 ]

ASF GitHub Bot logged work on BEAM-5720:


Author: ASF GitHub Bot
Created on: 11/Oct/18 20:43
Start Date: 11/Oct/18 20:43
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on a change in pull request #6659: 
[BEAM-5720] Fix encoding of large python ints in Python 3.
URL: https://github.com/apache/beam/pull/6659#discussion_r224597572
 
 

 ##
 File path: sdks/python/apache_beam/coders/coder_impl.py
 ##
 @@ -293,8 +299,21 @@ def encode_to_stream(self, value, stream, nested):
 if value is None:
   stream.write_byte(NONE_TYPE)
 elif t is int:
-  stream.write_byte(INT_TYPE)
-  stream.write_var_int64(value)
+  # In Python 3, an int may be larger than 64 bits.
+  # Note that an OverflowError on stream.write_var_int64 would happen
+  # *after* the marker byte is written, so we must check earlier.
+  try:
+# This may throw an overflow error when compiled.
+int_value = value
+# Otherwise, we must check ourselves.
+if not is_compiled:
+  if not fits_in_64_bits(value):
 
 Review comment:
   Why don't we always check ourselves? Is there a performance hit that 
visualizes in coders microbenchmarks if we add the check?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153694)
Time Spent: 0.5h  (was: 20m)

> Default coder breaks with large ints on Python 3
> 
>
> Key: BEAM-5720
> URL: https://issues.apache.org/jira/browse/BEAM-5720
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Valentyn Tymofieiev
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The test for `int` includes greater than 64-bit values, which causes an 
> overflow error later in the code. We need to only use that coding scheme for 
> machine-sized ints. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5720) Default coder breaks with large ints on Python 3

2018-10-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5720?focusedWorklogId=153671=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153671
 ]

ASF GitHub Bot logged work on BEAM-5720:


Author: ASF GitHub Bot
Created on: 11/Oct/18 19:55
Start Date: 11/Oct/18 19:55
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #6659: [BEAM-5720] Fix 
encoding of large python ints in Python 3.
URL: https://github.com/apache/beam/pull/6659#issuecomment-429096283
 
 
   R: @tvalentyn 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153671)
Time Spent: 20m  (was: 10m)

> Default coder breaks with large ints on Python 3
> 
>
> Key: BEAM-5720
> URL: https://issues.apache.org/jira/browse/BEAM-5720
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Valentyn Tymofieiev
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The test for `int` includes greater than 64-bit values, which causes an 
> overflow error later in the code. We need to only use that coding scheme for 
> machine-sized ints. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5720) Default coder breaks with large ints on Python 3

2018-10-11 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5720?focusedWorklogId=153669=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153669
 ]

ASF GitHub Bot logged work on BEAM-5720:


Author: ASF GitHub Bot
Created on: 11/Oct/18 19:55
Start Date: 11/Oct/18 19:55
Worklog Time Spent: 10m 
  Work Description: robertwb opened a new pull request #6659: [BEAM-5720] 
Fix encoding of large python ints in Python 3.
URL: https://github.com/apache/beam/pull/6659
 
 
   
   
   
   Follow this checklist to help us incorporate your contribution quickly and 
easily:
   
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   It will help us expedite review of your Pull Request if you tag someone 
(e.g. `@username`) to look at it.
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/lastCompletedBuild/)
 | --- | --- | --- | --- | --- | ---
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza_Gradle/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
  [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/)
 | --- | --- | ---
   
   
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 153669)
Time Spent: 10m
Remaining Estimate: 0h

> Default coder breaks with large ints on Python 3
> 
>
> Key: BEAM-5720
> URL: https://issues.apache.org/jira/browse/BEAM-5720
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Valentyn Tymofieiev