[jira] [Work logged] (BEAM-8271) StateGetRequest/Response continuation_token should be string

2020-02-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8271?focusedWorklogId=382622&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382622
 ]

ASF GitHub Bot logged work on BEAM-8271:


Author: ASF GitHub Bot
Created on: 06/Feb/20 00:23
Start Date: 06/Feb/20 00:23
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10595: [BEAM-8271] 
Properly encode/decode StateGetRequest/Response continuation token
URL: https://github.com/apache/beam/pull/10595
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 382622)
Time Spent: 1.5h  (was: 1h 20m)

> StateGetRequest/Response continuation_token should be string
> 
>
> Key: BEAM-8271
> URL: https://issues.apache.org/jira/browse/BEAM-8271
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> I've been working on adding typing to the python code and I came across a 
> discrepancy between regarding the type of the continuation token.  The .proto 
> defines it as bytes, but the code treats it as a string (i.e. unicode):
>  
> {code:java}
> // A request to get state.
> message StateGetRequest {
>   // (Optional) If specified, signals to the runner that the response
>   // should resume from the following continuation token.
>   //
>   // If unspecified, signals to the runner that the response should start
>   // from the beginning of the logical continuable stream.
>   bytes continuation_token = 1;
> }
> // A response to get state representing a logical byte stream which can be
> // continued using the state API.
> message StateGetResponse {
>   // (Optional) If specified, represents a token which can be used with the
>   // state API to get the next chunk of this logical byte stream. The end of
>   // the logical byte stream is signalled by this field being unset.
>   bytes continuation_token = 1;
>   // Represents a part of a logical byte stream. Elements within
>   // the logical byte stream are encoded in the nested context and
>   // concatenated together.
>   bytes data = 2;
> } 
> {code}
> From FnApiRunner.StateServicer:
> {code:python}
> def blocking_get(self, state_key, continuation_token=None):
>   with self._lock:
> full_state = self._state[self._to_key(state_key)]
> if self._use_continuation_tokens:
>   # The token is "nonce:index".
>   if not continuation_token:
> token_base = 'token_%x' % len(self._continuations)
> self._continuations[token_base] = tuple(full_state)
> return b'', '%s:0' % token_base
>   else:
> token_base, index = continuation_token.split(':')
> ix = int(index)
> full_state = self._continuations[token_base]
> if ix == len(full_state):
>   return b'', None
> else:
>   return full_state[ix], '%s:%d' % (token_base, ix + 1)
> else:
>   assert not continuation_token
>   return b''.join(full_state), None
> {code}
> This could be a problem in python3.  
> All other id values are string, whereas bytes is reserved for data, so I 
> think that the proto should be changed to string. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8271) StateGetRequest/Response continuation_token should be string

2020-02-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8271?focusedWorklogId=381698&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-381698
 ]

ASF GitHub Bot logged work on BEAM-8271:


Author: ASF GitHub Bot
Created on: 04/Feb/20 16:15
Start Date: 04/Feb/20 16:15
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #10595: [BEAM-8271] Properly 
encode/decode StateGetRequest/Response continuation token
URL: https://github.com/apache/beam/pull/10595#issuecomment-581987427
 
 
   @robertwb this is ready to go
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 381698)
Time Spent: 1h 20m  (was: 1h 10m)

> StateGetRequest/Response continuation_token should be string
> 
>
> Key: BEAM-8271
> URL: https://issues.apache.org/jira/browse/BEAM-8271
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> I've been working on adding typing to the python code and I came across a 
> discrepancy between regarding the type of the continuation token.  The .proto 
> defines it as bytes, but the code treats it as a string (i.e. unicode):
>  
> {code:java}
> // A request to get state.
> message StateGetRequest {
>   // (Optional) If specified, signals to the runner that the response
>   // should resume from the following continuation token.
>   //
>   // If unspecified, signals to the runner that the response should start
>   // from the beginning of the logical continuable stream.
>   bytes continuation_token = 1;
> }
> // A response to get state representing a logical byte stream which can be
> // continued using the state API.
> message StateGetResponse {
>   // (Optional) If specified, represents a token which can be used with the
>   // state API to get the next chunk of this logical byte stream. The end of
>   // the logical byte stream is signalled by this field being unset.
>   bytes continuation_token = 1;
>   // Represents a part of a logical byte stream. Elements within
>   // the logical byte stream are encoded in the nested context and
>   // concatenated together.
>   bytes data = 2;
> } 
> {code}
> From FnApiRunner.StateServicer:
> {code:python}
> def blocking_get(self, state_key, continuation_token=None):
>   with self._lock:
> full_state = self._state[self._to_key(state_key)]
> if self._use_continuation_tokens:
>   # The token is "nonce:index".
>   if not continuation_token:
> token_base = 'token_%x' % len(self._continuations)
> self._continuations[token_base] = tuple(full_state)
> return b'', '%s:0' % token_base
>   else:
> token_base, index = continuation_token.split(':')
> ix = int(index)
> full_state = self._continuations[token_base]
> if ix == len(full_state):
>   return b'', None
> else:
>   return full_state[ix], '%s:%d' % (token_base, ix + 1)
> else:
>   assert not continuation_token
>   return b''.join(full_state), None
> {code}
> This could be a problem in python3.  
> All other id values are string, whereas bytes is reserved for data, so I 
> think that the proto should be changed to string. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8271) StateGetRequest/Response continuation_token should be string

2020-01-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8271?focusedWorklogId=379548&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-379548
 ]

ASF GitHub Bot logged work on BEAM-8271:


Author: ASF GitHub Bot
Created on: 30/Jan/20 19:06
Start Date: 30/Jan/20 19:06
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #10595: [BEAM-8271] Properly 
encode/decode StateGetRequest/Response continuation token
URL: https://github.com/apache/beam/pull/10595#issuecomment-580405861
 
 
   > It's bytes because non-trivial runners may serialize arbitrary data into 
this field used to continue the iterable. (Ids are strings because they're just 
used to compare against, and also the type of proto map keys constrains us 
here.) 
   
   Make sense.  Thanks for the explanation. 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 379548)
Time Spent: 1h 10m  (was: 1h)

> StateGetRequest/Response continuation_token should be string
> 
>
> Key: BEAM-8271
> URL: https://issues.apache.org/jira/browse/BEAM-8271
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> I've been working on adding typing to the python code and I came across a 
> discrepancy between regarding the type of the continuation token.  The .proto 
> defines it as bytes, but the code treats it as a string (i.e. unicode):
>  
> {code:java}
> // A request to get state.
> message StateGetRequest {
>   // (Optional) If specified, signals to the runner that the response
>   // should resume from the following continuation token.
>   //
>   // If unspecified, signals to the runner that the response should start
>   // from the beginning of the logical continuable stream.
>   bytes continuation_token = 1;
> }
> // A response to get state representing a logical byte stream which can be
> // continued using the state API.
> message StateGetResponse {
>   // (Optional) If specified, represents a token which can be used with the
>   // state API to get the next chunk of this logical byte stream. The end of
>   // the logical byte stream is signalled by this field being unset.
>   bytes continuation_token = 1;
>   // Represents a part of a logical byte stream. Elements within
>   // the logical byte stream are encoded in the nested context and
>   // concatenated together.
>   bytes data = 2;
> } 
> {code}
> From FnApiRunner.StateServicer:
> {code:python}
> def blocking_get(self, state_key, continuation_token=None):
>   with self._lock:
> full_state = self._state[self._to_key(state_key)]
> if self._use_continuation_tokens:
>   # The token is "nonce:index".
>   if not continuation_token:
> token_base = 'token_%x' % len(self._continuations)
> self._continuations[token_base] = tuple(full_state)
> return b'', '%s:0' % token_base
>   else:
> token_base, index = continuation_token.split(':')
> ix = int(index)
> full_state = self._continuations[token_base]
> if ix == len(full_state):
>   return b'', None
> else:
>   return full_state[ix], '%s:%d' % (token_base, ix + 1)
> else:
>   assert not continuation_token
>   return b''.join(full_state), None
> {code}
> This could be a problem in python3.  
> All other id values are string, whereas bytes is reserved for data, so I 
> think that the proto should be changed to string. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8271) StateGetRequest/Response continuation_token should be string

2020-01-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8271?focusedWorklogId=379547&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-379547
 ]

ASF GitHub Bot logged work on BEAM-8271:


Author: ASF GitHub Bot
Created on: 30/Jan/20 19:05
Start Date: 30/Jan/20 19:05
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #10595: [BEAM-8271] Properly 
encode/decode StateGetRequest/Response continuation token
URL: https://github.com/apache/beam/pull/10595#issuecomment-580405533
 
 
   @robertwb  Ok, the solution using `bytes` is actually much simpler.  I 
thought that `bytes` had all of its string formatting capabilities neutered in 
python3, but apparently it has not (my python3 experience is limited to a few 
side projects).Thanks for pushing me in the right direction.
   
   Note: I can't see the tests on this PR.  Are they running?
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 379547)
Time Spent: 1h  (was: 50m)

> StateGetRequest/Response continuation_token should be string
> 
>
> Key: BEAM-8271
> URL: https://issues.apache.org/jira/browse/BEAM-8271
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> I've been working on adding typing to the python code and I came across a 
> discrepancy between regarding the type of the continuation token.  The .proto 
> defines it as bytes, but the code treats it as a string (i.e. unicode):
>  
> {code:java}
> // A request to get state.
> message StateGetRequest {
>   // (Optional) If specified, signals to the runner that the response
>   // should resume from the following continuation token.
>   //
>   // If unspecified, signals to the runner that the response should start
>   // from the beginning of the logical continuable stream.
>   bytes continuation_token = 1;
> }
> // A response to get state representing a logical byte stream which can be
> // continued using the state API.
> message StateGetResponse {
>   // (Optional) If specified, represents a token which can be used with the
>   // state API to get the next chunk of this logical byte stream. The end of
>   // the logical byte stream is signalled by this field being unset.
>   bytes continuation_token = 1;
>   // Represents a part of a logical byte stream. Elements within
>   // the logical byte stream are encoded in the nested context and
>   // concatenated together.
>   bytes data = 2;
> } 
> {code}
> From FnApiRunner.StateServicer:
> {code:python}
> def blocking_get(self, state_key, continuation_token=None):
>   with self._lock:
> full_state = self._state[self._to_key(state_key)]
> if self._use_continuation_tokens:
>   # The token is "nonce:index".
>   if not continuation_token:
> token_base = 'token_%x' % len(self._continuations)
> self._continuations[token_base] = tuple(full_state)
> return b'', '%s:0' % token_base
>   else:
> token_base, index = continuation_token.split(':')
> ix = int(index)
> full_state = self._continuations[token_base]
> if ix == len(full_state):
>   return b'', None
> else:
>   return full_state[ix], '%s:%d' % (token_base, ix + 1)
> else:
>   assert not continuation_token
>   return b''.join(full_state), None
> {code}
> This could be a problem in python3.  
> All other id values are string, whereas bytes is reserved for data, so I 
> think that the proto should be changed to string. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8271) StateGetRequest/Response continuation_token should be string

2020-01-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8271?focusedWorklogId=379117&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-379117
 ]

ASF GitHub Bot logged work on BEAM-8271:


Author: ASF GitHub Bot
Created on: 30/Jan/20 01:28
Start Date: 30/Jan/20 01:28
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #10595: [BEAM-8271] Properly 
encode/decode StateGetRequest/Response continuation token
URL: https://github.com/apache/beam/pull/10595#issuecomment-580042571
 
 
   Ok, I’ll have another look at it.
   
   On Wed, Jan 29, 2020 at 5:06 PM Robert Bradshaw 
   wrote:
   
   > It's bytes because non-trivial runners may serialize arbitrary data into
   > this field used to continue the iterable. (Ids are strings because they're
   > just used to compare against, and also the type of proto map keys
   > constrains us here.) There shouldn't be any manipulation of this token
   > except for passing it back on the client side at least--just accepting it
   > and passing it back.
   >
   > It looks like Java treats this as a BytesString. I still think we should
   > do the same.
   >
   > —
   > You are receiving this because you authored the thread.
   > Reply to this email directly, view it on GitHub
   > 
,
   > or unsubscribe
   > 

   > .
   >
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 379117)
Time Spent: 50m  (was: 40m)

> StateGetRequest/Response continuation_token should be string
> 
>
> Key: BEAM-8271
> URL: https://issues.apache.org/jira/browse/BEAM-8271
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> I've been working on adding typing to the python code and I came across a 
> discrepancy between regarding the type of the continuation token.  The .proto 
> defines it as bytes, but the code treats it as a string (i.e. unicode):
>  
> {code:java}
> // A request to get state.
> message StateGetRequest {
>   // (Optional) If specified, signals to the runner that the response
>   // should resume from the following continuation token.
>   //
>   // If unspecified, signals to the runner that the response should start
>   // from the beginning of the logical continuable stream.
>   bytes continuation_token = 1;
> }
> // A response to get state representing a logical byte stream which can be
> // continued using the state API.
> message StateGetResponse {
>   // (Optional) If specified, represents a token which can be used with the
>   // state API to get the next chunk of this logical byte stream. The end of
>   // the logical byte stream is signalled by this field being unset.
>   bytes continuation_token = 1;
>   // Represents a part of a logical byte stream. Elements within
>   // the logical byte stream are encoded in the nested context and
>   // concatenated together.
>   bytes data = 2;
> } 
> {code}
> From FnApiRunner.StateServicer:
> {code:python}
> def blocking_get(self, state_key, continuation_token=None):
>   with self._lock:
> full_state = self._state[self._to_key(state_key)]
> if self._use_continuation_tokens:
>   # The token is "nonce:index".
>   if not continuation_token:
> token_base = 'token_%x' % len(self._continuations)
> self._continuations[token_base] = tuple(full_state)
> return b'', '%s:0' % token_base
>   else:
> token_base, index = continuation_token.split(':')
> ix = int(index)
> full_state = self._continuations[token_base]
> if ix == len(full_state):
>   return b'', None
> else:
>   return full_state[ix], '%s:%d' % (token_base, ix + 1)
> else:
>   assert not continuation_token
>   return b''.join(full_state), None
> {code}
> This could be a problem in python3.  
> All other id values are string, whereas bytes is reserved for data, so I 
> think that the proto should be changed to string. 



--
This mess

[jira] [Work logged] (BEAM-8271) StateGetRequest/Response continuation_token should be string

2020-01-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8271?focusedWorklogId=379108&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-379108
 ]

ASF GitHub Bot logged work on BEAM-8271:


Author: ASF GitHub Bot
Created on: 30/Jan/20 01:06
Start Date: 30/Jan/20 01:06
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #10595: [BEAM-8271] 
Properly encode/decode StateGetRequest/Response continuation token
URL: https://github.com/apache/beam/pull/10595#issuecomment-580037170
 
 
   It's bytes because non-trivial runners may serialize arbitrary data into 
this field used to continue the iterable. (Ids are strings because they're just 
used to compare against, and also the type of proto map keys constrains us 
here.) There shouldn't be any manipulation of this token except for passing it 
back on the client side at least--just accepting it and passing it back. 
   
   It looks like Java treats this as a BytesString. I still think we should do 
the same. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 379108)
Time Spent: 40m  (was: 0.5h)

> StateGetRequest/Response continuation_token should be string
> 
>
> Key: BEAM-8271
> URL: https://issues.apache.org/jira/browse/BEAM-8271
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> I've been working on adding typing to the python code and I came across a 
> discrepancy between regarding the type of the continuation token.  The .proto 
> defines it as bytes, but the code treats it as a string (i.e. unicode):
>  
> {code:java}
> // A request to get state.
> message StateGetRequest {
>   // (Optional) If specified, signals to the runner that the response
>   // should resume from the following continuation token.
>   //
>   // If unspecified, signals to the runner that the response should start
>   // from the beginning of the logical continuable stream.
>   bytes continuation_token = 1;
> }
> // A response to get state representing a logical byte stream which can be
> // continued using the state API.
> message StateGetResponse {
>   // (Optional) If specified, represents a token which can be used with the
>   // state API to get the next chunk of this logical byte stream. The end of
>   // the logical byte stream is signalled by this field being unset.
>   bytes continuation_token = 1;
>   // Represents a part of a logical byte stream. Elements within
>   // the logical byte stream are encoded in the nested context and
>   // concatenated together.
>   bytes data = 2;
> } 
> {code}
> From FnApiRunner.StateServicer:
> {code:python}
> def blocking_get(self, state_key, continuation_token=None):
>   with self._lock:
> full_state = self._state[self._to_key(state_key)]
> if self._use_continuation_tokens:
>   # The token is "nonce:index".
>   if not continuation_token:
> token_base = 'token_%x' % len(self._continuations)
> self._continuations[token_base] = tuple(full_state)
> return b'', '%s:0' % token_base
>   else:
> token_base, index = continuation_token.split(':')
> ix = int(index)
> full_state = self._continuations[token_base]
> if ix == len(full_state):
>   return b'', None
> else:
>   return full_state[ix], '%s:%d' % (token_base, ix + 1)
> else:
>   assert not continuation_token
>   return b''.join(full_state), None
> {code}
> This could be a problem in python3.  
> All other id values are string, whereas bytes is reserved for data, so I 
> think that the proto should be changed to string. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8271) StateGetRequest/Response continuation_token should be string

2020-01-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8271?focusedWorklogId=375939&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375939
 ]

ASF GitHub Bot logged work on BEAM-8271:


Author: ASF GitHub Bot
Created on: 22/Jan/20 22:27
Start Date: 22/Jan/20 22:27
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #10595: [BEAM-8271] Properly 
encode/decode StateGetRequest/Response continuation token
URL: https://github.com/apache/beam/pull/10595#issuecomment-577415527
 
 
   > Actually, I think it's more correct to let continuation_token by bytes 
everywhere (possibly fixing the uses if need be).
   
   Here's the ticket I made: https://issues.apache.org/jira/browse/BEAM-8271.  
   
   It's been awhile, but my instinct was that this is an issue in the proto 
file:  all other id-like fields throughout these protos are strings.  Also, the 
token itself is created using string operations (`'token_%x' % 
len(self._continuations)`) so it certainly feels like this wants to be a string 
right up until it's encoded and stored in the proto message.   
   
   IIRC, my first attempt to solve this was to preserve the `bytes` typing, but 
I think there are other calls that create the continuation token from strings, 
and it started getting pretty ugly (converting from bytes to str to do some 
string formatting, then back to bytes in multiple places, etc), so I looked at 
how this was solved in Java, and based my changes here on that.  It keeps the 
ugliness isolated and at a minimum (but again, it was awhile ago, so I may be 
misremembering something). 
   
   If I haven't convinced you yet, I can show you what an alternate solution 
looks like using bytes, but first please do have a look at other ids in the 
.proto files and at the Java equivalent for this.
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 375939)
Time Spent: 0.5h  (was: 20m)

> StateGetRequest/Response continuation_token should be string
> 
>
> Key: BEAM-8271
> URL: https://issues.apache.org/jira/browse/BEAM-8271
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I've been working on adding typing to the python code and I came across a 
> discrepancy between regarding the type of the continuation token.  The .proto 
> defines it as bytes, but the code treats it as a string (i.e. unicode):
>  
> {code:java}
> // A request to get state.
> message StateGetRequest {
>   // (Optional) If specified, signals to the runner that the response
>   // should resume from the following continuation token.
>   //
>   // If unspecified, signals to the runner that the response should start
>   // from the beginning of the logical continuable stream.
>   bytes continuation_token = 1;
> }
> // A response to get state representing a logical byte stream which can be
> // continued using the state API.
> message StateGetResponse {
>   // (Optional) If specified, represents a token which can be used with the
>   // state API to get the next chunk of this logical byte stream. The end of
>   // the logical byte stream is signalled by this field being unset.
>   bytes continuation_token = 1;
>   // Represents a part of a logical byte stream. Elements within
>   // the logical byte stream are encoded in the nested context and
>   // concatenated together.
>   bytes data = 2;
> } 
> {code}
> From FnApiRunner.StateServicer:
> {code:python}
> def blocking_get(self, state_key, continuation_token=None):
>   with self._lock:
> full_state = self._state[self._to_key(state_key)]
> if self._use_continuation_tokens:
>   # The token is "nonce:index".
>   if not continuation_token:
> token_base = 'token_%x' % len(self._continuations)
> self._continuations[token_base] = tuple(full_state)
> return b'', '%s:0' % token_base
>   else:
> token_base, index = continuation_token.split(':')
> ix = int(index)
> full_state = self._continuations[token_base]
> if ix == len(full_state):
>   return b'', None
> else:
>   return full_state[ix], '%s:%d' % (token_base, ix + 1)
> else:
>   assert not continuation_token
>   return b''.join(full_state

[jira] [Work logged] (BEAM-8271) StateGetRequest/Response continuation_token should be string

2020-01-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8271?focusedWorklogId=375938&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-375938
 ]

ASF GitHub Bot logged work on BEAM-8271:


Author: ASF GitHub Bot
Created on: 22/Jan/20 22:26
Start Date: 22/Jan/20 22:26
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #10595: [BEAM-8271] Properly 
encode/decode StateGetRequest/Response continuation token
URL: https://github.com/apache/beam/pull/10595#issuecomment-577415527
 
 
   > Actually, I think it's more correct to let continuation_token by bytes 
everywhere (possibly fixing the uses if need be).
   
   Here's the ticket I made: https://issues.apache.org/jira/browse/BEAM-8271.  
   
   It's been awhile, but my instinct was that this is an issue in the proto 
file:  all other id-like fields throughout these protos are strings.  Also, the 
token itself is created using string operations (`'token_%x' % 
len(self._continuations)`) so it certainly feels like this wants to be a string 
right up until it's encoded and stored in the proto message.   
   
   IIRC, my first attempt to solve this was to preserve the `bytes` typing, but 
I think there are other calls that create the continuation token from strings, 
and it started getting pretty ugly (converting from bytes to str to do some 
string formatting, then back to bytes in multiple places, etc), so I looked at 
how this was solved in Java, and based my changes here on that.  It keeps the 
ugliness isolated and at a minimum.  
   
   Again, it was awhile ago, so I may be misremembering something.  If I 
haven't convinced you yet, I can show you what an alternate solution looks like 
using bytes, but first please do have a look at other ids in the .proto files 
and at the Java equivalent for this.
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 375938)
Time Spent: 20m  (was: 10m)

> StateGetRequest/Response continuation_token should be string
> 
>
> Key: BEAM-8271
> URL: https://issues.apache.org/jira/browse/BEAM-8271
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I've been working on adding typing to the python code and I came across a 
> discrepancy between regarding the type of the continuation token.  The .proto 
> defines it as bytes, but the code treats it as a string (i.e. unicode):
>  
> {code:java}
> // A request to get state.
> message StateGetRequest {
>   // (Optional) If specified, signals to the runner that the response
>   // should resume from the following continuation token.
>   //
>   // If unspecified, signals to the runner that the response should start
>   // from the beginning of the logical continuable stream.
>   bytes continuation_token = 1;
> }
> // A response to get state representing a logical byte stream which can be
> // continued using the state API.
> message StateGetResponse {
>   // (Optional) If specified, represents a token which can be used with the
>   // state API to get the next chunk of this logical byte stream. The end of
>   // the logical byte stream is signalled by this field being unset.
>   bytes continuation_token = 1;
>   // Represents a part of a logical byte stream. Elements within
>   // the logical byte stream are encoded in the nested context and
>   // concatenated together.
>   bytes data = 2;
> } 
> {code}
> From FnApiRunner.StateServicer:
> {code:python}
> def blocking_get(self, state_key, continuation_token=None):
>   with self._lock:
> full_state = self._state[self._to_key(state_key)]
> if self._use_continuation_tokens:
>   # The token is "nonce:index".
>   if not continuation_token:
> token_base = 'token_%x' % len(self._continuations)
> self._continuations[token_base] = tuple(full_state)
> return b'', '%s:0' % token_base
>   else:
> token_base, index = continuation_token.split(':')
> ix = int(index)
> full_state = self._continuations[token_base]
> if ix == len(full_state):
>   return b'', None
> else:
>   return full_state[ix], '%s:%d' % (token_base, ix + 1)
> else:
>   assert not continuation_token
>   return b''.join(full_state), No

[jira] [Work logged] (BEAM-8271) StateGetRequest/Response continuation_token should be string

2020-01-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8271?focusedWorklogId=372031&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-372031
 ]

ASF GitHub Bot logged work on BEAM-8271:


Author: ASF GitHub Bot
Created on: 15/Jan/20 00:13
Start Date: 15/Jan/20 00:13
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #10595: [BEAM-8271] 
Properly encode/decode StateGetRequest/Response continuation token
URL: https://github.com/apache/beam/pull/10595
 
 
   This behavior is the same as Java, and corrects an inconsistency with 
bytes/str that might have caused problems in python3.
   
   This is a followup to #9056.
   
   Note that this issue has its own Jira. 
   
   R: @robertwb 
   R: @udim 
   
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](htt