Re: Question about SplittableDoFn

2021-05-19 Thread Boyuan Zhang
Thanks for sharing. I'll comment on the PR.

On Tue, May 18, 2021 at 3:44 PM Miguel Anzo Palomo 
wrote:

> Boyuan Zhang, it's about this issue, and the code can be found here.
>
> On Tue, May 18, 2021 at 5:29 PM Boyuan Zhang  wrote:
>
>> Would you like to share your draft code? Iterating on the code might
>> make it easier to figure out the issue.
>>
>> On Tue, May 18, 2021 at 3:28 PM Robert Burke  wrote:
>>
>>> IIRC the Initial Restrictions method gives you an element and you return
>>> the restrictions relative to that element.
>>>
>>> It's entirely appropriate to stat files or query databases in order to
>>> determine the initial restrictions and partitions of the data.
>>>
>>>
>>> On Tue, May 18, 2021, 3:21 PM Miguel Anzo Palomo <
>>> miguel.a...@wizeline.com> wrote:
>>>
>>>> Hi, I’m looking at how to implement a reader as a SplittableDoFn and
>>>> I'm having some problems with the initial restriction. Specifically, how
>>>> do you set the initial restriction if you don’t know the size of the data?
>>>> The DoFn that I'm working on takes a PCollection of Spanner
>>>> *ReadOperations* and splits the read operation query into a list of
>>>> *Partitions* to query against the database.
>>>> I’m currently setting the *InitialRestriction* to an OffsetRange(0L,
>>>> Long.MAX_VALUE), which is currently giving me this error in unit tests:
>>>> "Last attempted offset was 0 in range [0, 9223372036854775807), claiming
>>>> work in [1, 9223372036854775807) was not attempted." It makes sense, I
>>>> think, because I am setting up the range to max long value.
>>>> So, if I don't know how many partitions are going to be created until
>>>> it's being processed, how can I set the initial restriction, or what
>>>> initial restriction do I need to set?
>>>>
>>>> --
>>>>
>>>> Miguel Angel Anzo Palomo | WIZELINE
>>>>
>>>> Software Engineer
>>>>
>>>> miguel.a...@wizeline.com
>>>>
>>>> Remote Office
>>>>
>>>> *This email and its contents (including any attachments) are being sent
>>>> to you on the condition of confidentiality and may be protected by
>>>> legal privilege. Access to this email by anyone other than the intended
>>>> recipient is unauthorized. If you are not the intended recipient, please
>>>> immediately notify the sender by replying to this message and delete the
>>>> material immediately from your system. Any further use, dissemination,
>>>> distribution or reproduction of this email is strictly prohibited. Further,
>>>> no representation is made with respect to any content contained in this
>>>> email.*
>>>
>>>


Re: Question about SplittableDoFn

2021-05-18 Thread Miguel Anzo Palomo
Boyuan Zhang, it's about this issue, and the code can be found here.

On Tue, May 18, 2021 at 5:29 PM Boyuan Zhang  wrote:

> Would you like to share your draft code? Iterating on the code might
> make it easier to figure out the issue.
>
> On Tue, May 18, 2021 at 3:28 PM Robert Burke  wrote:
>
>> IIRC the Initial Restrictions method gives you an element and you return
>> the restrictions relative to that element.
>>
>> It's entirely appropriate to stat files or query databases in order to
>> determine the initial restrictions and partitions of the data.
>>
>>
>> On Tue, May 18, 2021, 3:21 PM Miguel Anzo Palomo <
>> miguel.a...@wizeline.com> wrote:
>>
>>> Hi, I’m looking at how to implement a reader as a SplittableDoFn and I'm
>>> having some problems with the initial restriction. Specifically, how do
>>> you set the initial restriction if you don’t know the size of the data?
>>> The DoFn that I'm working on takes a PCollection of Spanner
>>> *ReadOperations* and splits the read operation query into a list of
>>> *Partitions* to query against the database.
>>> I’m currently setting the *InitialRestriction* to an OffsetRange(0L,
>>> Long.MAX_VALUE), which is currently giving me this error in unit tests:
>>> "Last attempted offset was 0 in range [0, 9223372036854775807), claiming
>>> work in [1, 9223372036854775807) was not attempted." It makes sense, I
>>> think, because I am setting up the range to max long value.
>>> So, if I don't know how many partitions are going to be created until
>>> it's being processed, how can I set the initial restriction, or what
>>> initial restriction do I need to set?
>>
>>

-- 

Miguel Angel Anzo Palomo | WIZELINE

Software Engineer

miguel.a...@wizeline.com

Remote Office



Re: Question about SplittableDoFn

2021-05-18 Thread Boyuan Zhang
Would you like to share your draft code? Iterating on the code might
make it easier to figure out the issue.

On Tue, May 18, 2021 at 3:28 PM Robert Burke  wrote:

> IIRC the Initial Restrictions method gives you an element and you return
> the restrictions relative to that element.
>
> It's entirely appropriate to stat files or query databases in order to
> determine the initial restrictions and partitions of the data.
>
>
> On Tue, May 18, 2021, 3:21 PM Miguel Anzo Palomo 
> wrote:
>
>> Hi, I’m looking at how to implement a reader as a SplittableDoFn and I'm
>> having some problems with the initial restriction. Specifically, how do
>> you set the initial restriction if you don’t know the size of the data?
>> The DoFn that I'm working on takes a PCollection of Spanner
>> *ReadOperations* and splits the read operation query into a list of
>> *Partitions* to query against the database.
>> I’m currently setting the *InitialRestriction* to an OffsetRange(0L,
>> Long.MAX_VALUE), which is currently giving me this error in unit tests:
>> "Last attempted offset was 0 in range [0, 9223372036854775807), claiming
>> work in [1, 9223372036854775807) was not attempted." It makes sense, I
>> think, because I am setting up the range to max long value.
>> So, if I don't know how many partitions are going to be created until
>> it's being processed, how can I set the initial restriction, or what
>> initial restriction do I need to set?
>
>


Re: Question about SplittableDoFn

2021-05-18 Thread Robert Burke
IIRC the Initial Restrictions method gives you an element and you return
the restrictions relative to that element.

It's entirely appropriate to stat files or query databases in order to
determine the initial restrictions and partitions of the data.
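
As a rough illustration of the advice above (not code from the thread or
from the PR), the sketch below models the idea in plain Java without the Beam
SDK. `Range` and `Tracker` are simplified stand-ins for Beam's `OffsetRange`
and its restriction tracker, `queryPartitions` is a hypothetical placeholder
for the database call, and the count of three partitions is made up. It
assumes the restriction is sized from the partitions discovered for the
element, so that every offset in it can be claimed. The end of `main` claims
only offset 0 in [0, Long.MAX_VALUE) to show how the reported error arises.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model of an offset restriction; these are NOT Beam's classes. */
final class InitialRestrictionSketch {

  /** Half-open range [from, to), standing in for Beam's OffsetRange. */
  static final class Range {
    final long from;
    final long to;

    Range(long from, long to) {
      this.from = from;
      this.to = to;
    }

    @Override
    public String toString() {
      return "[" + from + ", " + to + ")";
    }
  }

  /** Simplified tracker: the range counts as done only once its end is reached. */
  static final class Tracker {
    private final Range range;
    private Long lastAttempted = null;

    Tracker(Range range) {
      this.range = range;
    }

    /** Records the attempt and reports whether the offset is inside the range. */
    boolean tryClaim(long offset) {
      lastAttempted = offset;
      return offset < range.to;
    }

    /** Fails if some offsets in the range were never attempted. */
    void checkDone() {
      if (range.from == range.to) {
        return; // empty range: nothing to claim
      }
      if (lastAttempted == null || lastAttempted < range.to - 1) {
        long next = lastAttempted == null ? range.from : lastAttempted + 1;
        throw new IllegalStateException(
            "Last attempted offset was " + lastAttempted + " in range " + range
                + ", claiming work in [" + next + ", " + range.to + ") was not attempted");
      }
    }
  }

  /** Hypothetical stand-in for asking the database how a read is partitioned. */
  static List<String> queryPartitions(String readOperation) {
    List<String> partitions = new ArrayList<>();
    for (int i = 0; i < 3; i++) { // assumed partition count, for illustration only
      partitions.add(readOperation + "#partition-" + i);
    }
    return partitions;
  }

  /** Initial restriction derived from the element: one offset per partition. */
  static Range initialRestriction(String readOperation) {
    return new Range(0L, queryPartitions(readOperation).size());
  }

  /** Claims every partition index in the range, then verifies completion. */
  static List<String> process(String readOperation, Range range) {
    List<String> partitions = queryPartitions(readOperation);
    Tracker tracker = new Tracker(range);
    List<String> output = new ArrayList<>();
    for (long i = range.from; tracker.tryClaim(i); i++) {
      output.add(partitions.get((int) i));
    }
    tracker.checkDone();
    return output;
  }

  public static void main(String[] args) {
    String readOperation = "read-op"; // hypothetical element
    Range range = initialRestriction(readOperation);
    System.out.println(range + " -> " + process(readOperation, range));

    // Claiming a single offset in a practically unbounded range cannot finish.
    Tracker unbounded = new Tracker(new Range(0L, Long.MAX_VALUE));
    unbounded.tryClaim(0L);
    try {
      unbounded.checkDone();
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

In this model the bounded case completes because the loop attempts the offset
just past the last partition before stopping, while the unbounded case leaves
almost the whole range unattempted.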


On Tue, May 18, 2021, 3:21 PM Miguel Anzo Palomo 
wrote:

> Hi, I’m looking at how to implement a reader as a SplittableDoFn and I'm
> having some problems with the initial restriction. Specifically, how do you
> set the initial restriction if you don’t know the size of the data?
> The DoFn that I'm working on takes a PCollection of Spanner
> *ReadOperations* and splits the read operation query into a list of
> *Partitions* to query against the database.
> I’m currently setting the *InitialRestriction* to an OffsetRange(0L,
> Long.MAX_VALUE), which is currently giving me this error in unit tests:
> "Last attempted offset was 0 in range [0, 9223372036854775807), claiming
> work in [1, 9223372036854775807) was not attempted." It makes sense, I
> think, because I am setting up the range to max long value.
> So, if I don't know how many partitions are going to be created until it's
> being processed, how can I set the initial restriction, or what initial
> restriction do I need to set?