Re: Python errors when using batch+windows+textio

2019-09-16 Thread Paweł Kordek
Hi Kyle

I'm on 2.15. Thanks for pointing me to the JIRA, I'll watch it and also try
to see what's causing the problem.

Best regards
Pawel

On Fri, 13 Sep 2019 at 01:43, Kyle Weaver  wrote:

> Hi Pawel, could you tell us which version of the Beam Python SDK you are
> using?
>
> For the record, this looks like a known issue:
> https://issues.apache.org/jira/browse/BEAM-6860
>
> Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.com
>
>
> On Wed, Sep 11, 2019 at 6:33 AM Paweł Kordek 
> wrote:
>
>> Hi
>>
>> I was developing a simple pipeline where I aggregate records by key and
>> sum values for a predefined window. I was getting some errors, and after
>> checking, I am getting exactly the same issues when running Wikipedia
>> example from the Beam repo. The output is as follows:
>> ---
>> INFO:root:Missing pipeline option (runner). Executing pipeline using the
>> default runner: DirectRunner.
>> INFO:root: > at 0x7f333fc1fe60> 
>> INFO:root: > 0x7f333fc1ff80> 
>> INFO:root: > 0x7f333fc1d050> 
>> INFO:root: 
>> 
>> INFO:root: 
>> 
>> INFO:root: 
>> 
>> INFO:root: 
>> 
>> INFO:root: > 0x7f333fc1d3b0> 
>> INFO:root: > 0x7f333fc1d440> 
>> INFO:root: > 0x7f333fc1d5f0> 
>> INFO:root: 
>> 
>> INFO:root: > 0x7f333fc1d710> 
>> INFO:root:Running
>> ((ref_AppliedPTransform_ReadFromText/Read_3)+(ref_AppliedPTransform_ComputeTopSessions/ExtractUserAndTimestamp_5))+(ref_AppliedPTransform_ComputeTopSessions/Filter(> at
>> top_wikipedia_sessions.py:127>)_6))+(ref_AppliedPTransform_ComputeTopSessions/ComputeSessions/ComputeSessionsWindow_8))+(ref_AppliedPTransform_ComputeTopSessions/ComputeSessions/PerElement/PerElement:PairWithVoid_10))+(ComputeTopSessions/ComputeSessions/PerElement/CombinePerKey(CountCombineFn)/Precombine))+(ComputeTopSessions/ComputeSessions/PerElement/CombinePerKey(CountCombineFn)/Group/Write)
>> INFO:root:Running
>> (((ComputeTopSessions/ComputeSessions/PerElement/CombinePerKey(CountCombineFn)/Group/Read)+(ComputeTopSessions/ComputeSessions/PerElement/CombinePerKey(CountCombineFn)/Merge))+(ComputeTopSessions/ComputeSessions/PerElement/CombinePerKey(CountCombineFn)/ExtractOutputs))+(ref_AppliedPTransform_ComputeTopSessions/SessionsToStrings_18))+(ref_AppliedPTransform_ComputeTopSessions/TopPerMonth/TopPerMonthWindow_20))+(ref_AppliedPTransform_ComputeTopSessions/TopPerMonth/Top/KeyWithVoid_22))+(ComputeTopSessions/TopPerMonth/Top/CombinePerKey/Precombine))+(ComputeTopSessions/TopPerMonth/Top/CombinePerKey/Group/Write)
>> INFO:root:Running
>> (((ref_AppliedPTransform_WriteToText/Write/WriteImpl/DoOnce/Read_36)+(ref_AppliedPTransform_WriteToText/Write/WriteImpl/InitializeWrite_37))+(ref_PCollection_PCollection_19/Write))+(ref_PCollection_PCollection_20/Write)
>> INFO:root:Running
>> ComputeTopSessions/TopPerMonth/Top/CombinePerKey/Group/Read)+(ComputeTopSessions/TopPerMonth/Top/CombinePerKey/Merge))+(ComputeTopSessions/TopPerMonth/Top/CombinePerKey/ExtractOutputs))+(ref_AppliedPTransform_ComputeTopSessions/TopPerMonth/Top/UnKey_30))+(ref_AppliedPTransform_ComputeTopSessions/FormatOutput_31))+(ref_AppliedPTransform_WriteToText/Write/WriteImpl/WriteBundles_38))+(ref_AppliedPTransform_WriteToText/Write/WriteImpl/Pair_39))+(ref_AppliedPTransform_WriteToText/Write/WriteImpl/WindowInto(WindowIntoFn)_40))+(WriteToText/Write/WriteImpl/GroupByKey/Write)
>> Traceback (most recent call last):
>>   File "apache_beam/runners/common.py", line 829, in
>> apache_beam.runners.common.DoFnRunner._invoke_bundle_method
>>   File "apache_beam/runners/common.py", line 403, in
>> apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle
>>   File "apache_beam/runners/common.py", line 406, in
>> apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle
>>   File "apache_beam/runners/common.py", line 982, in
>> apache_beam.runners.common._OutputProcessor.finish_bundle_outputs
>>   File "apache_beam/runners/worker/operations.py", line 142, in
>> apache_beam.runners.worker.operations.SingletonConsumerSet.receive
>>   File "apache_beam/runners/worker/operations.py", line 122, in
>> apache_beam.runners.worker.operations.ConsumerSet.update_counters_start
>>   File "apache_beam/runners/worker/opcounters.py", line 196, in
>> apache_beam.runners.worker.opcounters.OperationCounters.update_from
>>   File "apache_beam/runners/worker/opcounters.py", line 214, in
>> apache_beam.runners.worker.opcounters.OperationCounters.do_sample
>>   File "apache_beam/coders/coder_impl.py", line 1014, in
>> 

Re: Python errors when using batch+windows+textio

2019-09-12 Thread Kyle Weaver
Hi Pawel, could you tell us which version of the Beam Python SDK you are
using?

For the record, this looks like a known issue:
https://issues.apache.org/jira/browse/BEAM-6860

Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.com


On Wed, Sep 11, 2019 at 6:33 AM Paweł Kordek 
wrote:

> Hi
>
> I was developing a simple pipeline where I aggregate records by key and
> sum values for a predefined window. I was getting some errors, and after
> checking, I am getting exactly the same issues when running Wikipedia
> example from the Beam repo. The output is as follows:
> ---
> INFO:root:Missing pipeline option (runner). Executing pipeline using the
> default runner: DirectRunner.
> INFO:root:  at 0x7f333fc1fe60> 
> INFO:root:  0x7f333fc1ff80> 
> INFO:root: 
> 
> INFO:root: 
> 
> INFO:root: 
> 
> INFO:root: 
> 
> INFO:root: 
> 
> INFO:root:  0x7f333fc1d3b0> 
> INFO:root:  0x7f333fc1d440> 
> INFO:root:  0x7f333fc1d5f0> 
> INFO:root: 
> 
> INFO:root:  0x7f333fc1d710> 
> INFO:root:Running
> ((ref_AppliedPTransform_ReadFromText/Read_3)+(ref_AppliedPTransform_ComputeTopSessions/ExtractUserAndTimestamp_5))+(ref_AppliedPTransform_ComputeTopSessions/Filter( at
> top_wikipedia_sessions.py:127>)_6))+(ref_AppliedPTransform_ComputeTopSessions/ComputeSessions/ComputeSessionsWindow_8))+(ref_AppliedPTransform_ComputeTopSessions/ComputeSessions/PerElement/PerElement:PairWithVoid_10))+(ComputeTopSessions/ComputeSessions/PerElement/CombinePerKey(CountCombineFn)/Precombine))+(ComputeTopSessions/ComputeSessions/PerElement/CombinePerKey(CountCombineFn)/Group/Write)
> INFO:root:Running
> (((ComputeTopSessions/ComputeSessions/PerElement/CombinePerKey(CountCombineFn)/Group/Read)+(ComputeTopSessions/ComputeSessions/PerElement/CombinePerKey(CountCombineFn)/Merge))+(ComputeTopSessions/ComputeSessions/PerElement/CombinePerKey(CountCombineFn)/ExtractOutputs))+(ref_AppliedPTransform_ComputeTopSessions/SessionsToStrings_18))+(ref_AppliedPTransform_ComputeTopSessions/TopPerMonth/TopPerMonthWindow_20))+(ref_AppliedPTransform_ComputeTopSessions/TopPerMonth/Top/KeyWithVoid_22))+(ComputeTopSessions/TopPerMonth/Top/CombinePerKey/Precombine))+(ComputeTopSessions/TopPerMonth/Top/CombinePerKey/Group/Write)
> INFO:root:Running
> (((ref_AppliedPTransform_WriteToText/Write/WriteImpl/DoOnce/Read_36)+(ref_AppliedPTransform_WriteToText/Write/WriteImpl/InitializeWrite_37))+(ref_PCollection_PCollection_19/Write))+(ref_PCollection_PCollection_20/Write)
> INFO:root:Running
> ComputeTopSessions/TopPerMonth/Top/CombinePerKey/Group/Read)+(ComputeTopSessions/TopPerMonth/Top/CombinePerKey/Merge))+(ComputeTopSessions/TopPerMonth/Top/CombinePerKey/ExtractOutputs))+(ref_AppliedPTransform_ComputeTopSessions/TopPerMonth/Top/UnKey_30))+(ref_AppliedPTransform_ComputeTopSessions/FormatOutput_31))+(ref_AppliedPTransform_WriteToText/Write/WriteImpl/WriteBundles_38))+(ref_AppliedPTransform_WriteToText/Write/WriteImpl/Pair_39))+(ref_AppliedPTransform_WriteToText/Write/WriteImpl/WindowInto(WindowIntoFn)_40))+(WriteToText/Write/WriteImpl/GroupByKey/Write)
> Traceback (most recent call last):
>   File "apache_beam/runners/common.py", line 829, in
> apache_beam.runners.common.DoFnRunner._invoke_bundle_method
>   File "apache_beam/runners/common.py", line 403, in
> apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle
>   File "apache_beam/runners/common.py", line 406, in
> apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle
>   File "apache_beam/runners/common.py", line 982, in
> apache_beam.runners.common._OutputProcessor.finish_bundle_outputs
>   File "apache_beam/runners/worker/operations.py", line 142, in
> apache_beam.runners.worker.operations.SingletonConsumerSet.receive
>   File "apache_beam/runners/worker/operations.py", line 122, in
> apache_beam.runners.worker.operations.ConsumerSet.update_counters_start
>   File "apache_beam/runners/worker/opcounters.py", line 196, in
> apache_beam.runners.worker.opcounters.OperationCounters.update_from
>   File "apache_beam/runners/worker/opcounters.py", line 214, in
> apache_beam.runners.worker.opcounters.OperationCounters.do_sample
>   File "apache_beam/coders/coder_impl.py", line 1014, in
> apache_beam.coders.coder_impl.WindowedValueCoderImpl.get_estimated_size_and_observables
>   File "apache_beam/coders/coder_impl.py", line 1030, in
> apache_beam.coders.coder_impl.WindowedValueCoderImpl.get_estimated_size_and_observables
>   File "apache_beam/coders/coder_impl.py", line 

Python errors when using batch+windows+textio

2019-09-11 Thread Paweł Kordek
Hi

I was developing a simple pipeline where I aggregate records by key and sum
values for a predefined window. I was getting some errors, and after
checking, I am getting exactly the same issues when running Wikipedia
example from the Beam repo. The output is as follows:
---
INFO:root:Missing pipeline option (runner). Executing pipeline using the
default runner: DirectRunner.
INFO:root:  
INFO:root:  
INFO:root: 

INFO:root: 

INFO:root: 

INFO:root: 

INFO:root: 

INFO:root: 

INFO:root:  
INFO:root:  
INFO:root: 

INFO:root:  
INFO:root:Running
((ref_AppliedPTransform_ReadFromText/Read_3)+(ref_AppliedPTransform_ComputeTopSessions/ExtractUserAndTimestamp_5))+(ref_AppliedPTransform_ComputeTopSessions/Filter()_6))+(ref_AppliedPTransform_ComputeTopSessions/ComputeSessions/ComputeSessionsWindow_8))+(ref_AppliedPTransform_ComputeTopSessions/ComputeSessions/PerElement/PerElement:PairWithVoid_10))+(ComputeTopSessions/ComputeSessions/PerElement/CombinePerKey(CountCombineFn)/Precombine))+(ComputeTopSessions/ComputeSessions/PerElement/CombinePerKey(CountCombineFn)/Group/Write)
INFO:root:Running
(((ComputeTopSessions/ComputeSessions/PerElement/CombinePerKey(CountCombineFn)/Group/Read)+(ComputeTopSessions/ComputeSessions/PerElement/CombinePerKey(CountCombineFn)/Merge))+(ComputeTopSessions/ComputeSessions/PerElement/CombinePerKey(CountCombineFn)/ExtractOutputs))+(ref_AppliedPTransform_ComputeTopSessions/SessionsToStrings_18))+(ref_AppliedPTransform_ComputeTopSessions/TopPerMonth/TopPerMonthWindow_20))+(ref_AppliedPTransform_ComputeTopSessions/TopPerMonth/Top/KeyWithVoid_22))+(ComputeTopSessions/TopPerMonth/Top/CombinePerKey/Precombine))+(ComputeTopSessions/TopPerMonth/Top/CombinePerKey/Group/Write)
INFO:root:Running
(((ref_AppliedPTransform_WriteToText/Write/WriteImpl/DoOnce/Read_36)+(ref_AppliedPTransform_WriteToText/Write/WriteImpl/InitializeWrite_37))+(ref_PCollection_PCollection_19/Write))+(ref_PCollection_PCollection_20/Write)
INFO:root:Running
ComputeTopSessions/TopPerMonth/Top/CombinePerKey/Group/Read)+(ComputeTopSessions/TopPerMonth/Top/CombinePerKey/Merge))+(ComputeTopSessions/TopPerMonth/Top/CombinePerKey/ExtractOutputs))+(ref_AppliedPTransform_ComputeTopSessions/TopPerMonth/Top/UnKey_30))+(ref_AppliedPTransform_ComputeTopSessions/FormatOutput_31))+(ref_AppliedPTransform_WriteToText/Write/WriteImpl/WriteBundles_38))+(ref_AppliedPTransform_WriteToText/Write/WriteImpl/Pair_39))+(ref_AppliedPTransform_WriteToText/Write/WriteImpl/WindowInto(WindowIntoFn)_40))+(WriteToText/Write/WriteImpl/GroupByKey/Write)
Traceback (most recent call last):
  File "apache_beam/runners/common.py", line 829, in
apache_beam.runners.common.DoFnRunner._invoke_bundle_method
  File "apache_beam/runners/common.py", line 403, in
apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle
  File "apache_beam/runners/common.py", line 406, in
apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle
  File "apache_beam/runners/common.py", line 982, in
apache_beam.runners.common._OutputProcessor.finish_bundle_outputs
  File "apache_beam/runners/worker/operations.py", line 142, in
apache_beam.runners.worker.operations.SingletonConsumerSet.receive
  File "apache_beam/runners/worker/operations.py", line 122, in
apache_beam.runners.worker.operations.ConsumerSet.update_counters_start
  File "apache_beam/runners/worker/opcounters.py", line 196, in
apache_beam.runners.worker.opcounters.OperationCounters.update_from
  File "apache_beam/runners/worker/opcounters.py", line 214, in
apache_beam.runners.worker.opcounters.OperationCounters.do_sample
  File "apache_beam/coders/coder_impl.py", line 1014, in
apache_beam.coders.coder_impl.WindowedValueCoderImpl.get_estimated_size_and_observables
  File "apache_beam/coders/coder_impl.py", line 1030, in
apache_beam.coders.coder_impl.WindowedValueCoderImpl.get_estimated_size_and_observables
  File "apache_beam/coders/coder_impl.py", line 814, in
apache_beam.coders.coder_impl.SequenceCoderImpl.estimate_size
  File "apache_beam/coders/coder_impl.py", line 828, in
apache_beam.coders.coder_impl.SequenceCoderImpl.get_estimated_size_and_observables
  File "apache_beam/coders/coder_impl.py", line 145, in
apache_beam.coders.coder_impl.CoderImpl.get_estimated_size_and_observables
  File "apache_beam/coders/coder_impl.py", line 494, in
apache_beam.coders.coder_impl.IntervalWindowCoderImpl.estimate_size
TypeError: Cannot convert GlobalWindow to