It's due to the size of the JSON-serialized Dataflow pipeline (the number of
transforms and the serialized size of those transforms). Each of the ~180
per-day branches adds its own transforms and BigQuery write to the job graph,
so the serialized job request ends up over the 10 MB limit.
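As Cham suggested, the practical workaround for now is to split the work across
multiple jobs. A rough sketch of what that could look like is below; it is not
your code, and the query, the 'event_date' field, the table names, CHUNK_DAYS
and the start date are placeholders:

```python
# Rough sketch only: split the 180 days into chunks and submit one Dataflow
# job per chunk, so each job's serialized graph stays under the 10 MB limit.
# The query, the 'event_date' field, the table names, CHUNK_DAYS and the
# start date are placeholders, not taken from the original pipeline.
import datetime

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

CHUNK_DAYS = 30  # days (and hence BQ partitions) handled by a single job


def day_index(row, num_partitions, start):
    """Route a row to the branch for its day within this chunk."""
    day = datetime.datetime.strptime(row['event_date'], '%Y-%m-%d').date()
    return min(max((day - start).days, 0), num_partitions - 1)


def run_chunk(start, num_days, argv=None):
    end = start + datetime.timedelta(days=num_days)
    query = ("SELECT * FROM [project:dataset.source_table] "
             "WHERE event_date >= '%s' AND event_date < '%s'" % (start, end))
    with beam.Pipeline(options=PipelineOptions(argv)) as p:
        rows = p | 'Read' >> beam.io.Read(beam.io.BigQuerySource(query=query))
        per_day = rows | 'SplitByDay' >> beam.Partition(day_index, num_days, start)
        for i, day_rows in enumerate(per_day):
            suffix = (start + datetime.timedelta(days=i)).strftime('%Y%m%d')
            (day_rows
             | 'Transform_%s' % suffix >> beam.Map(lambda row: row)  # your transforms
             | 'Write_%s' % suffix >> beam.io.Write(beam.io.BigQuerySink(
                 # '$' partition decorator; assumes the output table already
                 # exists as a date-partitioned table.
                 'project:dataset.output_table$%s' % suffix,
                 create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
                 write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)))


if __name__ == '__main__':
    first_day = datetime.date(2017, 7, 15)
    for offset in range(0, 180, CHUNK_DAYS):
        run_chunk(first_day + datetime.timedelta(days=offset), CHUNK_DAYS)
```

Each job then only serializes ~30 branches instead of 180, and you can submit
the chunks sequentially or in parallel.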

On Wed, Jan 10, 2018 at 11:40 PM Unais Thachuparambil <
[email protected]> wrote:

> Is it because my output is written to 180 partitions? Or because of the
> additional pipeline operations & transforms?
>
> On Thu, Jan 11, 2018 at 10:48 AM, Chamikara Jayalath <[email protected]
> > wrote:
>
>> The Dataflow service has a 10MB request size limit, and it seems like you
>> are hitting it. See the following for more information:
>> https://cloud.google.com/dataflow/pipelines/troubleshooting-your-pipeline
>>
>> It looks like you are hitting this due to the number of partitions. I don't
>> think there's currently a good solution other than executing multiple jobs.
>> We hope to introduce a dynamic destinations feature for the Python BQ sink
>> in the near future, which will allow you to express this as a more compact
>> pipeline.
>>
>> Thanks,
>> Cham
>>
>>
>> On Wed, Jan 10, 2018 at 10:22 PM Unais Thachuparambil <
>> [email protected]> wrote:
>>
>>> I wrote a Python Dataflow job to read data from BigQuery, apply some
>>> transforms, and save the result as a BQ table.
>>>
>>> I tested it with 8 days of data and it works fine; when I scaled to 180
>>> days I got the error below:
>>>
>>> ```"message": "Request payload size exceeds the limit: 10485760
>>> bytes.",```
>>>
>>>
>>> ```apitools.base.py.exceptions.HttpError: HttpError accessing <
>>> https://dataflow.googleapis.com/v1b3/projects/careem-mktg-dwh/locations/us-central1/jobs?alt=json>:
>>> response: <{'status': '400', 'content-length': '145', 'x-xss-protection':
>>> '1; mode=block', 'x-content-type-options': 'nosniff', 'transfer-encoding':
>>> 'chunked', 'vary': 'Origin, X-Origin, Referer', 'server': 'ESF',
>>> '-content-encoding': 'gzip', 'cache-control': 'private', 'date': 'Wed, 10
>>> Jan 2018 22:49:32 GMT', 'x-frame-options': 'SAMEORIGIN', 'alt-svc':
>>> 'hq=":443"; ma=2592000; quic=51303431; quic=51303339; quic=51303338;
>>> quic=51303337; quic=51303335,quic=":443"; ma=2592000; v="41,39,38,37,35"',
>>> 'content-type': 'application/json; charset=UTF-8'}>, content <{
>>> "error": {
>>> "code": 400,
>>> "message": "Request payload size exceeds the limit: 10485760 bytes.",
>>> "status": "INVALID_ARGUMENT"
>>> }
>>>
>>> ```
>>>
>>>
>>> In short, this is what I’m doing:
>>> 1 - Reading data from a BigQuery table using
>>> ```beam.io.BigQuerySource```
>>> 2 - Partitioning the data by day using
>>> ```beam.Partition```
>>> 3 - Applying transforms to each partition and combining some of the output
>>> PCollections.
>>> 4 - After the transforms, the results are saved to a BigQuery
>>> date-partitioned table.
>>>
>>
>
