Is it because my output is being written to 180 partitions? Or because of
the additional pipeline operations & transforms?

On Thu, Jan 11, 2018 at 10:48 AM, Chamikara Jayalath <[email protected]>
wrote:

> The Dataflow service has a 10MB request size limit, and it seems you are
> hitting it. See the following for more information:
> https://cloud.google.com/dataflow/pipelines/troubleshooting-your-pipeline
>
> It looks like you are hitting this due to the number of partitions. I don't
> think there is currently a good solution other than executing multiple
> jobs. We hope to introduce a dynamic destinations feature for the Python BQ
> sink in the near future, which will allow you to write this as a more
> compact pipeline.
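>
> To illustrate the multiple-jobs workaround: split the 180 days into
> smaller ranges and submit one job per range, so each job carries far fewer
> sinks. A rough sketch (the helper, chunk size, and dates are assumptions,
> not taken from your pipeline):
>
> ```
> from datetime import date, timedelta
>
> import apache_beam as beam
> from apache_beam.options.pipeline_options import PipelineOptions
>
> def run_job_for_range(start, num_days):
>     """Build and run one Dataflow job covering num_days days from start."""
>     with beam.Pipeline(options=PipelineOptions()) as p:
>         for i in range(num_days):
>             day = start + timedelta(days=i)
>             # Per-day read/transform/write steps would go here.
>             pass
>
> start = date(2017, 7, 1)  # hypothetical start of the 180-day window
> total, chunk = 180, 30    # six 30-day jobs instead of one 180-day job
> for offset in range(0, total, chunk):
>     run_job_for_range(start + timedelta(days=offset),
>                       min(chunk, total - offset))
> ```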
>
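> As a rough sketch of what dynamic destinations might eventually look like
> (the callable `table` argument mirrors the Java SDK's DynamicDestinations;
> the field and table names here are assumptions), a single sink would route
> each row to its day's partition:
>
> ```
> import apache_beam as beam
>
> def to_partition(row):
>     # Hypothetical: route each row to its date partition via a $YYYYMMDD
>     # decorator; 'event_date' and the table name are placeholders.
>     return 'my-project:my_dataset.events$%s' % row['event_date'].replace('-', '')
>
> with beam.Pipeline() as p:
>     (p
>      | 'Read' >> beam.io.Read(
>          beam.io.BigQuerySource(query='SELECT ... FROM ...'))
>      | 'Write' >> beam.io.WriteToBigQuery(
>          table=to_partition,
>          # Assumes the partitioned table already exists.
>          create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
>          write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))
> ```
>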
> Thanks,
> Cham
>
>
> On Wed, Jan 10, 2018 at 10:22 PM Unais Thachuparambil <
> [email protected]> wrote:
>
>> I wrote a Python Dataflow job to read data from BigQuery, apply some
>> transforms, and save the result as a BQ table.
>>
>> I tested with 8 days of data and it worked fine; when I scaled to 180
>> days I'm getting the error below:
>>
>> ```"message": "Request payload size exceeds the limit: 10485760
>> bytes.",```
>>
>>
>> ```
>> apitools.base.py.exceptions.HttpError: HttpError accessing
>> <https://dataflow.googleapis.com/v1b3/projects/careem-mktg-dwh/locations/us-central1/jobs?alt=json>:
>> response: <{'status': '400', 'content-length': '145', 'x-xss-protection':
>> '1; mode=block', 'x-content-type-options': 'nosniff', 'transfer-encoding':
>> 'chunked', 'vary': 'Origin, X-Origin, Referer', 'server': 'ESF',
>> '-content-encoding': 'gzip', 'cache-control': 'private', 'date': 'Wed, 10
>> Jan 2018 22:49:32 GMT', 'x-frame-options': 'SAMEORIGIN', 'alt-svc':
>> 'hq=":443"; ma=2592000; quic=51303431; quic=51303339; quic=51303338;
>> quic=51303337; quic=51303335,quic=":443"; ma=2592000; v="41,39,38,37,35"',
>> 'content-type': 'application/json; charset=UTF-8'}>, content <{
>>   "error": {
>>     "code": 400,
>>     "message": "Request payload size exceeds the limit: 10485760 bytes.",
>>     "status": "INVALID_ARGUMENT"
>>   }
>> }>
>> ```
>>
>>
>> In short, this is what I'm doing (a condensed sketch follows the list):
>> 1 - Reading data from a BigQuery table using ```beam.io.BigQuerySource```
>> 2 - Partitioning the data into one partition per day using
>> ```beam.Partition```
>> 3 - Applying transforms to each partition and combining some of the output
>> PCollections.
>> 4 - After the transforms, saving the results to a BigQuery date-partitioned
>> table.
>>
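>> A condensed sketch of that shape (table names, dates, and the transform
>> body are placeholders, not my actual code):
>>
>> ```
>> from datetime import date, timedelta
>>
>> import apache_beam as beam
>>
>> START = date(2017, 7, 1)   # hypothetical start of the 180-day range
>> NUM_DAYS = 180
>>
>> def day_index(row, num_partitions):
>>     # Assumes each row carries an 'event_date' date field (hypothetical).
>>     return (row['event_date'] - START).days % num_partitions
>>
>> with beam.Pipeline() as p:
>>     rows = p | 'Read' >> beam.io.Read(
>>         beam.io.BigQuerySource(query='SELECT ... FROM ...'))
>>     days = rows | 'PerDay' >> beam.Partition(day_index, NUM_DAYS)
>>     # One branch and one BigQuery sink per day: 180 sinks in a single job,
>>     # which is what inflates the job creation request.
>>     for i, per_day in enumerate(days):
>>         suffix = (START + timedelta(days=i)).strftime('%Y%m%d')
>>         (per_day
>>          | 'Transform_%s' % suffix >> beam.Map(lambda row: row)  # placeholder
>>          | 'Write_%s' % suffix >> beam.io.Write(beam.io.BigQuerySink(
>>              'my_dataset.events$%s' % suffix)))
>> ```
>>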
>
