Hi Jeff,

Using Java.

Yeah - we are issuing a query rather than reading a table. Materializing
the results myself and reading them back seems simple enough. I will give
that a try!
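
For reference, here is a minimal sketch of what that switch might look like in the Java SDK. It is an illustration under assumptions, not tested against my pipeline: the class name `DevReads` and the table spec `my-project:dev_dataset.query_snapshot` are hypothetical placeholders, and it assumes the Beam GCP IO module (`BigQueryIO`) is on the classpath. It shows the query-based read, the table-based read, and the direct-read variant mentioned in the quoted message below.

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.TypedRead;

public class DevReads {
  // Hypothetical table holding the materialized query results.
  static final String SNAPSHOT_TABLE = "my-project:dev_dataset.query_snapshot";

  // Query-based read: each pipeline run starts a query job.
  static TypedRead<TableRow> fromQuery(String sql) {
    return BigQueryIO.readTableRows().fromQuery(sql).usingStandardSql();
  }

  // Table-based read: reads the pre-materialized table, so no query job runs.
  // By default this goes through an export of the table to files.
  static TypedRead<TableRow> fromSnapshot() {
    return BigQueryIO.readTableRows().from(SNAPSHOT_TABLE);
  }

  // Same table, but requesting direct reads instead of a file export.
  static TypedRead<TableRow> fromSnapshotDirect() {
    return BigQueryIO.readTableRows()
        .from(SNAPSHOT_TABLE)
        .withMethod(TypedRead.Method.DIRECT_READ);
  }
}
```

In the pipeline, the change would then be something like replacing `pipeline.apply(DevReads.fromQuery(sql))` with `pipeline.apply(DevReads.fromSnapshot())`, after writing the query results to the snapshot table once.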

Thanks,
Matt

On Thu, Jul 2, 2020 at 9:42 AM Jeff Klukas <[email protected]> wrote:

> It sounds like your pipeline is issuing a query rather than reading a
> whole table.
>
> Are you using Java or Python? I'm only familiar with the Java SDK so my
> answer may be Java-biased.
>
> I would recommend materializing the query results to a table, and then
> configuring your pipeline to read that table rather than reading from a
> query. In that case, no query job is involved so you incur no query cost.
>
> By default, the read from a table will do an export to Avro files. There
> is no GCP cost associated with that export, but there is a quota involved,
> which you may run into if you run your pipeline repeatedly. So an even
> better loop would be to do the export to GCS out of band, and then
> reference those Avro files. But that would require much more extensive code
> changes in your pipeline, whereas switching from reading a query to
> reading a table is a one-line code change.
>
> You can also avoid the export to Avro files by configuring BigQueryIO to
> use direct reads from your temporary table rather than file exports. There
> is a cost associated with direct reads, but it should generally be much
> smaller than the cost of repeatedly running a query.
>
> On Thu, Jul 2, 2020 at 9:28 AM Matt Terwilliger <
> [email protected]> wrote:
>
>> Hello,
>>
>> I'm writing a Beam pipeline that does some relatively expensive reads
>> from BigQuery. I want to be able to run the pipeline in a development loop
>> without racking up a huge bill.
>>
>> I know BigQuery has support for query caching, but from the docs, that
>> only works if you don't specify a destination table.
>>
>> For the purposes of development, I don't mind trading off stale data
>> (i.e. reusing an existing destination table if it exists) to save money.
>>
>> Is there any way to do this now, or are there any relevant open issues? I
>> did a quick pass through JIRA but couldn't find anything.
>>
>> Thanks,
>> Matt
>>
>
