Are the files in some special format that you need to parse and understand?
Or could you opt to store the schemas as proto descriptors or Avro avsc?
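If it's the latter, going from an .avsc file to a Beam Schema is only a couple
of lines in the Java SDK. A minimal, untested sketch (the class name and the
file argument are just illustrative):

  import java.io.File;
  import java.io.IOException;
  import org.apache.beam.sdk.schemas.Schema;
  import org.apache.beam.sdk.schemas.utils.AvroUtils;

  public class AvscToBeamSchema {
    public static void main(String[] args) throws IOException {
      // Parse the .avsc file with Avro's own schema parser ...
      org.apache.avro.Schema avroSchema =
          new org.apache.avro.Schema.Parser().parse(new File(args[0]));
      // ... then convert it into a Beam Schema.
      Schema beamSchema = AvroUtils.toBeamSchema(avroSchema);
      System.out.println(beamSchema);
    }
  }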

On Fri, Jun 18, 2021 at 10:40 AM Matthew Ouyang <[email protected]>
wrote:

> Hello Brian.  Thank you for the clarification request.  I meant the first
> case.  I have files that define field names and types.
>
> On Fri, Jun 18, 2021 at 12:12 PM Brian Hulette <[email protected]>
> wrote:
>
>> Could you clarify what you mean? I could interpret this two different
>> ways:
>> 1) Have a separate file that defines the literal schema (field names and
>> types).
>> 2) Infer a schema from data stored in some file in a structured format
>> (e.g. CSV or Parquet).
>>
>> For (1), Reuven's suggestion would work. You could also use an Avro avsc
>> file here, which we support.
>> For (2), we don't have anything like this in the Java SDK. In the Python
>> SDK the DataFrame API can do this, though. When you use one of the pandas
>> sources with the Beam DataFrame API [1], we peek at the file and infer the
>> schema so you don't need to specify it. You'd just need to use
>> to_pcollection [2] to convert the dataframe to a schema-aware PCollection.
>>
>> Brian
>>
>> [1]
>> https://beam.apache.org/releases/pydoc/current/apache_beam.dataframe.io.html
>> [2]
>> https://beam.apache.org/releases/pydoc/2.30.0/apache_beam.dataframe.convert.html#apache_beam.dataframe.convert.to_pcollection
>>
>> On Fri, Jun 18, 2021 at 7:50 AM Reuven Lax <[email protected]> wrote:
>>
>>> There is a proto format for Beam schemas. You could define it as a proto
>>> in a file and then parse it.
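>>>
>>> For illustration, that route might look roughly like the sketch below.
>>> Treat the names as assumptions to verify against your SDK version (in
>>> particular SchemaTranslation.schemaFromProto and the generated SchemaApi
>>> protos); the file argument is illustrative.
>>>
>>>   import com.google.protobuf.TextFormat;
>>>   import java.io.IOException;
>>>   import java.nio.file.Files;
>>>   import java.nio.file.Paths;
>>>   import org.apache.beam.model.pipeline.v1.SchemaApi;
>>>   import org.apache.beam.sdk.schemas.Schema;
>>>   import org.apache.beam.sdk.schemas.SchemaTranslation;
>>>
>>>   public class ProtoToBeamSchema {
>>>     public static void main(String[] args) throws IOException {
>>>       // Read a text-format SchemaApi.Schema message from a file ...
>>>       SchemaApi.Schema.Builder builder = SchemaApi.Schema.newBuilder();
>>>       TextFormat.merge(Files.readString(Paths.get(args[0])), builder);
>>>       // ... and convert the proto representation into a Beam Schema.
>>>       Schema schema = SchemaTranslation.schemaFromProto(builder.build());
>>>       System.out.println(schema);
>>>     }
>>>   }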
>>>
>>> On Fri, Jun 18, 2021 at 7:28 AM Matthew Ouyang <[email protected]>
>>> wrote:
>>>
>>>> I was wondering if there are any tools that would allow me to build a
>>>> Beam schema from a file.  I looked for this in the SDK but I couldn't find
>>>> anything that could do it.
>>>>
>>>
