Hello, my 2 cents (not sure if it makes sense for your use case): what about having the Python process read from BigTable and store the data in a bucket as CSV? Then you can read the CSV from Java.
hth
marco

On Fri, Jan 7, 2022 at 7:31 AM Chamikara Jayalath <[email protected]> wrote:

> Irrespective of whether the Java transform is defined by a user or
> available in Beam Java SDK, the APIs for using such a transform from Python
> are the same.
> In other words, there's no special support for using arbitrary Java
> transforms in Beam from Python pipelines. We have to use the API mentioned
> in the documentation I linked above to use Java transforms from Python in
> either case.
>
> To set expectations correctly, using a complex Java IO connector transform
> such as BigTableIO.Read from Python can be a bit involved. For example,
> (1) We have to make sure that options needed to instantiate the transform
> (for example, BigTableOptions) can be correctly instantiated on the Python
> side.
> (2) Seems like Bigtable read transform currently has output type
> "com.google.bigtable.v2.Row". This has to be mapped to a cross-language
> compatible type so that Python can understand it (for example, Beam Rows).
>
> Thanks,
> Cham
>
> On Thu, Jan 6, 2022 at 10:32 PM Sayak Paul <[email protected]> wrote:
>
>> My question still remains same. I am not yet sure how to use an existing
>> Java transform (like BigTable IO reader in Java) from a Python pipeline.
>> The examples take a user-defined sample transform and then show their
>> usage.
>>
>> On Fri, 7 Jan, 2022, 11:10 Chamikara Jayalath, <[email protected]>
>> wrote:
>>
>>> Actually this is the correct link for multi-language Python
>>> documentation:
>>> https://beam.apache.org/documentation/programming-guide/#multi-language-pipelines
>>> We also have a quickstart guide which might be a better starting point:
>>> https://beam.apache.org/documentation/sdks/python-multi-language-pipelines/
>>>
>>> We haven't looked into developing a cross-language wrapper for the Java
>>> BigTable connector yet. I created
>>> https://issues.apache.org/jira/browse/BEAM-13607 for tracking this.
>>> It's great if you can contribute to this.
>>>
>>> Thanks,
>>> Cham
>>>
>>> On Thu, Jan 6, 2022 at 8:35 PM Sayak Paul <[email protected]> wrote:
>>>
>>>> Luke, I studied the resources you provided. However, it's still a
>>>> little unclear to me as to how I could use the BigTableIO
>>>> <https://beam.apache.org/releases/javadoc/2.1.0/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.html>
>>>> in Java from a Python pipeline. The examples and documentation first
>>>> implement a demo class in Java and then show how to use it.
>>>>
>>>> I was wondering if there was a guide on using the existing connectors
>>>> (i.e., without defining them first) from Python pipelines. I am probably
>>>> mistaken somewhere so happy to rectify myself if that's the case.
>>>>
>>>> Sayak Paul | sayak.dev
>>>>
>>>> On Thu, Jan 6, 2022 at 10:35 PM Sayak Paul <[email protected]>
>>>> wrote:
>>>>
>>>>> Thanks!
>>>>>
>>>>> On Thu, 6 Jan, 2022, 22:27 Luke Cwik, <[email protected]> wrote:
>>>>>
>>>>>> +1 on using cross language to get the Java Bigtable connector that
>>>>>> already exists.
>>>>>>
>>>>>> You could also take a look at this other xlang documentation[1] and
>>>>>> look at an existing implementation such as kafka[2] that is xlang.
>>>>>>
>>>>>> Finally there was support added to use many transforms in Java using
>>>>>> the class name and builder methods[3].
>>>>>>
>>>>>> 1: https://beam.apache.org/documentation/patterns/cross-language/
>>>>>> 2: https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/kafka.py
>>>>>> 3: https://issues.apache.org/jira/browse/BEAM-12769
>>>>>>
>>>>>> On Thu, Jan 6, 2022 at 4:41 AM Sayak Paul <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi folks,
>>>>>>>
>>>>>>> My project needs reading data from Cloud BigTable. We are aware that
>>>>>>> an IO connector for BigTable is available in the Java SDK. So we could
>>>>>>> probably make use of the cross-language capabilities
>>>>>>> <https://beam.apache.org/documentation/programming-guide/#1311-creating-cross-language-java-transforms>
>>>>>>> of Beam and make it work. I am, however, looking for
>>>>>>> guidance/resources/pointers that could be beneficial to build a Beam
>>>>>>> pipeline in Python that reads data from Cloud BigTable. Any relevant
>>>>>>> clue would be greatly appreciated.
>>>>>>>
>>>>>>> Sayak Paul | sayak.dev
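
For reference, the CSV workaround suggested at the top of this thread might look roughly like the sketch below. It only covers the serialization step and uses just the Python standard library. The function name `rows_to_csv`, the `(row_key, {"family:qualifier": value})` row shape, and the sample column names are all hypothetical, chosen for illustration. Fetching the rows from Bigtable and uploading the resulting text to a bucket are assumed to happen elsewhere.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple

# Hypothetical in-memory shape for rows already fetched from Bigtable:
# (row_key, {"family:qualifier": value}). Reading from Bigtable and
# uploading the CSV to a bucket are deliberately left out of this sketch.
Row = Tuple[str, Dict[str, str]]


def rows_to_csv(rows: Iterable[Row], columns: List[str]) -> str:
    """Serialize rows to CSV text with a fixed, explicit column order."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["row_key", *columns])
    for row_key, cells in rows:
        # Missing cells become empty strings so every line has the same width.
        writer.writerow([row_key, *(cells.get(column, "") for column in columns)])
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [
        ("user#1", {"stats:clicks": "3", "stats:views": "10"}),
        ("user#2", {"stats:clicks": "7"}),
    ]
    print(rows_to_csv(sample, ["stats:clicks", "stats:views"]))
```

Passing the column list explicitly keeps the header stable even when some rows lack a cell, which matters for Bigtable's sparse rows if a Java reader is going to parse the file by position.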
