[jira] [Commented] (BEAM-3342) Create a Cloud Bigtable Python connector
[ https://issues.apache.org/jira/browse/BEAM-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16612781#comment-16612781 ]

Luke Cwik commented on BEAM-3342:
---------------------------------

Note that I closed the duplicate of this issue but wanted to capture the design doc proposal from that JIRA: [https://docs.google.com/document/d/1iXeQvIAsGjp9orleDy0o5ExU-eMqWesgvtt231UoaPg/edit?usp=sharing]

> Create a Cloud Bigtable Python connector
> ----------------------------------------
>
>                 Key: BEAM-3342
>                 URL: https://issues.apache.org/jira/browse/BEAM-3342
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Solomon Duskis
>            Assignee: Solomon Duskis
>            Priority: Major
>
> I would like to create a Cloud Bigtable python connector.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (BEAM-3342) Create a Cloud Bigtable Python connector
[ https://issues.apache.org/jira/browse/BEAM-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587953#comment-16587953 ]

Adam Lugowski commented on BEAM-3342:
-------------------------------------

I'm happy to help out if you need a hand.
[jira] [Commented] (BEAM-3342) Create a Cloud Bigtable Python connector
[ https://issues.apache.org/jira/browse/BEAM-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587951#comment-16587951 ]

Adam Lugowski commented on BEAM-3342:
-------------------------------------

Glad to hear we're getting close!

It looks like the Java implementation of `BoundedSource`, [https://github.com/apache/beam/blob/release-2.6.0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java#L786], is just based on sampled row keys, which are also available in the Python client via `table.sample_row_keys()`.

Both the Python and Java clients talk to the same API, don't they?
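To illustrate the idea in the comment above, here is a minimal sketch of how sampled row keys could be turned into contiguous key ranges that a source might read in parallel. It is not the Java `BigtableIO` logic and does not call the Bigtable client. The function name `ranges_from_sample_keys` is hypothetical, and the sketch assumes the sampled keys have already been collected into a sorted list of `bytes` (for example from the `row_key` field of each `table.sample_row_keys()` response, which is an assumption about the client's response shape).

```python
def ranges_from_sample_keys(sample_keys):
    """Turn sorted sample row keys into contiguous (start, end) key ranges.

    An empty bytes value stands for "unbounded" on that side, so the first
    range starts at the beginning of the table and the last one runs to
    the end of the table.
    """
    ranges = []
    start = b""
    for key in sample_keys:
        # Skip an empty key (assumed here to mark the end of the table)
        # and any key equal to the current start, which would give an
        # empty range.
        if key == b"" or key == start:
            continue
        ranges.append((start, key))
        start = key
    # Final range: from the last sampled key to the end of the table.
    ranges.append((start, b""))
    return ranges


if __name__ == "__main__":
    for start, end in ranges_from_sample_keys([b"g", b"p"]):
        print(start, end)
```

Each resulting range could then be read independently, which is the property a splittable source needs.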
[jira] [Commented] (BEAM-3342) Create a Cloud Bigtable Python connector
[ https://issues.apache.org/jira/browse/BEAM-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587393#comment-16587393 ]

Solomon Duskis commented on BEAM-3342:
--------------------------------------

The Cloud Bigtable client is just about ready with full functionality. It did indeed take longer than we expected.

Once that is done, there's a likelihood that a Python write connector will significantly underperform compared to Java, since the Python client only performs synchronous operations, whereas the Java client has a high-throughput asynchronous writer.

Also, in terms of reading from Cloud Bigtable, any connector needs full support for a BoundedSource, or something like it. We could not figure out how to make BoundedSource work in Python.
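As a rough illustration of the throughput concern above: with a purely synchronous client, one common way to cut per-call overhead is to buffer rows and send them in batches. The sketch below is not the Java asynchronous writer and uses no Bigtable API. `BatchingWriter`, `write_batch`, and `max_batch_size` are hypothetical names, and `write_batch` stands in for whatever blocking bulk-write call a client provides.

```python
class BatchingWriter:
    """Buffers rows and hands them to a synchronous writer in batches."""

    def __init__(self, write_batch, max_batch_size=100):
        if max_batch_size < 1:
            raise ValueError("max_batch_size must be at least 1")
        # write_batch is a hypothetical callable taking a list of rows
        # and blocking until they are written.
        self._write_batch = write_batch
        self._max_batch_size = max_batch_size
        self._buffer = []

    def write(self, row):
        """Queue one row, flushing when the buffer is full."""
        self._buffer.append(row)
        if len(self._buffer) >= self._max_batch_size:
            self.flush()

    def flush(self):
        """Send any buffered rows in a single call."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._write_batch(batch)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Flush the remainder only on a clean exit.
        if exc_type is None:
            self.flush()
        return False


if __name__ == "__main__":
    sent = []
    with BatchingWriter(sent.append, max_batch_size=2) as writer:
        for i in range(5):
            writer.write(i)
    print(sent)
```

Batching reduces the number of blocking round trips but does not overlap them, so it narrows rather than closes the gap with an asynchronous writer.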
[jira] [Commented] (BEAM-3342) Create a Cloud Bigtable Python connector
[ https://issues.apache.org/jira/browse/BEAM-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584948#comment-16584948 ]

Adam Lugowski commented on BEAM-3342:
-------------------------------------

What issues are you facing? I took a quick look at the Datastore connector, and it seems like everything it does is already supported by Google's Bigtable client.

I ask because my workloads would be much easier to write in Python than Java.
[jira] [Commented] (BEAM-3342) Create a Cloud Bigtable Python connector
[ https://issues.apache.org/jira/browse/BEAM-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16336429#comment-16336429 ]

Solomon Duskis commented on BEAM-3342:
--------------------------------------

It turns out that we have quite a bit of work to do on the core Cloud Bigtable Python client in order to make an effective Beam connector. It could be a while before the client is ready.
[jira] [Commented] (BEAM-3342) Create a Cloud Bigtable Python connector
[ https://issues.apache.org/jira/browse/BEAM-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16289672#comment-16289672 ]

Chamikara Jayalath commented on BEAM-3342:
------------------------------------------

Great to hear that you are adding a Python connector.

Is this the correct version of the client: https://pypi.python.org/pypi/google-cloud-bigtable/0.28.1 ?

If so, you probably just need to add a dependency at https://github.com/apache/beam/blob/master/sdks/python/setup.py#L123.

For BigQuery and GCS we use client libraries generated by apitools (not exactly sure why we did that, but possibly because stable clients were not available at that time).

> Create a Cloud Bigtable Python connector
> ----------------------------------------
>
>                 Key: BEAM-3342
>                 URL: https://issues.apache.org/jira/browse/BEAM-3342
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Solomon Duskis
>            Assignee: Ahmet Altay
>
> I would like to create a Cloud Bigtable python connector.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
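For readers unfamiliar with what "add a dependency" means here, the following is a hedged sketch of the general shape such a change might take in a setuptools-based `setup.py`. It does not reproduce the contents of Beam's `setup.py`: the names `GCP_REQUIREMENTS` and `EXTRAS_REQUIRE` and the exact version pin are assumptions for illustration, with the version taken only from the PyPI link in the comment above.

```python
# Hypothetical excerpt of a setup.py; names and pins are illustrative only.

# Optional dependencies that are only needed for Google Cloud connectors.
GCP_REQUIREMENTS = [
    # Assumed pin, based on the client version linked in the comment.
    "google-cloud-bigtable==0.28.1",
]

# Passed to setuptools.setup(extras_require=...), so that users who opt in
# with `pip install <package>[gcp]` get the extra client libraries.
EXTRAS_REQUIRE = {
    "gcp": GCP_REQUIREMENTS,
}

if __name__ == "__main__":
    for extra, packages in EXTRAS_REQUIRE.items():
        print(extra, packages)
```

Keeping the client in an optional extra rather than the core requirements means users who never touch Cloud Bigtable do not have to install it.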
[jira] [Commented] (BEAM-3342) Create a Cloud Bigtable Python connector
[ https://issues.apache.org/jira/browse/BEAM-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16289658#comment-16289658 ]

Ahmet Altay commented on BEAM-3342:
-----------------------------------

cc: [~chamikara]
[jira] [Commented] (BEAM-3342) Create a Cloud Bigtable Python connector
[ https://issues.apache.org/jira/browse/BEAM-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16289655#comment-16289655 ]

Solomon Duskis commented on BEAM-3342:
--------------------------------------

I started with a simple pipeline that writes to Cloud Bigtable via the google.cloud bigtable package. It works locally with google.cloud installed, but doesn't work when I use the Dataflow runner. Here's what I get:

==
message: "Not processing workitem 2633526545277283048 since a deferred exception was found: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 706, in run
    self._load_main_session(self.local_staging_directory)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 446, in _load_main_session
    pickler.load_session(session_file)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/pickler.py", line 247, in load_session
    return dill.load_session(file_path)
  File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 363, in load_session
    module = unpickler.load()
  File "/usr/lib/python2.7/pickle.py", line 858, in load
    dispatch[key](self)
  File "/usr/lib/python2.7/pickle.py", line 1133, in load_reduce
    value = func(*args)
  File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 767, in _import_module
    return getattr(__import__(module, None, None, [obj]), obj)
AttributeError: 'module' object has no attribute 'bigtable'
==

Can I use the standard google.cloud bigtable client? If so, how, and why don't BigQuery and Storage use the google.cloud clients?
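The traceback above fails while the worker reloads the pickled main session and tries to resolve a reference to the `bigtable` module. A workaround often suggested for errors of this kind is to defer the import to code that runs on the worker, so that no module reference is captured at pipeline-construction time. The sketch below only illustrates that pattern and is not a confirmed fix for this report: `LazyImportFn` is a hypothetical plain class standing in for a Beam `DoFn`, the module is looked up by name, and the usage example uses the standard-library `json` module as a stand-in for the Bigtable client. The worker would still need the client package installed for the deferred import to succeed.

```python
import importlib


class LazyImportFn:
    """Stand-in for a DoFn that imports its client library lazily."""

    def __init__(self, module_name):
        # Keep only the module *name*; a plain string pickles safely.
        self._module_name = module_name
        self._client_module = None

    def __getstate__(self):
        # Never ship an already-imported module inside the pickled state.
        state = dict(self.__dict__)
        state["_client_module"] = None
        return state

    def start_bundle(self):
        # Import at run time, where the code actually executes.
        if self._client_module is None:
            self._client_module = importlib.import_module(self._module_name)

    def process(self, element):
        self.start_bundle()
        # A real DoFn would build and send a write with the client here;
        # this sketch just reports which module was loaded.
        yield (self._client_module.__name__, element)


if __name__ == "__main__":
    fn = LazyImportFn("json")  # stand-in for the real client module
    print(list(fn.process("row-1")))
```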