[jira] [Commented] (BEAM-3342) Create a Cloud Bigtable Python connector

2018-09-12 Thread Luke Cwik (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16612781#comment-16612781
 ] 

Luke Cwik commented on BEAM-3342:
-

Note that I closed the duplicate of this issue but wanted to capture the design 
doc proposal from that JIRA: 
[https://docs.google.com/document/d/1iXeQvIAsGjp9orleDy0o5ExU-eMqWesgvtt231UoaPg/edit?usp=sharing]

> Create a Cloud Bigtable Python connector
> 
>
> Key: BEAM-3342
> URL: https://issues.apache.org/jira/browse/BEAM-3342
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Solomon Duskis
>Assignee: Solomon Duskis
>Priority: Major
>
> I would like to create a Cloud Bigtable python connector.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-3342) Create a Cloud Bigtable Python connector

2018-08-21 Thread Adam Lugowski (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587953#comment-16587953
 ] 

Adam Lugowski commented on BEAM-3342:
-

I'm happy to help out if you need a hand.



[jira] [Commented] (BEAM-3342) Create a Cloud Bigtable Python connector

2018-08-21 Thread Adam Lugowski (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587951#comment-16587951
 ] 

Adam Lugowski commented on BEAM-3342:
-

Glad to hear we're getting close!

It looks like the Java implementation of `BoundedSource`, 
[https://github.com/apache/beam/blob/release-2.6.0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java#L786]
 , is based only on sampled row keys, which are also available in the Python 
client via `table.sample_row_keys()`. Both the Python and Java clients talk to 
the same API, don't they?
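
To make the sampled-row-key idea concrete, here is a minimal sketch of how a list of sampled keys could be turned into contiguous key ranges that a source might read independently. The helper name `key_ranges_from_samples` is hypothetical, and the sketch assumes the sampled keys arrive in sorted order and that an empty key marks an open boundary; it does not call the Bigtable client at all.

```python
def key_ranges_from_samples(sample_keys):
    """Turn sorted sample row keys into contiguous (start, end) ranges.

    An empty start key stands for "from the beginning of the table" and
    an empty end key stands for "to the end of the table" (an assumption
    of this sketch).
    """
    ranges = []
    start = b""
    for key in sample_keys:
        if key == b"" or key == start:
            # Skip an empty or repeated boundary so no range is empty.
            continue
        ranges.append((start, key))
        start = key
    # Close the final range at the end of the table.
    ranges.append((start, b""))
    return ranges


if __name__ == "__main__":
    # Stand-in for the row keys a call to table.sample_row_keys() yields.
    samples = [b"row-100", b"row-200", b"row-300"]
    for start, end in key_ranges_from_samples(samples):
        print(start, end)
```

Each resulting range could then back one split of a bounded source; how sizes are estimated per range is left out here.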



[jira] [Commented] (BEAM-3342) Create a Cloud Bigtable Python connector

2018-08-21 Thread Solomon Duskis (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587393#comment-16587393
 ] 

Solomon Duskis commented on BEAM-3342:
--

The Cloud Bigtable client is just about ready with full functionality.  It did 
indeed take longer than we expected.  Once that is done, there's a 
likelihood that a Python write connector will significantly underperform 
compared to Java, since the Python client only performs synchronous operations, 
whereas the Java client has a high-throughput asynchronous writer.

Also, in terms of reading from Cloud Bigtable, any connector needs full support 
for a BoundedSource, or something like it.  We could not figure out how to make 
BoundedSource work in Python.
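
One way a connector could reduce the cost of synchronous calls is to group writes so that each blocking call carries many rows. The sketch below illustrates that idea only: `BatchingWriter` and `flush_fn` are hypothetical names, not part of Beam or the Cloud Bigtable client, and batching does not make the writes asynchronous.

```python
class BatchingWriter(object):
    """Buffers items and hands them to `flush_fn` in fixed-size batches."""

    def __init__(self, flush_fn, max_batch_size=100):
        if max_batch_size < 1:
            raise ValueError("max_batch_size must be at least 1")
        # flush_fn is any callable taking a list; in a real connector it
        # would wrap one synchronous client call (an assumption here).
        self._flush_fn = flush_fn
        self._max_batch_size = max_batch_size
        self._buffer = []

    def write(self, item):
        """Queue one item and flush when the buffer is full."""
        self._buffer.append(item)
        if len(self._buffer) >= self._max_batch_size:
            self.flush()

    def flush(self):
        """Send whatever is buffered; does nothing if the buffer is empty."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._flush_fn(batch)
```

A caller would invoke `write()` per element and `flush()` once at the end of a bundle so the last partial batch is not lost.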



[jira] [Commented] (BEAM-3342) Create a Cloud Bigtable Python connector

2018-08-18 Thread Adam Lugowski (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584948#comment-16584948
 ] 

Adam Lugowski commented on BEAM-3342:
-

What issues are you facing?

I took a quick look at the Datastore connector and it seems like everything it 
does is already supported by Google's Bigtable client.

I ask because my workloads would be much easier to write in Python than Java.



[jira] [Commented] (BEAM-3342) Create a Cloud Bigtable Python connector

2018-01-23 Thread Solomon Duskis (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16336429#comment-16336429
 ] 

Solomon Duskis commented on BEAM-3342:
--

It turns out that we have quite a bit of work to do on the core Cloud Bigtable 
Python client in order to make an effective Beam connector.  It could be a 
while before the client is ready.



[jira] [Commented] (BEAM-3342) Create a Cloud Bigtable Python connector

2017-12-13 Thread Chamikara Jayalath (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16289672#comment-16289672
 ] 

Chamikara Jayalath commented on BEAM-3342:
--

Great to hear that you are adding a Python connector.

Is this the correct version of the client? 
https://pypi.python.org/pypi/google-cloud-bigtable/0.28.1
If so, you probably just need to add it as a dependency in 
https://github.com/apache/beam/blob/master/sdks/python/setup.py#L123.

For BigQuery and GCS we use client libraries generated by apitools (I'm not 
exactly sure why, but possibly because stable clients were not available at 
the time).

> Create a Cloud Bigtable Python connector
> 
>
> Key: BEAM-3342
> URL: https://issues.apache.org/jira/browse/BEAM-3342
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Solomon Duskis
>Assignee: Ahmet Altay
>
> I would like to create a Cloud Bigtable python connector.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-3342) Create a Cloud Bigtable Python connector

2017-12-13 Thread Ahmet Altay (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16289658#comment-16289658
 ] 

Ahmet Altay commented on BEAM-3342:
---

cc: [~chamikara]



[jira] [Commented] (BEAM-3342) Create a Cloud Bigtable Python connector

2017-12-13 Thread Solomon Duskis (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16289655#comment-16289655
 ] 

Solomon Duskis commented on BEAM-3342:
--

I started with a simple pipeline that writes to Cloud Bigtable via the 
google.cloud bigtable package, which works locally with google.cloud installed, 
but doesn't work when I use the Dataflow runner.  Here's what I get:

==
  message:  "Not processing workitem 2633526545277283048 since a deferred exception was found: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 706, in run
    self._load_main_session(self.local_staging_directory)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 446, in _load_main_session
    pickler.load_session(session_file)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/pickler.py", line 247, in load_session
    return dill.load_session(file_path)
  File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 363, in load_session
    module = unpickler.load()
  File "/usr/lib/python2.7/pickle.py", line 858, in load
    dispatch[key](self)
  File "/usr/lib/python2.7/pickle.py", line 1133, in load_reduce
    value = func(*args)
  File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 767, in _import_module
    return getattr(__import__(module, None, None, [obj]), obj)
AttributeError: 'module' object has no attribute 'bigtable'
==

Can I use the standard google.cloud bigtable client?  If so, how, and why don't 
BigQuery and Storage use the google.cloud clients?
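
The traceback suggests the failure happens while the worker restores the pickled main session and tries to resolve a module-level reference to `bigtable`. One commonly used way to avoid that class of error is to keep the client import out of the pickled state and import it on the worker instead. The sketch below shows the pattern with plain Python; `LazyImportWriter` is a hypothetical name, `start_bundle` only mirrors the Beam `DoFn` hook of the same name, and this is not a confirmed fix for the pipeline above. The package still has to be installed on the workers.

```python
import importlib


class LazyImportWriter(object):
    """Hypothetical writer that imports its client module on first use."""

    def __init__(self, module_name):
        # Only the module name (a string) is stored, so pickling this
        # object never captures a reference to the module itself.
        self._module_name = module_name
        self._module = None

    def __getstate__(self):
        # Drop any loaded module before pickling; it is re-imported later.
        state = self.__dict__.copy()
        state["_module"] = None
        return state

    def start_bundle(self):
        # Intended to run on the worker, where the package must exist.
        if self._module is None:
            self._module = importlib.import_module(self._module_name)


if __name__ == "__main__":
    # "json" is a stand-in for the real client package in this sketch.
    writer = LazyImportWriter("json")
    writer.start_bundle()
    print(writer._module.__name__)
```

The design choice is that the pickled object carries a string rather than a module reference, so nothing needs to be resolved until the worker actually starts processing.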
