Re: Side Inputs size

2019-04-08 Thread Lukasz Cwik
Side input performance and scaling are runner-dependent. Runners should attempt to support efficient random-access lookups in maps. Side inputs should also be cached across elements if the map hasn't changed, which runners should also be capable of doing. So yes, side input size can i

Re: Implementation an S3 file system for python SDK - Updated

2019-04-08 Thread Chamikara Jayalath
Thanks for the proposal Pasan. Added some comments. As others mentioned, FileSystem interface is orthogonal to SDF (storage system instead of source format) so no need to wait for SDF. - Cham On Mon, Apr 8, 2019 at 10:57 AM Lukasz Cwik wrote: > A filesystem is a lower level abstraction that a

Re: Implementation an S3 file system for python SDK - Updated

2019-04-08 Thread Lukasz Cwik
A filesystem is a lower-level abstraction that a PTransform can use, thus there is no need to consider SDF when creating the S3 filesystem. If we were redesigning the interface to all filesystems, then SDF should be considered. On Mon, Apr 8, 2019 at 10:54 AM Lara Schmidt wrote: > I'd push towards

Side Inputs size

2019-04-08 Thread augusto . mcc
Hi, In one of my transforms I am using a Map, which is the result of a previous transform, as a sideInput. This Map is potentially very large, with a count of all words that appeared in all documents. The step that uses the sideInput is quite slow because it seems like it is initialising a huge Has

Re: Implementation an S3 file system for python SDK - Updated

2019-04-08 Thread Pablo Estrada
Currently, Pasan is working on a design for adding a couple of implementations to the Filesystem interface in Python, and IMHO it's not necessary to consider SDF here. On the other hand, Python's fileio[1] could probably use SDF-based improvements to split when many files are being matched. Best -P.

Re: Is there an integration test available for filesystem checking

2019-04-08 Thread Pablo Estrada
I recommend you send these questions to the dev@ list, Pasan. Have you looked at the *_test.py files corresponding to each one of the file systems? Are they all mocking their access to GCS? Best -P. On Sun, Apr 7, 2019 at 11:12 PM Pasan Kamburugamuwa < pasankamburugamu...@gmail.com> wrote: > Hell

Re: Implementation an S3 file system for python SDK - Updated

2019-04-08 Thread Ahmet Altay
+dev +Pablo Estrada +Chamikara Jayalath +Udi Meiri Thank you Pasan. I quickly looked at the proposal and it looks good. Added a few folks who could offer additional feedback. On Mon, Apr 8, 2019 at 12:13 AM Pasan Kamburugamuwa < pasankamburugamu...@gmail.com> wrote: > Hi, > > I have updated

Re: Couchbase

2019-04-08 Thread Ismaël Mejía
Hello, Guobao is working on this, but he is OOO at least until end of next week so if you can wait it will be available 'soon'. If you need this urgently and you decide to write your own implementation of write, it would be a valuable contribution that I will be happy to review. Regards, Ismaël

Re: Couchbase

2019-04-08 Thread Joshua Fox
Note that the Read part has recently been developed. I need a very simple write functionality -- simply inserting JsonObjects into Couchbase. On Mon, Apr 8, 2019 at 3:13 PM Joshua Fox wrote: > I am looking for the equivalent of > org.apache.beam.sdk.io.gcp.datastore.DatastoreV1.Write for Couchba

Couchbase

2019-04-08 Thread Joshua Fox
I am looking for the equivalent of org.apache.beam.sdk.io.gcp.datastore.DatastoreV1.Write for Couchbase. What is the status? From this and this, it does not seem to be in progress.

Re: Is AvroCoder the right coder for me?

2019-04-08 Thread Augusto Ribeiro
Hi Ryan, Thanks for the input. When I last tried running my pipeline, this problem didn't seem to be a huge bottleneck. I probably had other things that were making it worse. I still think it is weird that when you take a thread dump "snapshot" most of the methods are waiting on that lock so i

Implementation an S3 file system for python SDK - Updated

2019-04-08 Thread Pasan Kamburugamuwa
Hi, I have updated the project proposal according to the feedback given. Could you check my proposal again and give me your feedback on the corrections I have made? Here is the link to the updated project proposal https://docs.google.com/document/d/1i_PoIrbmhNgwKCS1TYWC28A9RsyZQFsQCJic3aCXO-