Ping again. Any chance someone takes a look to get this thing going? It's
just a design doc and basic metadata/IO impl. We're not talking about
actual source/sink code yet (already done but saved for future PRs).
On Fri, Jun 21, 2019 at 1:38 PM Ahmet Altay wrote:
> Thank you Claire, this looks p
cer side, by pre-grouping/sorting
data and writing to bucket/shard output files, the consumer can sort/merge
matching ones without a CoGBK. Essentially we're paying the shuffle cost
upfront to avoid them repeatedly in each consumer pipeline that wants to
join data.
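
To make the idea above concrete, here is a minimal in-memory sketch in plain Java (no Beam APIs), not the code from the design doc. Everything in it is an assumption made for illustration: the class name SmbJoinSketch, the bucket count of 4, hashing String keys with hashCode, and using in-memory lists as stand-ins for bucket files. The producer step partitions records into buckets and sorts each bucket by key once; the consumer step then joins matching buckets with a linear merge, with no further shuffle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: names and parameters here are hypothetical.
class SmbJoinSketch {
    // Assumed bucket count; both datasets must agree on it and on the hash.
    static final int NUM_BUCKETS = 4;

    static int bucketOf(String key) {
        return Math.floorMod(key.hashCode(), NUM_BUCKETS);
    }

    // Producer side: assign each record to a bucket, then sort each bucket by key.
    // Each inner list stands in for one bucket file.
    static List<List<Map.Entry<String, String>>> writeBuckets(
            List<Map.Entry<String, String>> records) {
        List<List<Map.Entry<String, String>>> buckets = new ArrayList<>();
        for (int b = 0; b < NUM_BUCKETS; b++) {
            buckets.add(new ArrayList<>());
        }
        for (Map.Entry<String, String> r : records) {
            buckets.get(bucketOf(r.getKey())).add(r);
        }
        for (List<Map.Entry<String, String>> bucket : buckets) {
            bucket.sort(Map.Entry.comparingByKey());
        }
        return buckets;
    }

    // Consumer side: merge-join two buckets that are already sorted by key.
    static List<String> mergeJoin(List<Map.Entry<String, String>> left,
                                  List<Map.Entry<String, String>> right) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            String key = left.get(i).getKey();
            int cmp = key.compareTo(right.get(j).getKey());
            if (cmp < 0) {
                i++;
            } else if (cmp > 0) {
                j++;
            } else {
                // Find the run of equal keys on each side, then emit all pairs.
                int iEnd = i;
                while (iEnd < left.size() && left.get(iEnd).getKey().equals(key)) iEnd++;
                int jEnd = j;
                while (jEnd < right.size() && right.get(jEnd).getKey().equals(key)) jEnd++;
                for (int a = i; a < iEnd; a++) {
                    for (int b = j; b < jEnd; b++) {
                        out.add(key + ":" + left.get(a).getValue()
                                + "," + right.get(b).getValue());
                    }
                }
                i = iEnd;
                j = jEnd;
            }
        }
        return out;
    }

    // Join two datasets bucket by bucket; only matching bucket ids are compared.
    static List<String> join(List<Map.Entry<String, String>> left,
                             List<Map.Entry<String, String>> right) {
        List<List<Map.Entry<String, String>>> l = writeBuckets(left);
        List<List<Map.Entry<String, String>>> r = writeBuckets(right);
        List<String> out = new ArrayList<>();
        for (int b = 0; b < NUM_BUCKETS; b++) {
            out.addAll(mergeJoin(l.get(b), r.get(b)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> users =
                List.of(Map.entry("b", "Bob"), Map.entry("a", "Ann"));
        List<Map.Entry<String, String>> plays =
                List.of(Map.entry("a", "song1"), Map.entry("b", "song2"));
        System.out.println(join(users, plays));
    }
}
```

In this sketch writeBuckets runs inside join for brevity. In the scenario described above it would run once, when the data is written, and every consumer pipeline would reuse the sorted bucket files.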
> Thanks,
> Cham
>
a way to move forward.
On Thu, Jun 27, 2019 at 4:39 PM Neville Li wrote:
> Thanks. I responded to comments in the doc. More inline.
>
> On Thu, Jun 27, 2019 at 2:44 PM Chamikara Jayalath
> wrote:
>
>> Thanks, added a few comments.
>>
>> If I understood correctly, y
such a major piece of work I don't
> want it to sit with everyone thinking they are waiting on someone else, or
> any such thing. (not saying this is happening, just pinging to be sure)
>
> Kenn
>
> On Mon, Jul 1, 2019 at 1:09 PM Neville Li wrote:
>
>> Updated t
is
>> promised to be in key order)) or support a single SMB aka
>> "PreGrouping" source/sink pair that's always used together (and whose
>> underlying format is not necessarily public).
>>
>> On Sat, Jul 13, 2019 at 3:19 PM Neville Li wrote:
>> >
ror the existing IO ones (from an API
> perspective--how much implementation it makes sense to share is an
> orthogonal issue that I'm sure can be worked out.)
>
> On Mon, Jul 15, 2019 at 4:18 PM Neville Li wrote:
> >
> > Hi Robert,
> >
> > I agree, it'd
a lot of classes that are
> nested (non-static) or non-public. I can understand why they were made
> non-public, it's a hard abstraction to design well and keep compatibility.
> As Neville mentioned, decoupling readers and writers would not only benefit
> for this propo
e of the
> logic (for example compression, temp file handling) that is already
> implemented in Beam FileIO/WriteFiles transforms in your SMB sink transform.
> >> >>>
> >> >>> For reader, you are right that there's no FileIO.Read. What we have
> are
ts across files within a bucket and
TBH I'm not even sure where to start.
I'll file separate PRs for core changes needed for discussion. WDYT?
On Mon, Jul 22, 2019 at 4:20 AM Robert Bradshaw wrote:
> On Fri, Jul 19, 2019 at 5:16 PM Neville Li wrote:
> >
> > For
Kirpichov
>> wrote:
>> >
>> > On Mon, Jul 22, 2019 at 7:49 AM Robert Bradshaw
>> wrote:
>> >>
>> >> On Mon, Jul 22, 2019 at 4:04 PM Neville Li
>> wrote:
>> >> >
>> >> > Thanks Robert. Agree with the FileIO p
3, 2019 at 6:36 PM Neville Li wrote:
> So I spent one afternoon trying some ideas for reusing the last few
> transforms in WriteFiles.
>
> WriteShardsIntoTempFilesFn extends DoFn<*KV<ShardedKey<Integer>*,
> Iterable<UserT>>, *FileResult<DestinationT>*>
> => GatherResults<ResultT> extends PTransform<PCollection<ResultT>,
> PCollection<List<ResultT>>>
>
Hi all,
Part 2 is out:
https://labs.spotify.com/2017/10/23/big-data-processing-at-spotify-the-road-to-scio-part-2/
We also have a meetup in Stockholm later today:
https://www.meetup.com/stockholm-hug/events/244112281/
On Mon, Oct 23, 2017 at 3:03 PM Ismaël Mejía wrote:
> Has anybody thought ab
Hi all,
We just released Scio 0.5.0-alpha2. This is mostly a bug fix release. We'll
probably have one or two beta releases with the upcoming Beam 2.3.0. Stay
tuned!
Cheers,
Neville
https://github.com/spotify/scio/releases/tag/v0.5.0-alpha2
Breaking changes
- BigQueryIO in JobTest#output now r
I don't see a beam-sdks-java-io-hadoop-input-format artifact in the staging
repo, but the Maven module still exists:
https://github.com/apache/beam/tree/v2.3.0-RC3/sdks/java/io/hadoop-input-format
Was it not published by mistake? We still have code that depends on this.
On Mon, Feb 12, 2018 at 3: