Re: pipeline steps

2019-02-09 Thread Reuven Lax
I think we could definitely add an option to FileIO to add the filename to
every record. It would come at a (performance) cost - often the filename is
much larger than the actual record.

On Thu, Feb 7, 2019 at 6:29 AM Kenneth Knowles wrote:

> This comes up a lot, wanting file names alongside the data that came from
> the file. It is a historical quirk that none of our connectors used to have
> the file names. What is the change needed for FileIO + parse Avro to be
> really easy to use?
>
> Kenn
>
> On Thu, Feb 7, 2019 at 6:18 AM Jeff Klukas wrote:
>
>> I haven't needed to do this with Beam before, but I've definitely had
>> similar needs in the past. Spark, for example, provides an input_file_name
>> function that can be applied to a dataframe to add the input file as an
>> additional column. It's not clear to me how that's implemented, though.
>>
>> Perhaps others have suggestions, but I'm not aware of a way to do this
>> conveniently in Beam today. To my knowledge, today you would have to use
>> FileIO.match() and FileIO.readMatches() to get a collection of
>> ReadableFile. You'd then have to FlatMapElements to pull out the metadata
>> and the bytes of the file, and you'd be responsible for parsing those bytes
>> into Avro records. You'd be able to output something like a KV
>> that groups the file name together with the parsed Avro record.
>>
>> Seems like something worth providing better support for in Beam itself if
>> this indeed doesn't already exist.
>>
>> On Thu, Feb 7, 2019 at 7:29 AM Chaim Turkel wrote:
>>
>>> Hi,
>>>   I am working on a pipeline that listens to a Pub/Sub topic for
>>> files that have changed in storage. I then read the Avro files and
>>> would like to write them to BigQuery, choosing the table based on
>>> the file name.
>>>   My problem is that the transform that reads the Avro files does not
>>> give me back the file name (as a tuple or something like that). This
>>> pattern seems to come up a lot for me.
>>> Can you think of any solutions?
>>>
>>> Chaim
>>
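
[Editor's sketch, not part of the original thread.] The workaround Jeff
describes above boils down to pairing each parsed record with the name of the
file it came from, then choosing a destination from that name. Below is a
minimal, library-free illustration of that shape in Python. It is not Beam
API: `tag_with_filename`, `table_for`, `route`, `FAKE_FILES`, and `fake_parse`
are hypothetical names, the parser is a stand-in for real Avro decoding, and
the rule that maps a file name to a table is an assumption made for the
example.

```python
import posixpath
import re
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Record = dict
Parser = Callable[[str], Iterable[Record]]


def tag_with_filename(path: str, parse: Parser) -> Iterator[Tuple[str, Record]]:
    """Yield (path, record) pairs, the analogue of a KV of file name and record."""
    for record in parse(path):
        yield path, record


def table_for(path: str) -> str:
    """Assumed routing rule: base file name without extension, made identifier-safe."""
    base = posixpath.splitext(posixpath.basename(path))[0]
    return re.sub(r"[^A-Za-z0-9_]", "_", base)


def route(paths: Iterable[str], parse: Parser) -> Dict[str, List[Record]]:
    """Group records by the table derived from their source file name."""
    tables: Dict[str, List[Record]] = {}
    for path in paths:
        for name, record in tag_with_filename(path, parse):
            tables.setdefault(table_for(name), []).append(record)
    return tables


# Made-up data standing in for files on storage; a real pipeline would
# decode Avro bytes here instead of looking up a dictionary.
FAKE_FILES = {
    "gs://bucket/orders-2019.avro": [{"id": 1}, {"id": 2}],
    "gs://bucket/users.avro": [{"id": 7}],
}


def fake_parse(path: str) -> Iterable[Record]:
    """Stand-in parser: return the canned records for a path."""
    return iter(FAKE_FILES[path])
```

With the stand-in data, `route(FAKE_FILES, fake_parse)` groups the two
records from `orders-2019.avro` under `orders_2019` and the single record
from `users.avro` under `users`. Every pair carries the full path alongside
the record, which is the per-record overhead Reuven mentions.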


Re: Another New Contributor

2019-02-09 Thread Griselda Cuevas
Hi Connell - I just found out about Apache Kibble, a project made to track
community metrics. I will look into it when I come back from vacation and
start checking the progress Beam has had.

Thanks!
G


Gris Cuevas Zambrano

Open Source Strategist

+1 (650) 772-2947

345 Spear Street, San Francisco, 94105




On Sat, 9 Feb 2019 at 14:16, Thomas Weise wrote:

> Hi Tanay,
>
> Welcome to the project! I just added you as contributor in JIRA.
>
> Thomas
>
> On Sat, Feb 9, 2019 at 11:12 AM Tanay Tummapalli wrote:
>
>> Hey Everyone
>>
>> I am a computer science student from Delhi. I am currently in my senior
>> year at Maharaja Surajmal Institute of Technology, Delhi. I just got done
>> with my internship at SocialCops - an Indian Data Intelligence Startup. I
>> had to build a streaming pipeline there as part of my internship. While
>> studying for it, I read about Apache Beam in the Streaming Systems book
>> written by Tyler Akidau, Reuven Lax and Slava Chernyak.
>>
>> I want to contribute to Open Source this summer as part of Google Summer
>> of Code, or even otherwise. Apache Beam is an exciting project!
>>
>> Can you please add me to Apache Beam JIRA? I want to start contributing.
>> My ASF JIRA username is ttanay.
>>
>> Cheers!
>> Tanay Tummalapalli
>>
>>


Re: Another New Contributor

2019-02-09 Thread Thomas Weise
Hi Tanay,

Welcome to the project! I just added you as contributor in JIRA.

Thomas

On Sat, Feb 9, 2019 at 11:12 AM Tanay Tummapalli wrote:

> Hey Everyone
>
> I am a computer science student from Delhi. I am currently in my senior
> year at Maharaja Surajmal Institute of Technology, Delhi. I just got done
> with my internship at SocialCops - an Indian Data Intelligence Startup. I
> had to build a streaming pipeline there as part of my internship. While
> studying for it, I read about Apache Beam in the Streaming Systems book
> written by Tyler Akidau, Reuven Lax and Slava Chernyak.
>
> I want to contribute to Open Source this summer as part of Google Summer
> of Code, or even otherwise. Apache Beam is an exciting project!
>
> Can you please add me to Apache Beam JIRA? I want to start contributing.
> My ASF JIRA username is ttanay.
>
> Cheers!
> Tanay Tummalapalli
>
>


Another New Contributor

2019-02-09 Thread Tanay Tummapalli
Hey Everyone

I am a computer science student from Delhi. I am currently in my senior
year at Maharaja Surajmal Institute of Technology, Delhi. I just got done
with my internship at SocialCops - an Indian Data Intelligence Startup. I
had to build a streaming pipeline there as part of my internship. While
studying for it, I read about Apache Beam in the Streaming Systems book
written by Tyler Akidau, Reuven Lax and Slava Chernyak.

I want to contribute to Open Source this summer as part of Google Summer of
Code, or even otherwise. Apache Beam is an exciting project!

Can you please add me to Apache Beam JIRA? I want to start contributing.
My ASF JIRA username is ttanay.

Cheers!
Tanay Tummalapalli


Re: BEAM-6639. ClickHouseIOTest flaky failure in precommits

2019-02-09 Thread Gleb Kanterov
I'm looking into it; it seems that the previous mitigation didn't help. I
added extra logging and am going to try to reproduce the flaky failure again.
Sorry for the inconvenience; I've never experienced such problems with
testcontainers before.

On Sat, Feb 9, 2019 at 12:36 AM Alex Amato wrote:

> https://issues.apache.org/jira/browse/BEAM-6639
>
> Noticed this failure in precommits:
>
>
> https://builds.apache.org/job/beam_PreCommit_Java_Commit/4166/testReport/junit/org.apache.beam.sdk.io.clickhouse/ClickHouseIOTest/classMethod/
>
> Any ideas what's going on with this?
>
>
>

-- 
Cheers,
Gleb