Hi All,
I have a use case where I read from both Kafka and flat files. Can I write
the code once and run it for both, or do I have to create two different
pipelines, or join the two sources within one pipeline?
Which one is better?
Best Regards,
ANKIT BEOHAR
Amit,
Thanks for your fast response, now I get it. My use case can be solved using
composite transforms (which are in progress, I guess).
But if I twist my logic the way you mentioned, just using a different I/O,
and run on top of Spark, then I guess Beam will handle batch
and streaming performance
+1 to making the IO metrics (e.g. producers, consumers) available as part
of the Beam pipeline metrics tree for debugging and visibility.
As it has already been mentioned, many IO clients have a metrics mechanism
in place, so in these cases I think it could be beneficial to mirror their
metrics
You can write one pipeline and simply replace the IO, for example:
To read from (text) files you can use:
*PCollection<String> lines =
p.apply(TextIO.Read.from("file://some/inputData.txt"));*
and from Kafka (I'm adding a generic key here because Kafka messages are
keyed):
*PCollection>
Oh, I missed your question on which one is better. It really depends on
your use case.
If the data is homogeneous, and you want to write to the same IO, I don't
see a reason not to Flatten them into one PCollection.
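
To make the two options concrete, here is a minimal plain-Java sketch of the
idea. It is only an analogy and does not use the Beam API: the names
readLines, readValues and process are hypothetical stand-ins for a text-file
source, a keyed Kafka source, and the shared pipeline logic, and
Stream.concat plays the role that Flatten would play for PCollections.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class OneLogicTwoSources {

    // Hypothetical stand-in for a text-file source: one element per line.
    static Stream<String> readLines(List<String> fileLines) {
        return fileLines.stream();
    }

    // Hypothetical stand-in for a Kafka source: records are keyed, so the key
    // is dropped to get the same element type as the file source.
    static Stream<String> readValues(List<Map.Entry<String, String>> records) {
        return records.stream().map(Map.Entry::getValue);
    }

    // Shared logic, written once and independent of where the data came from.
    static List<String> process(Stream<String> input) {
        return input.map(String::trim)
                    .filter(s -> !s.isEmpty())
                    .map(String::toUpperCase)
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> fileLines = Arrays.asList("alpha", " beta ", "");
        List<Map.Entry<String, String>> kafkaRecords = Arrays.asList(
            new SimpleEntry<String, String>("k1", "gamma"),
            new SimpleEntry<String, String>("k2", "delta"));

        // Option 1: same logic, only the source is swapped.
        System.out.println(process(readLines(fileLines)));
        System.out.println(process(readValues(kafkaRecords)));

        // Option 2: the data is homogeneous, so merge both sources first
        // (the analogue of Flatten) and run the logic once.
        System.out.println(process(
            Stream.concat(readLines(fileLines), readValues(kafkaRecords))));
    }
}
```

In a real pipeline the same split applies: keep the source-specific part down
to the read step and the key handling, and everything after that can be
shared.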
If you want to write files-to-files and Kafka-to-Kafka you might be better
off
Hello there!
I am working on writing a Read IO for Hadoop InputFormat. This will enable
reading from any data source that supports Hadoop InputFormat, i.e. one that
provides an InputFormat for integration with Hadoop.
It makes sense for the HadoopInputFormatIO to share some code with the
I skimmed through HdfsIO and I think it is essentially HadoopInputFormatIO
with FileInputFormat. I would pretty much move most of the code to
HadoopInputFormatIO (just make HdfsIO a specific instance of HIF_IO).
On Wed, Feb 15, 2017 at 9:15 AM, Dipti Kulkarni <
dipti_dkulka...@persistent.com>
On Jenkins it's possible to run several jobs at the same time but on different
executors. That's the easiest way.
Regards
JB
On Feb 15, 2017, 10:15, at 10:15, "Ismaël Mejía" wrote:
>This question got lost in the discussion, but there is a small
>improvement
>that we can do:
>
This question got lost in the discussion, but there is a small improvement
that we can do:
> Just to check, are we doing parallel builds?
We are on Jenkins, but not on Travis; there is an ongoing PR to fix this.
What we can improve is to check if we can run some of the test suites in
parallel to