Repro at https://github.com/shikhar/flink-sbt-fatjar-troubles, run `sbt
assembly`
A fat jar seems like the best way to provide jobs for Flink to execute.
I am declaring deps like:
{noformat}
"org.apache.flink" %% "flink-clients" % "1.0-SNAPSHOT" % "provided"
"org.apache.flink" %% "flink-streaming
My input file contains newline-delimited JSON records, one per line.
The records on the Kafka topic are JSON blobs encoded as UTF-8 and written
as bytes.
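Such records can be reproduced from a file by reading it line by line and re-encoding each line as UTF-8 bytes, which is the same payload shape a Kafka consumer would hand over. A minimal plain-Java sketch (the record contents here are made up):

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

public class Main {
    // Convert one text line into the raw UTF-8 payload a Kafka consumer would see.
    static byte[] toPayload(String line) {
        return line.getBytes(StandardCharsets.UTF_8);
    }

    // Decode a payload back into the JSON text.
    static String fromPayload(byte[] payload) {
        return new String(payload, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<String> lines = List.of("{\"id\":1}", "{\"id\":2}");
        for (String line : lines) {
            byte[] payload = toPayload(line);          // what would sit on the topic
            System.out.println(fromPayload(payload));  // round-trips to the JSON text
        }
    }
}
```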
On Fri, Feb 12, 2016 at 1:41 PM, Martin Neumann wrote:
> I'm trying the same thing now.
>
> I guess you need to read the file as byte arrays somehow to make it work.
I'm trying the same thing now.
I guess you need to read the file as byte arrays somehow to make it work.
What read function did you use? The mapper is not hard to write but the
byte array stuff gives me a headache.
Cheers, Martin
On Fri, Feb 12, 2016 at 9:12 PM, Nick Dimiduk wrote:
> Hi Martin,
Hi Martin,
I have the same use case. I wanted to be able to load from dumps of data in
the same format as is on the Kafka queue. I created a new application main,
call it the "job" as opposed to the "flow". I refactored my flow-building
code a bit so that all of it can be reused via a factory method.
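One way to picture that refactor (all names here are invented for illustration, this is not Flink API): a single flow-builder that depends only on an abstract record source plus a deserializer, so the Kafka "flow" main and the file-based "job" main stay thin entry points around the same code.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class Main {
    // The shared flow logic: everything downstream of the source lives here,
    // so it can be reused regardless of where the byte[] records come from.
    static List<String> buildFlow(List<byte[]> source, Function<byte[], String> deserializer) {
        return source.stream()
                .map(deserializer)          // same conversion code for Kafka and file input
                .map(String::toUpperCase)   // stand-in for the real transformations
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The "job" main: records come from a file dump instead of the Kafka
        // topic, but the deserializer and the flow are reused unchanged.
        List<byte[]> fileRecords = List.of("a".getBytes(), "b".getBytes());
        System.out.println(buildFlow(fileRecords, bytes -> new String(bytes)));
    }
}
```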
Hello,
Is there a Hive (or Spark DataFrame) partitionBy equivalent in Flink?
I'm looking to save output as CSV files partitioned by two columns (date and
hour).
The partitionBy DataSet API is more about partitioning the data on a column
for further processing.
I'm thinking there is no direct
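As of Flink 1.0 there does not appear to be a direct partitioned-sink equivalent; one workaround is to derive a Hive-style partition path from each record and group the output by it. A plain-Java sketch of that path/grouping logic (the record layout and directory scheme are assumptions):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class Main {
    // Derive a Hive-style partition directory from a record's date and hour.
    static String partitionPath(String date, int hour) {
        return String.format("out/date=%s/hour=%02d", date, hour);
    }

    public static void main(String[] args) {
        // each record: {date, hour, csvLine}
        List<String[]> records = List.of(
                new String[]{"2016-02-12", "13", "x,1"},
                new String[]{"2016-02-12", "13", "y,2"},
                new String[]{"2016-02-12", "14", "z,3"});

        // Group CSV lines by their target partition; in a real job each group
        // would be written to its own file under the partition directory.
        Map<String, StringBuilder> parts = new LinkedHashMap<>();
        for (String[] r : records) {
            String path = partitionPath(r[0], Integer.parseInt(r[1]));
            parts.computeIfAbsent(path, p -> new StringBuilder())
                 .append(r[2]).append('\n');
        }
        parts.forEach((path, csv) ->
                System.out.println(path + " -> " + csv.toString().trim().replace('\n', '|')));
    }
}
```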
Hello Matthias,
Thank you very much :)
Best Regards,
Subash Basnet
On Fri, Feb 12, 2016 at 8:22 PM, Matthias J. Sax wrote:
> You might want to check out the Stratosphere project web site:
> http://stratosphere.eu/project/publications/
>
> -Matthias
>
> On 02/12/2016 05:52 PM, subash basnet wrote:
You might want to check out the Stratosphere project web site:
http://stratosphere.eu/project/publications/
-Matthias
On 02/12/2016 05:52 PM, subash basnet wrote:
> Hello all,
>
> I am currently doing my master's thesis on Apache Flink. It would be really
> helpful to know about the reference papers followed for the
> development/background of Flink.
Hello all,
I am currently doing my master's thesis on Apache Flink. It would be really
helpful to know about the reference papers followed for the
development/background of Flink.
It would help me build a solid background for analyzing Flink.
Currently I am reading all the related materials f
Hello Fabian,
Thank you for the response, but I have been stuck on how to iterate over
the DataSet, perform operations, and return a new, modified DataSet,
similar to a list operation as shown below.
E.g., currently I am doing the following:
for (Centroid centroid : centroids.collect()) {
for
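The usual fix is to express the per-element operation as a transformation, e.g. `centroids.map(...)`, which returns a new DataSet without pulling the data to the client the way `collect()` does. The same shape in plain Java, with `Centroid` as a made-up stand-in:

```java
import java.util.List;
import java.util.stream.Collectors;

public class Main {
    static class Centroid {
        final double x;
        Centroid(double x) { this.x = x; }
    }

    public static void main(String[] args) {
        List<Centroid> centroids = List.of(new Centroid(1.0), new Centroid(2.0));

        // Instead of collect() plus a for-loop, express the update as a map;
        // in Flink this would be: DataSet<Centroid> moved = centroids.map(...)
        List<Centroid> moved = centroids.stream()
                .map(c -> new Centroid(c.x + 0.5))   // the per-element operation
                .collect(Collectors.toList());

        moved.forEach(c -> System.out.println(c.x));
    }
}
```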
Ah ok, I didn't know about it! Thanks Till and Fabian!
On Fri, Feb 12, 2016 at 5:11 PM, Fabian Hueske wrote:
> Hi Flavio,
>
> If I got it right, you can use a FullOuterJoin.
> It will give you both elements on a match and otherwise a left or a right
> element and null.
>
> Best, Fabian
>
> 2016-02-12 16:48 GMT+01:00 Flavio Pompermaier :
Hi Flavio,
If I got it right, you can use a FullOuterJoin.
It will give you both elements on a match and otherwise a left or a right
element and null.
Best, Fabian
2016-02-12 16:48 GMT+01:00 Flavio Pompermaier :
> Hi to all,
>
> I have a use case where I have to merge 2 datasets but I can't find a
> direct dataset API to do that.
Why don’t you simply use a fullOuterJoin to do that?
Cheers,
Till
On Fri, Feb 12, 2016 at 4:48 PM, Flavio Pompermaier
wrote:
> Hi to all,
>
> I have a use case where I have to merge 2 datasets but I can't find a
> direct dataset API to do that.
> I want to execute some function when there's a match, and otherwise pass
> on the non-null element.
Hi to all,
I have a use case where I have to merge 2 datasets but I can't find a
direct dataset API to do that.
I want to execute some function when there's a match, and otherwise pass on
the non-null element.
At the moment I can do this in a fairly complicated way (I want to avoid
broadcasting becaus
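The fullOuterJoin suggested in this thread covers exactly this case: matched keys see both elements, unmatched keys see one element and null. Its semantics sketched in plain Java over two keyed maps (the keys and the merge function are made up for illustration):

```java
import java.util.Map;
import java.util.TreeSet;

public class Main {
    public static void main(String[] args) {
        Map<Integer, String> left = Map.of(1, "a", 2, "b");
        Map<Integer, String> right = Map.of(2, "B", 3, "C");

        // Visit every key from either side, mirroring fullOuterJoin semantics.
        TreeSet<Integer> keys = new TreeSet<>(left.keySet());
        keys.addAll(right.keySet());
        for (int key : keys) {
            String l = left.get(key);
            String r = right.get(key);
            // merge on a match, otherwise take the non-null side
            String value = (l != null && r != null) ? l + "+" + r
                                                    : (l != null ? l : r);
            System.out.println(key + "=" + value);
        }
    }
}
```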
Thanks, Stephan.
Everything is back to normal for us.
Cheers,
Cory
On Fri, Feb 12, 2016 at 6:54 AM, Stephan Ewen wrote:
> Hi Cory!
>
> We found the problem. There is a development fork of Flink for Stream SQL,
> whose CI infrastructure accidentally also deployed snapshots and overwrote
> some of the proper master branch snapshots.
Hi Cory!
We found the problem. There is a development fork of Flink for Stream SQL,
whose CI infrastructure accidentally also deployed snapshots and overwrote
some of the proper master branch snapshots.
That's why the snapshots got inconsistent. We fixed that, and newer
snapshots should be online
Hi Tanguy,
I would recommend referring to the documentation for the specific Flink
version you are using.
This is the documentation for 1.0-SNAPSHOT:
https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/connectors/kafka.html
and this is the doc for 0.10.x:
https://ci.apache.org/pro
Hello,
I am currently trying to develop an algorithm for mining frequent item sets
over a data stream.
I am using Kafka to generate the stream; however, I cannot manage to link
Flink to Kafka.
The code presented here works, but only with Flink version 0.9.1:
https://github.com/dataArtisans/kafka
It's not only about testing: I will also need to run things against
different datasets. I want to reuse as much of the code as possible to load
the same data from a file instead of Kafka.
Is there a simple way of loading the data from a File using the same
conversion classes that I would use to tra