Hi
Here is the link *http://goo.gl/gVOSp8* to the slides of my talk at the
Chicago Apache Flink meetup on June 30, 2015.
Although most of the current buzz is about Apache Spark, the talk shows how
Apache Flink offers the only hybrid open source (Real-Time Streaming +
Batch) distributed data proc
Dear Aljoscha
Thanks for your response.
Done for IntelliJ.
Regards
Liang
From: Aljoscha Krettek [mailto:aljos...@apache.org]
Sent: July 6, 2015 15:58
To: user@flink.apache.org
Subject: Re: In Windows 8 + VirtualBox, how to build a Flink development environment?
Hi,
most Flink developers use IntelliJ so it i
Hi Stephan
Yes, you are correct. It looks like TPCx-HS is an industry standard
benchmark for big data, but how do we get a Flink number on it?
I think it is also difficult to get a Spark performance number based on
TPCx-HS.
If you know someone who can provide servers for performance testing, I would
like t
Hi
Vasia, thanks for sharing.
1. I would like to add a couple of resources about *BigBench*, the Big Data
benchmark suite that you are referring to:
https://github.com/intel-hadoop/Big-Data-Benchmark-for-Big-Bench
and also
http://blog.cloudera.com/blog/2014/11/bigbench-toward-an-industry-standar
Hi,
Apart from the amplab benchmark, you might also find [1] and [2]
interesting. The first is a survey on existing benchmarks, while the second
proposes one. However, they are also limited to SQL-like queries.
Regarding graph processing benchmarks, I recently came across Graphalytics
[3]. The be
Hi Hawin
What you shared is not 'the Spark benchmark'.
This benchmark measures the response time of several tools, including Shark,
on a handful of relational queries.
Shark development ended a year ago, on July 1, 2014, in favor of Spark SQL,
which graduated from alpha on March 13, 2015.
Hi Hawin!
The benchmark you refer to is a more or less pure SQL benchmark.
For systems that are designed for exactly the "beyond SQL" applications
(streaming, iterative algorithms, UDFs, ...), this benchmark is probably
not very meaningful, as it covers none of these areas.
Even in the SQL an
Hi Slim and Fabian
Here is the Spark benchmark: https://amplab.cs.berkeley.edu/benchmark/
Do we have a similar report or comparison like that?
Thanks.
Best regards
Hawin
On Mon, Jul 6, 2015 at 6:32 AM, Slim Baltagi wrote:
> Hi Fabian
>
> > I could not find which versions of Flink and Spar
Hi,
I am new to the Flink community. I am interested in comparing Flink's
features and performance vs. Spark.
Does anyone know if there is any benchmark or test available for testing Spark
performance on servers that have 32-plus cores and 256GB-plus memory?
Thanks
-yanping
From: Fabian Hueske [mai
Hi Fabian
> I could not find which versions of Flink and Spark were compared.
According to Norman Spangenberg, one of the authors of the conference paper,
the benchmark used *Spark* version *1.2.0* and *Flink* version *0.8.0*.
I did ask him a few more questions about the benchmark betwee
Hi Nathan!
The state is stored in a configurable "state backend". The state backend
itself must be fault tolerant, like HDFS, HBase, Cassandra, Ignite, ...
What the highly available Flink version does is to store the "StateHandle"
in Zookeeper. The StateHandle is the metadata that points to the s
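A minimal configuration sketch of this setup (key names follow the pre-1.0 `flink-conf.yaml` naming and may differ in other Flink versions; the paths and ZooKeeper quorum are illustrative):

```yaml
# Durable state backend: checkpoint state goes to a fault-tolerant
# filesystem (here HDFS); only the StateHandle metadata goes to ZooKeeper.
state.backend: filesystem
state.backend.fs.checkpointdir: hdfs:///flink/checkpoints

# Highly available JobManager: recovery metadata (StateHandles) in ZooKeeper.
recovery.mode: zookeeper
recovery.zookeeper.quorum: zk-host1:2181,zk-host2:2181,zk-host3:2181
```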
Do you think it could be a good idea to extract Flink tuples into a separate
project, so as to allow simpler dependency management in Flink-compatible
projects?
On Mon, Jul 6, 2015 at 11:06 AM, Fabian Hueske wrote:
> Hi,
>
> at the moment, Tuples are more efficient than POJOs, because POJO fields
Hi,
at the moment, Tuples are more efficient than POJOs, because POJO fields
are accessed via Java reflection whereas Tuple fields are directly accessed.
This performance penalty could be overcome by code-generated serializers
and comparators, but I am not aware of any work in that direction.
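The cost difference Fabian describes can be illustrated with plain Java (the class and field names below are illustrative, not Flink internals): a Tuple field read compiles to a direct field access, while a generic framework reading a POJO field must go through `java.lang.reflect`.

```java
import java.lang.reflect.Field;

public class AccessDemo {
    // A POJO-style type: generic frameworks read its fields via reflection.
    public static class Pojo {
        public String f0 = "flink";
    }

    public static void main(String[] args) throws Exception {
        Pojo p = new Pojo();

        // Direct access, as with Tuple fields: a plain field read.
        String direct = p.f0;

        // Reflective access, as with POJO fields: lookup plus invocation overhead.
        Field field = Pojo.class.getField("f0");
        String reflective = (String) field.get(p);

        System.out.println(direct.equals(reflective)); // both reads yield "flink"
    }
}
```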
Best
Hi to all,
I was thinking of writing my own Flink-compatible library, and I basically
need a Tuple5.
Is there any performance loss in using a POJO with 5 String fields vs. a
Tuple5?
If yes, wouldn't it be a good idea to extract Flink tuples into a separate
simple project (e.g. flink-java-tuples) that has no
In fact, you can implement your own composite data types (like Tuple, POJO)
that can deal with nullable fields as keys, but you need custom serializers and
comparators for that. These types won't be as efficient as types that
cannot handle null fields.
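As a rough illustration of the extra work such a type pays for (plain Java, not Flink's actual comparator API): every key comparison needs an additional null branch, e.g. to order null keys first.

```java
import java.util.Arrays;
import java.util.Comparator;

public class NullSafeKeys {
    public static void main(String[] args) {
        // A comparator that treats null keys as smallest -- the kind of extra
        // branching a null-tolerant composite type performs on every compare.
        Comparator<String> nullSafe =
            Comparator.nullsFirst(Comparator.<String>naturalOrder());

        String[] keys = {"b", null, "a"};
        Arrays.sort(keys, nullSafe);
        System.out.println(Arrays.toString(keys)); // nulls sort first
    }
}
```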
Cheers, Fabian
2015-07-02 20:17 GMT+02:00 Flavio Po
Thanks for the information, Aljoscha!
I'd love to better understand what the long-term solution is for fault
tolerance here. Is the idea that ZooKeeper will be used to store the stream
state? Or that we can efficiently use HDFS? Or are you
designing your own key/value persistent storag
Fabian.
Thanks for the info and pointer to python. I'll check it out.
-Bill
From: Fabian Hueske [fhue...@gmail.com]
Sent: Monday, July 06, 2015 3:23 AM
To: user@flink.apache.org
Subject: Re: data conversion between flink and "other" paradigms
Hi Bill,
a Dat
Hi Bill,
a DataSet is just a logical concept in Flink. DataSets are often not
persisted but just streamed through operators. At the moment, there is no way
to access an intermediate DataSet of a Flink program directly (this might
change in the future).
You can process data in another function by im
Hi,
Good questions. About 1., you are right: when the JobManager fails, the state
is lost. Ufuk, Till, and Stephan are currently working on making the
JobManager fault tolerant by having hot-standby JobManagers and storing the
important JobManager state in ZooKeeper. Maybe they can further comment on
Hi Chenliang,
most of the Flink committers are using IntelliJ, because it can better
handle the mixed Java/Scala modules. I personally would also recommend you
to do so.
If you can't select a Maven project, then you should check whether the
Maven plugin was installed correctly. You can see which
Hi,
most Flink developers use IntelliJ so it is probably easier for them to
help you with problems if you use IntelliJ. Also, IntelliJ is easier to
set up and works better for Flink because of the mixed Java/Scala code.
Cheers,
Aljoscha
On Mon, 6 Jul 2015 at 03:39 Chenliang (Liang, DataSight) <
chenli
Hi there,
I noticed the 0.9 release announces exactly-once semantics for streams. I
looked at the user guide and the primary mechanism for recovery appears to
be checkpointing of user state. I have a few questions:
1. The default behavior is that checkpoints are kept in memory on the
JobManager.
Just a question whether there is some prior art here. Say someone wanted to
use Flink for processing, but at some point they wanted to call another
function via, say, JNI/C, which doesn't understand DataSets. How would one go
about this? I'm assuming the code would have to convert the data to
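One common shape for that conversion (a hedged sketch in plain Java; the native method mentioned in the comment is hypothetical): materialize the elements locally, then unbox them into a flat primitive array that a JNI function can accept.

```java
import java.util.Arrays;
import java.util.List;

public class ToNativeArray {
    public static void main(String[] args) {
        // Suppose the elements of a DataSet arrived locally as a List
        // (e.g. by writing and re-reading them, or via a collect-style call).
        // A C function called through JNI typically wants a primitive array.
        List<Double> collected = Arrays.asList(1.0, 2.0, 3.0);

        double[] nativeBuffer = new double[collected.size()];
        for (int i = 0; i < nativeBuffer.length; i++) {
            nativeBuffer[i] = collected.get(i); // unbox into a flat buffer
        }

        // nativeBuffer could now be handed to a native method such as
        // `private static native double process(double[] data);` (hypothetical).
        System.out.println(Arrays.toString(nativeBuffer));
    }
}
```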