Hi Philip,
Thank you for your questions. I think you have mapped the Hive
functions to the Flink ones correctly. Just a remark on ORDER BY.
You wrote that it produces a total order of all the records. In this
case, you'd have to do a sortPartition() operation with the parallelism
set to 1. This is because sortPartition() only sorts within each
partition, so a total order requires a single partition.
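For example (just a sketch, using a small Tuple2 data set):

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
env.fromElements(Tuple2.of("b", 2L), Tuple2.of("a", 1L))
    .sortPartition(0, Order.ASCENDING) // sorts within each partition
    .setParallelism(1)                 // one partition => total order
    .print();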
Thanks for starting this Kostas.
I think the list is quite hidden in the wiki. Should we link from
flink.apache.org to that page?
Cheers, Fabian
2015-10-19 14:50 GMT+02:00 Kostas Tzoumas :
> Hi everyone,
>
> I started a "Powered by Flink" wiki page, listing some of the
>
Thanks for starting and big +1 for making it more prominent.
On Mon, Oct 19, 2015 at 2:53 PM, Fabian Hueske wrote:
> Thanks for starting this Kostas.
>
> I think the list is quite hidden in the wiki. Should we link from
> flink.apache.org to that page?
>
> Cheers, Fabian
>
>
Hi Philip,
You're welcome. Just a small correction: Hive's SORT BY should be
DataSet.groupBy(key).sortGroup(key) in Flink. This ensures that the
records of each group arrive sorted in the reducer that follows. There
is no need to set the parallelism to 1.
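For example (a sketch; MyGroupReducer is just a placeholder for your
GroupReduceFunction):

data.groupBy(0)                         // group by the key field
    .sortGroup(1, Order.ASCENDING)      // Hive's SORT BY
    .reduceGroup(new MyGroupReducer()); // sees each group in sorted order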
Best,
Max
On Mon, Oct 19, 2015 at 1:28 PM, Philip Lee
Thanks, Fabian.
I just want to check one thing again.
As you said, [Distribute By] is partitionByHash() and [Sort By] should be
sortGroup() in Flink. However, [Cluster By] consists of
partitionByHash().*sortPartition()*.
As far as I know, [Cluster By] is the same as the combination of
[Distribute By] and [Sort By].
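In other words, something like this (my sketch):

data.partitionByHash(0)                 // [Distribute By]
    .sortPartition(0, Order.ASCENDING); // [Sort By] within each partition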
Hi everyone,
I started a "Powered by Flink" wiki page, listing some of the organizations
that are using Flink:
https://cwiki.apache.org/confluence/display/FLINK/Powered+by+Flink
If you would like to be added to the list, just send me a short email with
your organization's name and a description
Sounds good +1
2015-10-19 14:57 GMT+02:00 Márton Balassi :
> Thanks for starting and big +1 for making it more prominent.
>
> On Mon, Oct 19, 2015 at 2:53 PM, Fabian Hueske wrote:
>
>> Thanks for starting this Kostas.
>>
>> I think the list is quite
Hi Philip,
here a few additions to what Max said:
- ORDER BY: As Max said, Flink's sortPartition() only sorts within a
partition and does not produce a total order. You can either set the
parallelism to 1 as Max suggested or use a custom partitioner to
range-partition the data (see the sketch below).
- SORT BY: From
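For the ORDER BY case, the custom partitioner variant could look roughly
like this (a sketch; the range bounds are an assumption about the key
space):

data.partitionCustom(new Partitioner<Long>() {
        @Override
        public int partition(Long key, int numPartitions) {
            // assumes keys are roughly uniform in [0, 1000)
            return (int) Math.min(key * numPartitions / 1000, numPartitions - 1);
        }
    }, 0)
    .sortPartition(0, Order.ASCENDING);
// reading the partitions in partition order then yields a total order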
The difference between a partition and a group is the following:
- A partition refers to all records that are processed by a task instance
or subtask. If you use hash partitioning, all elements that share the same
key will be in one partition, but usually there will be more than one key
in a partition.
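To make that concrete (a sketch; the function classes are placeholders):

// one call per partition; a partition may contain many different keys
data.partitionByHash(0).mapPartition(new MyPartitionFunction());
// one call per group; all records of one key, possibly sharing a
// partition with other keys
data.groupBy(0).reduceGroup(new MyGroupFunction());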
yes, definitely. How about a link under the Community drop-down that points
to the wiki page?
On Mon, Oct 19, 2015 at 2:53 PM, Fabian Hueske wrote:
> Thanks for starting this Kostas.
>
> I think the list is quite hidden in the wiki. Should we link from
> flink.apache.org to
+1 to this.
On Mon, Oct 19, 2015 at 3:00 PM, Fabian Hueske wrote:
> Sounds good +1
>
> 2015-10-19 14:57 GMT+02:00 Márton Balassi :
>
> > Thanks for starting and big +1 for making it more prominent.
> >
> > On Mon, Oct 19, 2015 at 2:53 PM, Fabian
Hello,
We are running into a strange problem with direct memory buffers. From what
I know, we are not using any direct memory buffers inside our code.
This is a pretty trivial streaming application that just does some
deduplication and unions a few Kafka streams.
/Jakob
2015-10-19 13:27:59,064 INFO
+1 for adding it to the website instead of wiki.
"Who is using Flink?" is always a question difficult to answer to
interested users.
On 19.10.2015 15:08, Suneel Marthi wrote:
+1 to this.
On Mon, Oct 19, 2015 at 3:00 PM, Fabian Hueske wrote:
Nice indeed :-)
On Mon, Oct 19, 2015 at 3:08 PM, Suneel Marthi
wrote:
> +1 to this.
>
> On Mon, Oct 19, 2015 at 3:00 PM, Fabian Hueske wrote:
>
>> Sounds good +1
>>
>> 2015-10-19 14:57 GMT+02:00 Márton Balassi :
>>
>> >
Hi Jakob,
Thank you for reporting the bug. Could you please post your
configuration here? In particular, could you please tell us the value
of the following configuration variables:
taskmanager.heap.mb
taskmanager.network.numberOfBuffers
taskmanager.memory.off-heap
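For example, in flink-conf.yaml these would look like this (the values
below are just examples, not recommendations):

taskmanager.heap.mb: 2048
taskmanager.network.numberOfBuffers: 2048
taskmanager.memory.off-heap: false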
Are you running the Flink
@Timo: The proposal was to keep the list in the wiki (can be easily
extended) but link from the main website to the wiki page.
2015-10-19 15:16 GMT+02:00 Timo Walther :
> +1 for adding it to the website instead of wiki.
> "Who is using Flink?" is always a question difficult
+1 Let's collect in the Wiki for now. At some point in time, we might
want to have a dedicated page on the Flink homepage.
On Mon, Oct 19, 2015 at 3:31 PM, Timo Walther wrote:
> Ah ok, sorry. I think linking to the wiki is also ok.
>
>
> On 19.10.2015 15:18, Fabian Hueske
I think it's not a nice solution to check for the type of the returned
execution environment to determine whether it is a local or a remote
execution environment.
Wouldn't it be better to add a method isLocal() to ExecutionEnvironment?
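Something like this (a sketch; isLocal() is the proposed method, it does
not exist yet):

// today: user code has to check the concrete class
if (env instanceof LocalEnvironment) { ... }

// proposed: a dedicated method on ExecutionEnvironment
if (env.isLocal()) { ... }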
Cheers, Fabian
2015-10-14 19:14 GMT+02:00 Flavio
Hi,
See answers below.
/Jakob
On Mon, Oct 19, 2015 at 4:03 PM, Maximilian Michels wrote:
> Hi Jakob,
>
> Thank you for reporting the bug. Could you please post your
> configuration here? In particular, could you please tell us the value
> of the following configuration
I forgot to ask you: Which version of Flink are you using? 0.9.1 or
0.10-SNAPSHOT?
On Mon, Oct 19, 2015 at 5:05 PM, Maximilian Michels wrote:
> Hi Jakob,
>
> Thanks. Flink allocates its network memory as direct memory outside
> the normal Java heap. By default, that is 64MB but
Hi Jakob,
Thanks. Flink allocates its network memory as direct memory outside
the normal Java heap. By default, that is 64MB but can grow up to
128MB on heavy network transfer. How much memory does your machine
have? Could it be that your upper memory bound is lower than 2048 +
128 MB?
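(As a quick sanity check: with taskmanager.heap.mb: 2048, the JVM needs at
least 2048 MB of heap plus up to 128 MB of direct memory, i.e. about 2176
MB plus JVM overhead, so any limit below that can trigger direct buffer
OutOfMemoryErrors.)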
Best,
Max
It's 0.10-SNAPSHOT
Gyula
Maximilian Michels wrote (on Mon, Oct 19, 2015 at 17:13):
> I forgot to ask you: Which version of Flink are you using? 0.9.1 or
> 0.10-SNAPSHOT?
>
> On Mon, Oct 19, 2015 at 5:05 PM, Maximilian Michels
> wrote:
> > Hi Jakob,
> >
I'm doing some research on Flink + Avro integration, and I've come across
"org.apache.flink.api.java.io.AvroInputFormat" as a way to create a stream
of Avro objects from a file. I had the following questions:
1. Is this the extent of Flink's integration with Avro? If I wanted to read
Do you really need to iterate ?
On Mon, Oct 19, 2015 at 5:42 PM, flinkuser wrote:
>
> Here is my code snippet but I find the union operator not workable.
>
> DataStream msgDataStream1 = env.addSource((new
> SocketSource(hostName1,port,'\n',-1))).filter(new
>
When was the last time you updated your 0.10-SNAPSHOT Flink cluster?
If it has been more than a couple of weeks, then I'd advise you to
update to the latest snapshot version. There has been an issue with
the calculation of the off-heap memory limit in the past.
Thanks,
Max
On Mon, Oct 19, 2015
Here is my code snippet, but I find the union operator not working.

DataStream<String> msgDataStream1 = env
    .addSource(new SocketSource(hostName1, port, '\n', -1))
    .filter(new MessageFilter()).setParallelism(1);
DataStream<String> msgDataStream2 = env
    .addSource(new SocketSource(hostName2, port, '\n', -1))
    .filter(new MessageFilter()).setParallelism(1);
DataStream<String> union = msgDataStream1.union(msgDataStream2);
The revision is "Starting JobManager (Version: 0.10-SNAPSHOT, Rev:c82ebbf,
Date:15.10.2015 @ 11:34:01 CEST)"
We have a lot of memory left on the machine. I have increased it quite a
lot.
What are your thoughts on the memory configuration?
If I understand Flink correctly, you should only have one
Hi Fabian,
I implemented your approach from above. However, the runtime decides to run
two subtasks on the same node, resulting in an out-of-memory error.
I thought partitioning really does partition the data across nodes, but now
it seems like it partitions across tasks, and tasks can be on the same
node.
Hi Andrew,
1a,
In general Flink can read and write Avro data through the AvroInputFormat
and AvroOutputFormat in both the batch and the streaming API. For example,
you can write the following (MyAvroType stands for your generated Avro
class; the path is an example):

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<MyAvroType> stream = env.createInput(
    new AvroInputFormat<MyAvroType>(new Path("/path/to/file.avro"), MyAvroType.class));
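Writing works analogously through the AvroOutputFormat, e.g. in the batch
API (again a sketch, MyAvroType being a placeholder):

DataSet<MyAvroType> data = ...;
data.write(new AvroOutputFormat<MyAvroType>(MyAvroType.class), "/path/to/output");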
You can see the revision number and the build date in the JobManager
log file, e.g. "Starting JobManager (Version: 0.10-SNAPSHOT,
Rev:1b79bc1, Date:18.10.2015 @ 20:15:08 CEST)"
On Mon, Oct 19, 2015 at 5:53 PM, Maximilian Michels wrote:
> When was the last time you updated your