Re: Flink Serialization and case class fields limit

2018-11-16 Thread Andrea Sella
Hi Andrey,

My bad, I forgot to say that I am using Scala 2.11 (that's why I asked
about the limitation) and Flink 1.5.5.

If I recall correctly, CaseClassSerializer and CaseClassTypeInfo don't rely
on the unapply and tupled functions, so I'd say that Flink doesn't have this
kind of limitation with Scala 2.11. Correct?
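
For reference, a minimal sketch (field names made up) that checks which
TypeInformation the Scala API derives for a case class with more than 22
fields:

```scala
// Hypothetical check: which TypeInformation does the Scala API macro
// derive for a case class with more than 22 fields? (Scala 2.11 allows
// such case classes; they only lose the unapply/tupled helpers.)
import org.apache.flink.api.scala._
import org.apache.flink.api.scala.typeutils.CaseClassTypeInfo

case class Wide(
  f1: Int,  f2: Int,  f3: Int,  f4: Int,  f5: Int,  f6: Int,
  f7: Int,  f8: Int,  f9: Int,  f10: Int, f11: Int, f12: Int,
  f13: Int, f14: Int, f15: Int, f16: Int, f17: Int, f18: Int,
  f19: Int, f20: Int, f21: Int, f22: Int, f23: Int) // 23 fields

object TypeInfoCheck {
  def main(args: Array[String]): Unit = {
    val info = createTypeInformation[Wide]
    // true if the case-class path (CaseClassSerializer) is used
    // instead of the generic Kryo fallback.
    println(info.isInstanceOf[CaseClassTypeInfo[_]])
  }
}
```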

Thank you,
Andrea
On Fri, 16 Nov 2018 at 19:34, Andrey Zagrebin wrote:

> Hi Andrea,
>
> The 22 limit comes from Scala [1], not Flink.
> I am not sure about any repo for the post, but I also cc'ed Fabian; maybe
> he can point to one if it exists.
>
> Best,
> Andrey
>
> [1] https://underscore.io/blog/posts/2016/10/11/twenty-two.html
>
>
> On 16 Nov 2018, at 13:10, Andrea Sella  wrote:
>
> Hey squirrels,
>
> I've started to study more in-depth Flink Serialization and its "type
> system".
>
> I have a case class generated with scalapb that has more than 30 fields;
> I've seen that Flink still uses the CaseClassSerializer and the
> TypeInformation is CaseClassTypeInfo, even though the docs[1] say
> otherwise (22-field limit). I'd have expected a GenericTypeInfo, but all
> is well because the CaseClassSerializer is faster than Kryo. Did I
> misunderstand the documentation, or does the limitation no longer apply?
>
> Another thing: I've read the "Juggling with Bits and Bytes"[2] blog post
> and I would like to replicate the experiment with some tailored changes to
> dive deeper into the topic. Is the source code on GitHub or somewhere
> else?
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/types_serialization.html#flinks-typeinformation-class
> [2]
> https://flink.apache.org/news/2015/05/11/Juggling-with-Bits-and-Bytes.html
>
> Thank you,
> Andrea
>
>
>


Flink Serialization and case class fields limit

2018-11-16 Thread Andrea Sella
Hey squirrels,

I've started to study more in-depth Flink Serialization and its "type
system".

I have a case class generated with scalapb that has more than 30 fields;
I've seen that Flink still uses the CaseClassSerializer and the
TypeInformation is CaseClassTypeInfo, even though the docs[1] say otherwise
(22-field limit). I'd have expected a GenericTypeInfo, but all is well
because the CaseClassSerializer is faster than Kryo. Did I misunderstand
the documentation, or does the limitation no longer apply?

Another thing: I've read the "Juggling with Bits and Bytes"[2] blog post
and I would like to replicate the experiment with some tailored changes to
dive deeper into the topic. Is the source code on GitHub or somewhere else?

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/types_serialization.html#flinks-typeinformation-class
[2]
https://flink.apache.org/news/2015/05/11/Juggling-with-Bits-and-Bytes.html

Thank you,
Andrea


Re: OutputFormat in streaming job

2016-05-06 Thread Andrea Sella
Hi Fabian,

So I misunderstood the behaviour of configure(), thank you.

Andrea

2016-05-06 14:17 GMT+02:00 Fabian Hueske <fhue...@gmail.com>:

> Hi Andrea,
>
> actually, OutputFormat.configure() will also be invoked per task. So you
> would also end up with 16 ActorSystems.
> However, I think you can use a synchronized singleton object to start one
> ActorSystem per TM (each TM and all of its tasks run in a single JVM).
>
> So from the point of view of configure(), I think it does not make a
> difference whether to use an OutputFormat or a RichSinkFunction.
> I would rather go for the SinkFunction, which is better suited for
> streaming jobs.
>
> Cheers, Fabian
>
> 2016-05-06 14:10 GMT+02:00 Andrea Sella <andrea.se...@radicalbit.io>:
>
>> Hi Fabian,
>>
>> ATM I am not interested in guaranteeing exactly-once processing; thank
>> you for the clarification.
>>
>> As far as I know, there is no method similar to OutputFormat's
>> configure() for RichSinkFunction, correct? So I cannot instantiate an
>> ActorSystem per TM; I have to instantiate an ActorSystem per TaskSlot,
>> which is unsuitable because an ActorSystem is very heavy.
>>
>> Example:
>> OutputFormat => 2 TM => 16 parallelism => 2 ActorSystems (instantiated in
>> OutputFormat's configure method)
>> Sink => 2 TM => 16 parallelism => 16 ActorSystems (instantiated in
>> RichSinkFunction's open method)
>>
>> Am I wrong?
>>
>> Thanks again,
>> Andrea
>>
>> 2016-05-06 13:47 GMT+02:00 Fabian Hueske <fhue...@gmail.com>:
>>
>>> Hi Andrea,
>>>
>>> you can use any OutputFormat to emit data from a DataStream using the
>>> writeUsingOutputFormat() method.
>>> However, this method does not guarantee exactly-once processing. In case
>>> of a failure, it might emit some records a second time. Hence the results
>>> will be written at least once.
>>>
>>> Hope this helps,
>>> Fabian
>>>
>>> 2016-05-06 12:45 GMT+02:00 Andrea Sella <andrea.se...@radicalbit.io>:
>>>
>>>> Hi,
>>>>
>>>> I created a custom OutputFormat to send data to a remote actor; are
>>>> there issues with using an OutputFormat in a streaming job? Or will it
>>>> be treated like a Sink?
>>>>
>>>> I prefer to use it in order to create a custom ActorSystem per TM in
>>>> the configure method.
>>>>
>>>> Cheers,
>>>> Andrea
>>>>
>>>
>>>
>>
>
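
A minimal sketch (all names hypothetical) of the synchronized-singleton
approach Fabian suggests above, so that all parallel sink tasks in one
TaskManager JVM share a single ActorSystem:

```scala
import akka.actor.ActorSystem
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction

// One ActorSystem per JVM (i.e. per TaskManager), created lazily and
// shared by every task that calls get.
object SharedActorSystem {
  private var system: ActorSystem = _

  def get: ActorSystem = synchronized {
    if (system == null) system = ActorSystem("per-tm-system")
    system
  }
}

class ActorSink[T] extends RichSinkFunction[T] {
  @transient private var system: ActorSystem = _

  override def open(parameters: Configuration): Unit = {
    // open() runs once per parallel task, but every task in the same
    // JVM receives the same shared ActorSystem instance.
    system = SharedActorSystem.get
  }

  override def invoke(value: T): Unit = {
    // Hypothetical remote actor path — adjust to the real receiver.
    system.actorSelection("akka.tcp://remote@host:2552/user/receiver") ! value
  }
}
```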


Re: OutputFormat in streaming job

2016-05-06 Thread Andrea Sella
Hi Fabian,

ATM I am not interested in guaranteeing exactly-once processing; thank you
for the clarification.

As far as I know, there is no method similar to OutputFormat's configure()
for RichSinkFunction, correct? So I cannot instantiate an ActorSystem per
TM; I have to instantiate an ActorSystem per TaskSlot, which is unsuitable
because an ActorSystem is very heavy.

Example:
OutputFormat => 2 TM => 16 parallelism => 2 ActorSystems (instantiated in
OutputFormat's configure method)
Sink => 2 TM => 16 parallelism => 16 ActorSystems (instantiated in
RichSinkFunction's open method)

Am I wrong?

Thanks again,
Andrea

2016-05-06 13:47 GMT+02:00 Fabian Hueske <fhue...@gmail.com>:

> Hi Andrea,
>
> you can use any OutputFormat to emit data from a DataStream using the
> writeUsingOutputFormat() method.
> However, this method does not guarantee exactly-once processing. In case
> of a failure, it might emit some records a second time. Hence the results
> will be written at least once.
>
> Hope this helps,
> Fabian
>
> 2016-05-06 12:45 GMT+02:00 Andrea Sella <andrea.se...@radicalbit.io>:
>
>> Hi,
>>
>> I created a custom OutputFormat to send data to a remote actor; are there
>> issues with using an OutputFormat in a streaming job? Or will it be
>> treated like a Sink?
>>
>> I prefer to use it in order to create a custom ActorSystem per TM in the
>> configure method.
>>
>> Cheers,
>> Andrea
>>
>
>
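
For completeness, a minimal sketch of the writeUsingOutputFormat() route
Fabian mentions; the TextOutputFormat and output path are just illustrative
choices:

```scala
import org.apache.flink.api.java.io.TextOutputFormat
import org.apache.flink.core.fs.Path
import org.apache.flink.streaming.api.scala._

object OutputFormatSinkExample {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val stream = env.fromElements("alpha", "beta", "gamma")
    // Any OutputFormat[T] can be plugged in here, e.g. a custom one.
    // Note: at-least-once only — records may be re-emitted on failure.
    stream.writeUsingOutputFormat(
      new TextOutputFormat[String](new Path("/tmp/out")))
    env.execute("write-via-output-format")
  }
}
```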


OutputFormat in streaming job

2016-05-06 Thread Andrea Sella
Hi,

I created a custom OutputFormat to send data to a remote actor; are there
issues with using an OutputFormat in a streaming job? Or will it be treated
like a Sink?

I prefer to use it in order to create a custom ActorSystem per TM in the
configure method.

Cheers,
Andrea


Re: Flink and YARN ship folder

2016-03-19 Thread Andrea Sella
Hi,

After a few tests I am able to write to and read from Alluxio.
I am using Flink 1.0.0, and in my case external libraries are not loaded
from the lib folder onto the classpath; only flink-dist_2.11-1.0.0.jar is
loaded. I need to specify the folder with the -yt parameter to load the
others.

If I run `./bin/flink run -m yarn-cluster -yn 4 -yjm 2048 -ytm 4096 some.jar
--input alluxio://somepath`, it throws an exception about a missing library,
i.e. Class alluxio.hadoop.FileSystem not found.
If I run `./bin/flink run -m yarn-cluster -yt lib/ -yn 4 -yjm 2048 -ytm
4096 some.jar --input alluxio://somepath` with the -yt parameter, it works
fine.

Is it a bug?

Cheers,
Andrea




2016-03-14 15:00 GMT+01:00 Andrea Sella <andrea.se...@radicalbit.io>:

> Hi Robert,
>
> Ok, thank you.
>
> 2016-03-14 11:13 GMT+01:00 Robert Metzger <rmetz...@apache.org>:
>
>> Hi Andrea,
>>
>> You don't have to manually replicate any operations on the slaves. All
>> files in the lib/ folder are transferred to all containers (Jobmanagers and
>> TaskManagers).
>>
>>
>> On Sat, Mar 12, 2016 at 3:25 PM, Andrea Sella <andrea.se...@radicalbit.io
>> > wrote:
>>
>>> Hi Ufuk,
>>>
>>> I'm trying to execute the WordCount batch example with input and output
>>> on Alluxio. I followed Running Flink on Alluxio
>>> <http://www.alluxio.org/documentation/en/Running-Flink-on-Alluxio.html> and
>>> added the library to the lib folder. Do I have to replicate this operation
>>> on the slaves, or does YARN manage that so I only need the library on the
>>> machine where I launch the job?
>>>
>>> Thanks,
>>> Andrea
>>>
>>> 2016-03-11 19:23 GMT+01:00 Ufuk Celebi <u...@apache.org>:
>>>
>>>> Everything in the lib folder should be added to the classpath. Can you
>>>> check in the YARN client logs that the files are uploaded? Furthermore,
>>>> you can check the classpath of the JVM in the YARN logs of the
>>>> JobManager/TaskManager processes.
>>>>
>>>> – Ufuk
>>>>
>>>>
>>>> On Fri, Mar 11, 2016 at 5:33 PM, Andrea Sella
>>>> <andrea.se...@radicalbit.io> wrote:
>>>> > Hi,
>>>> >
>>>> > Is there a way to add external dependencies to a Flink job, running
>>>> > on YARN, without using HADOOP_CLASSPATH?
>>>> > I am looking for something similar to standalone mode's lib folder.
>>>> >
>>>> > BR,
>>>> > Andrea
>>>>
>>>
>>>
>>
>


Re: Integration Alluxio and Flink

2016-03-15 Thread Andrea Sella
Hi Robert,

I forgot to mention that I built a fat jar of the job using `sbt assembly`,
so my job includes alluxio-core-client.


2016-03-15 17:45 GMT+01:00 Robert Metzger <rmetz...@apache.org>:

> Hi Andrea,
>
> the filesystem class cannot be in the job jar. You have to put it into
> the lib folder.
>
> On Tue, Mar 15, 2016 at 5:40 PM, Andrea Sella <andrea.se...@radicalbit.io>
> wrote:
>
>> Hi Till,
>>
>> I put the jar as a dependency of my job in build.sbt. Do I need to do
>> something else?
>>
>> val flinkDependencies = Seq(
>>   "org.apache.flink" %% "flink-scala" % flinkVersion % "provided",
>>   "org.apache.flink" %% "flink-streaming-scala" % flinkVersion %
>> "provided",
>>   ("org.alluxio" % "alluxio-core-client" % "1.0.0").
>> exclude("org.jboss.netty", "netty").
>> exclude("io.netty", "netty").
>> exclude("io.netty", "netty-all").
>> exclude("org.slf4j", "slf4j-api").
>> exclude("commons-beanutils", "commons-beanutils-core").
>> exclude("commons-collections", "commons-collections").
>> exclude("commons-logging", "commons-logging").
>> exclude("com.esotericsoftware.minlog", "minlog").
>> exclude("org.apache.hadoop","hadoop-yarn-common")
>> )
>>
>> `show compile:dependencyClasspath` shows alluxio-core-client as
>> dependency.
>>
>> https://github.com/alkagin/alluxio-wordcount/
>>
>> Thanks,
>> Andrea
>>
>>
>>
>> 2016-03-15 17:09 GMT+01:00 Till Rohrmann <trohrm...@apache.org>:
>>
>>> Hi Andrea,
>>>
>>> can it be that the alluxio.hadoop.FileSystem is not in your classpath?
>>> Have you put the respective jar file in Flink’s lib folder?
>>>
>>> Cheers,
>>> Till
>>> ​
>>>
>>> On Tue, Mar 15, 2016 at 12:55 PM, Andrea Sella <
>>> andrea.se...@radicalbit.io> wrote:
>>>
>>>> Hi Till,
>>>>
>>>> I've tried your suggestion (source-code
>>>> <https://github.com/alkagin/alluxio-wordcount/>) and now it throws:
>>>> java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
>>>> alluxio.hadoop.FileSystem not found.
>>>> The core-site.xml has been set correctly, and alluxio-client is present
>>>> in the alluxio-wordcount jar. Do I need to specify the Hadoop
>>>> configuration via code, or is core-site.xml enough?
>>>>
>>>> Thank you again,
>>>> Andrea
>>>>
>>>> 2016-03-14 17:28 GMT+01:00 Till Rohrmann <trohrm...@apache.org>:
>>>>
>>>>> Hi Andrea,
>>>>>
>>>>> the problem won’t be netty-all but netty, I suspect. Flink is using
>>>>> version 3.8 whereas alluxio-core-client uses version 3.2.2. I think
>>>>> you have to exclude or shade this dependency away.
>>>>>
>>>>> Cheers,
>>>>> Till
>>>>> ​
>>>>>
>>>>> On Mon, Mar 14, 2016 at 5:12 PM, Andrea Sella <
>>>>> andrea.se...@radicalbit.io> wrote:
>>>>>
>>>>>> Hi Till,
>>>>>> I tried to downgrade Alluxio's netty version from 4.0.28.Final to
>>>>>> 4.0.27.Final to align the Flink and Alluxio dependencies. First of all,
>>>>>> Flink 1.0.0 uses 4.0.27.Final, is that correct? Btw it doesn't work;
>>>>>> same error as above.
>>>>>>
>>>>>> BR,
>>>>>> Andrea
>>>>>>
>>>>>> 2016-03-14 15:30 GMT+01:00 Till Rohrmann <trohrm...@apache.org>:
>>>>>>
>>>>>>> Yes, it seems as if you have a netty version conflict. Maybe the
>>>>>>> alluxio-core-client.jar pulls in an incompatible netty version. Could
>>>>>>> you check whether this is the case? But maybe you also have other
>>>>>>> dependencies which pull in a wrong netty version, since the Alluxio
>>>>>>> documentation indicates that it works with Flink (but I cannot tell
>>>>>>> for which version).
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Till
>>>>>>>
>>>>>>> On Mon, Mar 14, 2016 at 3:18 PM, Andrea Sella <
>>>>>>> andrea.se...@radicalbit.io> wrote:
>>>>>>>
>>>>>>>> Hi to all,
>>>>>>>>
>>>>>>>> I'm trying to integrate Alluxio and Apache Flink; I followed Running
>>>>>>>> Flink on Alluxio
>>>>>>>> <http://www.alluxio.org/documentation/en/Running-Flink-on-Alluxio.html>
>>>>>>>> to set up Flink.
>>>>>>>>
>>>>>>>> I tested in local mode executing:
>>>>>>>>
>>>>>>>> bin/flink run ./examples/batch/WordCount.jar --input
>>>>>>>> alluxio:///flink/README.txt
>>>>>>>>
>>>>>>>> But I've hit a TimeoutException; I attach the logs. It seems the
>>>>>>>> trouble is due to a netty dependency conflict.
>>>>>>>>
>>>>>>>> Thank you,
>>>>>>>> Andrea
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
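
A minimal sbt sketch of Robert's advice above, assuming sbt-assembly (which
skips "provided" dependencies by default): keep the filesystem client out of
the assembled job jar and copy the same jar into Flink's lib/ folder on the
cluster instead.

```scala
// Hypothetical build.sbt line: the client resolves at compile time but
// is excluded from the fat jar produced by `sbt assembly`.
libraryDependencies += "org.alluxio" % "alluxio-core-client" % "1.0.0" % "provided"
```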


Re: JobManager Dashboard and YARN execution

2016-03-15 Thread Andrea Sella
Thanks Till for the update.

BR,
Andrea

2016-03-15 17:16 GMT+01:00 Till Rohrmann <trohrm...@apache.org>:

> Hi Andrea,
>
> there is also a PR [1] which will allow you to access the TaskManager logs
> via the UI.
>
> [1] https://github.com/apache/flink/pull/1790
>
> Cheers,
> Till
>
> On Wed, Mar 9, 2016 at 1:58 PM, Stephan Ewen <se...@apache.org> wrote:
>
>> Hi!
>>
>> Yes, the dashboard is available in both cases. It is proxied through the
>> Yarn UI, you can find the link from there...
>>
>> You can always access JobManager logs through the UI.
>>
>> Stephan
>>
>>
>> On Wed, Mar 9, 2016 at 12:47 PM, Andrea Sella <andrea.se...@radicalbit.io
>> > wrote:
>>
>>> Hi,
>>> I am experimenting with the integration between Apache Flink and YARN.
>>>
>>> When I am running a Flink job using yarn-cluster, is the Dashboard
>>> available to monitor the execution? Is it also available for a
>>> long-running session?
>>>
>>> Is it possible to retrieve logs from the dashboard, or do I have to go
>>> through the YARN application logs?
>>>
>>> Thank you,
>>> Andrea
>>>
>>
>>
>


Re: Integration Alluxio and Flink

2016-03-15 Thread Andrea Sella
Hi Till,

I put the jar as a dependency of my job in build.sbt. Do I need to do
something else?

val flinkDependencies = Seq(
  "org.apache.flink" %% "flink-scala" % flinkVersion % "provided",
  "org.apache.flink" %% "flink-streaming-scala" % flinkVersion % "provided",
  ("org.alluxio" % "alluxio-core-client" % "1.0.0").
    exclude("org.jboss.netty", "netty").
    exclude("io.netty", "netty").
    exclude("io.netty", "netty-all").
    exclude("org.slf4j", "slf4j-api").
    exclude("commons-beanutils", "commons-beanutils-core").
    exclude("commons-collections", "commons-collections").
    exclude("commons-logging", "commons-logging").
    exclude("com.esotericsoftware.minlog", "minlog").
    exclude("org.apache.hadoop", "hadoop-yarn-common")
)

`show compile:dependencyClasspath` shows alluxio-core-client as a dependency.

https://github.com/alkagin/alluxio-wordcount/

Thanks,
Andrea



2016-03-15 17:09 GMT+01:00 Till Rohrmann <trohrm...@apache.org>:

> Hi Andrea,
>
> can it be that the alluxio.hadoop.FileSystem is not in your classpath?
> Have you put the respective jar file in Flink’s lib folder?
>
> Cheers,
> Till
> ​
>
> On Tue, Mar 15, 2016 at 12:55 PM, Andrea Sella <andrea.se...@radicalbit.io
> > wrote:
>
>> Hi Till,
>>
>> I've tried your suggestion (source-code
>> <https://github.com/alkagin/alluxio-wordcount/>) and now it throws:
>> java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
>> alluxio.hadoop.FileSystem not found.
>> The core-site.xml has been set correctly, and alluxio-client is present
>> in the alluxio-wordcount jar. Do I need to specify the Hadoop
>> configuration via code, or is core-site.xml enough?
>>
>> Thank you again,
>> Andrea
>>
>> 2016-03-14 17:28 GMT+01:00 Till Rohrmann <trohrm...@apache.org>:
>>
>>> Hi Andrea,
>>>
>>> the problem won’t be netty-all but netty, I suspect. Flink is using
>>> version 3.8 whereas alluxio-core-client uses version 3.2.2. I think you
>>> have to exclude or shade this dependency away.
>>>
>>> Cheers,
>>> Till
>>> ​
>>>
>>> On Mon, Mar 14, 2016 at 5:12 PM, Andrea Sella <
>>> andrea.se...@radicalbit.io> wrote:
>>>
>>>> Hi Till,
>>>> I tried to downgrade Alluxio's netty version from 4.0.28.Final to
>>>> 4.0.27.Final to align the Flink and Alluxio dependencies. First of all,
>>>> Flink 1.0.0 uses 4.0.27.Final, is that correct? Btw it doesn't work;
>>>> same error as above.
>>>>
>>>> BR,
>>>> Andrea
>>>>
>>>> 2016-03-14 15:30 GMT+01:00 Till Rohrmann <trohrm...@apache.org>:
>>>>
>>>>> Yes, it seems as if you have a netty version conflict. Maybe the
>>>>> alluxio-core-client.jar pulls in an incompatible netty version. Could
>>>>> you check whether this is the case? But maybe you also have other
>>>>> dependencies which pull in a wrong netty version, since the Alluxio
>>>>> documentation indicates that it works with Flink (but I cannot tell for
>>>>> which version).
>>>>>
>>>>> Cheers,
>>>>> Till
>>>>>
>>>>> On Mon, Mar 14, 2016 at 3:18 PM, Andrea Sella <
>>>>> andrea.se...@radicalbit.io> wrote:
>>>>>
>>>>>> Hi to all,
>>>>>>
>>>>>> I'm trying to integrate Alluxio and Apache Flink; I followed Running
>>>>>> Flink on Alluxio
>>>>>> <http://www.alluxio.org/documentation/en/Running-Flink-on-Alluxio.html>
>>>>>> to set up Flink.
>>>>>>
>>>>>> I tested in local mode executing:
>>>>>>
>>>>>> bin/flink run ./examples/batch/WordCount.jar --input
>>>>>> alluxio:///flink/README.txt
>>>>>>
>>>>>> But I've hit a TimeoutException; I attach the logs. It seems the
>>>>>> trouble is due to a netty dependency conflict.
>>>>>>
>>>>>> Thank you,
>>>>>> Andrea
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>


Re: Integration Alluxio and Flink

2016-03-15 Thread Andrea Sella
Hi Till,

I've tried your suggestion (source-code
<https://github.com/alkagin/alluxio-wordcount/>) and now it throws:
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
alluxio.hadoop.FileSystem not found.
The core-site.xml has been set correctly, and alluxio-client is present in
the alluxio-wordcount jar. Do I need to specify the Hadoop configuration
via code, or is core-site.xml enough?

Thank you again,
Andrea

2016-03-14 17:28 GMT+01:00 Till Rohrmann <trohrm...@apache.org>:

> Hi Andrea,
>
> the problem won’t be netty-all but netty, I suspect. Flink is using
> version 3.8 whereas alluxio-core-client uses version 3.2.2. I think you
> have to exclude or shade this dependency away.
>
> Cheers,
> Till
> ​
>
> On Mon, Mar 14, 2016 at 5:12 PM, Andrea Sella <andrea.se...@radicalbit.io>
> wrote:
>
>> Hi Till,
>> I tried to downgrade Alluxio's netty version from 4.0.28.Final to
>> 4.0.27.Final to align the Flink and Alluxio dependencies. First of all,
>> Flink 1.0.0 uses 4.0.27.Final, is that correct? Btw it doesn't work; same
>> error as above.
>>
>> BR,
>> Andrea
>>
>> 2016-03-14 15:30 GMT+01:00 Till Rohrmann <trohrm...@apache.org>:
>>
>>> Yes, it seems as if you have a netty version conflict. Maybe the
>>> alluxio-core-client.jar pulls in an incompatible netty version. Could you
>>> check whether this is the case? But maybe you also have other
>>> dependencies which pull in a wrong netty version, since the Alluxio
>>> documentation indicates that it works with Flink (but I cannot tell for
>>> which version).
>>>
>>> Cheers,
>>> Till
>>>
>>> On Mon, Mar 14, 2016 at 3:18 PM, Andrea Sella <
>>> andrea.se...@radicalbit.io> wrote:
>>>
>>>> Hi to all,
>>>>
>>>> I'm trying to integrate Alluxio and Apache Flink; I followed Running
>>>> Flink on Alluxio
>>>> <http://www.alluxio.org/documentation/en/Running-Flink-on-Alluxio.html>
>>>> to set up Flink.
>>>>
>>>> I tested in local mode executing:
>>>>
>>>> bin/flink run ./examples/batch/WordCount.jar --input
>>>> alluxio:///flink/README.txt
>>>>
>>>> But I've hit a TimeoutException; I attach the logs. It seems the
>>>> trouble is due to a netty dependency conflict.
>>>>
>>>> Thank you,
>>>> Andrea
>>>>
>>>>
>>>>
>>>
>>
>


Re: Integration Alluxio and Flink

2016-03-14 Thread Andrea Sella
Hi Till,
I tried to downgrade Alluxio's netty version from 4.0.28.Final to
4.0.27.Final to align the Flink and Alluxio dependencies. First of all,
Flink 1.0.0 uses 4.0.27.Final, is that correct? Btw it doesn't work; same
error as above.

BR,
Andrea

2016-03-14 15:30 GMT+01:00 Till Rohrmann <trohrm...@apache.org>:

> Yes, it seems as if you have a netty version conflict. Maybe the
> alluxio-core-client.jar pulls in an incompatible netty version. Could you
> check whether this is the case? But maybe you also have other dependencies
> which pull in a wrong netty version, since the Alluxio documentation
> indicates that it works with Flink (but I cannot tell for which version).
>
> Cheers,
> Till
>
> On Mon, Mar 14, 2016 at 3:18 PM, Andrea Sella <andrea.se...@radicalbit.io>
> wrote:
>
>> Hi to all,
>>
>> I'm trying to integrate Alluxio and Apache Flink; I followed Running
>> Flink on Alluxio
>> <http://www.alluxio.org/documentation/en/Running-Flink-on-Alluxio.html> to
>> set up Flink.
>>
>> I tested in local mode executing:
>>
>> bin/flink run ./examples/batch/WordCount.jar --input
>> alluxio:///flink/README.txt
>>
>> But I've hit a TimeoutException; I attach the logs. It seems the
>> trouble is due to a netty dependency conflict.
>>
>> Thank you,
>> Andrea
>>
>>
>>
>


Integration Alluxio and Flink

2016-03-14 Thread Andrea Sella
Hi to all,

I'm trying to integrate Alluxio and Apache Flink; I followed Running Flink
on Alluxio
<http://www.alluxio.org/documentation/en/Running-Flink-on-Alluxio.html> to
set up Flink.

I tested in local mode executing:

bin/flink run ./examples/batch/WordCount.jar --input
alluxio:///flink/README.txt

But I've hit a TimeoutException; I attach the logs. It seems the trouble
is due to a netty dependency conflict.

Thank you,
Andrea


alluxio-integration.log
Description: Binary data


Re: Flink and YARN ship folder

2016-03-14 Thread Andrea Sella
Hi Robert,

Ok, thank you.

2016-03-14 11:13 GMT+01:00 Robert Metzger <rmetz...@apache.org>:

> Hi Andrea,
>
> You don't have to manually replicate any operations on the slaves. All
> files in the lib/ folder are transferred to all containers (Jobmanagers and
> TaskManagers).
>
>
> On Sat, Mar 12, 2016 at 3:25 PM, Andrea Sella <andrea.se...@radicalbit.io>
> wrote:
>
>> Hi Ufuk,
>>
>> I'm trying to execute the WordCount batch example with input and output
>> on Alluxio. I followed Running Flink on Alluxio
>> <http://www.alluxio.org/documentation/en/Running-Flink-on-Alluxio.html> and
>> added the library to the lib folder. Do I have to replicate this operation
>> on the slaves, or does YARN manage that so I only need the library on the
>> machine where I launch the job?
>>
>> Thanks,
>> Andrea
>>
>> 2016-03-11 19:23 GMT+01:00 Ufuk Celebi <u...@apache.org>:
>>
>>> Everything in the lib folder should be added to the classpath. Can you
>>> check in the YARN client logs that the files are uploaded? Furthermore,
>>> you can check the classpath of the JVM in the YARN logs of the
>>> JobManager/TaskManager processes.
>>>
>>> – Ufuk
>>>
>>>
>>> On Fri, Mar 11, 2016 at 5:33 PM, Andrea Sella
>>> <andrea.se...@radicalbit.io> wrote:
>>> > Hi,
>>> >
>>> > Is there a way to add external dependencies to a Flink job, running
>>> > on YARN, without using HADOOP_CLASSPATH?
>>> > I am looking for something similar to standalone mode's lib folder.
>>> >
>>> > BR,
>>> > Andrea
>>>
>>
>>
>


Re: Flink and YARN ship folder

2016-03-12 Thread Andrea Sella
Hi Ufuk,

I'm trying to execute the WordCount batch example with input and output on
Alluxio. I followed Running Flink on Alluxio
<http://www.alluxio.org/documentation/en/Running-Flink-on-Alluxio.html> and
added the library to the lib folder. Do I have to replicate this operation
on the slaves, or does YARN manage that so I only need the library on the
machine where I launch the job?

Thanks,
Andrea

2016-03-11 19:23 GMT+01:00 Ufuk Celebi <u...@apache.org>:

> Everything in the lib folder should be added to the classpath. Can you
> check in the YARN client logs that the files are uploaded? Furthermore,
> you can check the classpath of the JVM in the YARN logs of the
> JobManager/TaskManager processes.
>
> – Ufuk
>
>
> On Fri, Mar 11, 2016 at 5:33 PM, Andrea Sella
> <andrea.se...@radicalbit.io> wrote:
> > Hi,
> >
> > Is there a way to add external dependencies to a Flink job, running on
> > YARN, without using HADOOP_CLASSPATH?
> > I am looking for something similar to standalone mode's lib folder.
> >
> > BR,
> > Andrea
>


Flink and YARN ship folder

2016-03-11 Thread Andrea Sella
Hi,

Is there a way to add external dependencies to a Flink job running on YARN
without using HADOOP_CLASSPATH?
I am looking for something similar to standalone mode's lib folder.

BR,
Andrea


JobManager Dashboard and YARN execution

2016-03-09 Thread Andrea Sella
Hi,
I am experimenting with the integration between Apache Flink and YARN.

When I am running a Flink job using yarn-cluster, is the Dashboard
available to monitor the execution? Is it also available for a long-running
session?

Is it possible to retrieve logs from the dashboard, or do I have to go
through the YARN application logs?

Thank you,
Andrea


Re: Webclient script misses building from source

2016-03-08 Thread Andrea Sella
I missed it!

Thank you,
Andrea

2016-03-08 14:27 GMT+01:00 Aljoscha Krettek <aljos...@apache.org>:

> Hi Andrea,
> in Flink 1.0 there is no longer a separate web client. The web client is
> part of the default JobManager dashboard now.
>
> You can also disable the web client part of the JobManager dashboard by
> setting:
>
> jobmanager.web.submit.enable: false
>
> in flink-conf.yaml.
>
> Cheers,
> Aljoscha
> > On 08 Mar 2016, at 14:21, Andrea Sella <andrea.se...@radicalbit.io>
> wrote:
> >
> > Hi,
> >
> > I've built Flink from source but I was not able to find the script
> > start-webclient.sh in build-target/bin to launch the WebUI.
> > Is the script available only in the binaries, or do I have to add an
> > argument to trigger its generation?
> >
> > Thanks in advance,
> > Andrea
> >
>
>


Webclient script misses building from source

2016-03-08 Thread Andrea Sella
Hi,

I've built Flink from source but I was not able to find the script
start-webclient.sh in build-target/bin to launch the WebUI.
Is the script available only in the binaries, or do I have to add an
argument to trigger its generation?

Thanks in advance,
Andrea


Underlying TaskManager's Actor System

2016-02-24 Thread Andrea Sella
Hi,
Is there a way to access the underlying TaskManager's ActorSystem?

Thank you in advance,
Andrea


Re: Unexpected behavior with Scala App trait.

2016-01-19 Thread Andrea Sella
Hi Chiwan,

> I'm not an expert in Scala, but it seems to be a closure-cleaning problem.
> Scala's App trait extends the DelayedInit trait to initialize the object,
> but Flink's serialization stack doesn't handle this special initialization.
> (It is just my opinion, not verified.)
>

I arrived at the same conclusion with the DelayedInit trait.

>
> To run TFIDFNPE safely, you just need to change tokenize and uniqueWords
> to methods, like the following:
>
> ```
> def tokenize = …
> def uniqueWords = ...
> ```
>
> With this change, I tested that TFIDFNPE works safely on a Flink 0.10.1
> cluster.
>

Yeah, it works and I knew it. My aim was to use tricky (functional) Scala
mechanisms to test how robust and idiomatic-friendly the Scala APIs are for
Scala users.

>
> About the TFIDF object: you should avoid overriding the main method if the
> object is derived from the App trait. This is also related to the
> DelayedInit mechanism [1].
>

Yeah, the App trait is commented out because I am overriding the main
method. TFIDF is the same as TFIDFNPE but with the App trait and without
overriding main.
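
To make the contrast concrete, a minimal hypothetical sketch (not the
flink-training code) of the two variants discussed:

```scala
import org.apache.flink.api.scala._

// Variant 1: extends App. `stopWords` is assigned by DelayedInit when
// the program starts, not in the object's constructor. On a remote JVM
// the delayed-init body never runs, so the field can still be null when
// a deserialized flatMap accesses it — the NPE discussed here.
object TfIdfNpeSketch extends App {
  val stopWords = Set("a", "an", "the")

  val env = ExecutionEnvironment.getExecutionEnvironment
  env.fromElements("the quick brown fox")
    .flatMap(_.split(" ").filterNot(stopWords)) // captures stopWords
    .print()
}

// Variant 2: a plain main method (or defs, as Chiwan suggests) avoids
// DelayedInit entirely, so the captured values are always initialized.
object TfIdfSafeSketch {
  def main(args: Array[String]): Unit = {
    def stopWords = Set("a", "an", "the")
    val env = ExecutionEnvironment.getExecutionEnvironment
    env.fromElements("the quick brown fox")
      .flatMap(_.split(" ").filterNot(stopWords))
      .print()
  }
}
```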


Thanks,
Andrea

>
> [1]:
> https://github.com/scala/scala/blob/2.10.x/src/library/scala/App.scala#L31
>
> > On Jan 19, 2016, at 12:08 AM, Andrea Sella <andrea.se...@radicalbit.io>
> wrote:
> >
> > Hi,
> >
> > I was implementing the TF-IDF example from flink-training when I faced
> > a problem with an NPE during the deployment of my job.
> >
> > Source code: https://github.com/alkagin/flink-tfidf-example
> >
> > I used version 0.10.1 and started in local mode.
> > During the deployment of the TFIDFNPE job, which extends App, Flink
> > throws a NullPointerException in both flatMap functions.
> > If I include the tokenize function in the closures of the flatMap
> > functions, the job works fine; see the TFIDFApp example.
> > To avoid this unexpected behavior I must not use the Scala App trait
> > (see TFIDF), but why?
> >
> >
> > Thanks,
> > Andrea
> >
>
>
> Regards,
> Chiwan Park
>
>