Re: Retrieve written records of a sink after job

2018-02-19 Thread Flavio Pompermaier
at 10:31 AM, Fabian Hueske <fhue...@gmail.com> wrote: > Hi Flavio, > > Not sure if I would add this functionality to the sinks. > You could also add a MapFunction with a counting Accumulator right before > each sink. > > Best, Fabian > > > 2018-02-14 14:11 GM
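A minimal sketch of the MapFunction-with-accumulator approach suggested in this reply, against the DataSet API; the class name and accumulator name are mine, not from the thread.

```java
import org.apache.flink.api.common.accumulators.LongCounter;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

/** Counts every record that passes through and exposes the count as a named accumulator. */
public class CountingMapper<T> extends RichMapFunction<T, T> {

    private final String accumulatorName;
    private final LongCounter counter = new LongCounter();

    public CountingMapper(String accumulatorName) {
        this.accumulatorName = accumulatorName;
    }

    @Override
    public void open(Configuration parameters) {
        getRuntimeContext().addAccumulator(accumulatorName, counter);
    }

    @Override
    public T map(T value) {
        counter.add(1L);
        return value;
    }
}
```

Chained right before each sink (e.g. `ds.map(new CountingMapper<>("sink-1")).output(sink1)`), the per-sink counts can then be read from the JobExecutionResult returned by `env.execute()` via `getAccumulatorResult("sink-1")`.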

Re: Retrieve written records of a sink after job

2018-02-14 Thread Flavio Pompermaier
, if you rather want to work on files/json you can enable job > archiving by configuring the jobmanager.archive.fs.dir directory. When > the job finishes this will contain a big JSON file for each job containing > all responses that the UI would return for finished jobs. > > > On

Re: Retrieve written records of a sink after job

2018-02-14 Thread Flavio Pompermaier
link-docs-master/monitoring/metrics.html#rest-api-integration> > . > > > On 14.02.2018 12:38, Flavio Pompermaier wrote: > > Actually I'd like to get this number from my Java class in order to update > some external dataset "catalog", > so I'm ask

Re: Retrieve written records of a sink after job

2018-02-14 Thread Flavio Pompermaier
ered by FLINK-7286 > which aims at allowing functions > to modify the numRecordsIn/numRecordsOut counts. > > > On 14.02.2018 12:22, Flavio Pompermaier wrote: > > Hi to all, > I have a (batch) job that writes to 1 or more sinks. > Is there a way to retrieve, once the job ha

Retrieve written records of a sink after job

2018-02-14 Thread Flavio Pompermaier
Hi to all, I have a (batch) job that writes to 1 or more sinks. Is there a way to retrieve, once the job has terminated, the number of records written to each sink? Is there any better way than using an accumulator for each sink? If that is the only way to do that, the Sink API could be

Re: Hadoop compatibility and HBase bulk loading

2018-01-16 Thread Flavio Pompermaier
atibility > support. > I'll also have no time to work on this any time soon :-( > > 2018-01-13 1:34 GMT+01:00 Flavio Pompermaier <pomperma...@okkam.it>: > >> Any progress on this Fabian? HBase bulk loading is a common task for us >> and it's very annoying an

Re: Hadoop compatibility and HBase bulk loading

2018-01-12 Thread Flavio Pompermaier
Any progress on this Fabian? HBase bulk loading is a common task for us and it's very annoying and uncomfortable to run a separate YARN job to accomplish it... On 10 Apr 2015 12:26, "Flavio Pompermaier" <pomperma...@okkam.it> wrote: Great! That will be awesome. Thank you Fabian

Re: Flink vs Spark streaming benchmark

2017-12-17 Thread Flavio Pompermaier
11/benchmarking-structured-streaming-on-databricks- >> runtime-against-state-of-the-art-streaming-systems.html >> >> Regards, >> Vijay Raajaa G S >> > > -- Flavio Pompermaier Development Department OKKAM S.r.l. Tel. +(39) 0461 041809

Re: [ANNOUNCE] Apache Flink 1.4.0 released

2017-12-12 Thread Flavio Pompermaier
the list of contributors: >> >> https://flink.apache.org/news/2017/12/12/release-1.4.0.html >> >> The full release notes are available in Jira: >> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12340533 >> >> >> I

Re: How to write dataset as parquet format

2017-11-22 Thread Flavio Pompermaier
I usually refer to this: https://github.com/FelixNeutatz/parquet-flinktacular On 22 Nov 2017 18:29, "Fabian Hueske" wrote: > Hi Ebru, > > AvroParquetOutputFormat seems to implement Hadoop's OutputFormat interface. > Flink provides a wrapper for Hadoop's OutputFormat [1], so

Re: Flink memory leak

2017-11-14 Thread Flavio Pompermaier
a Kafka topic and writes it back to another topic in Kafka. We cancel the job and start another every 5 seconds. After ~30 minutes of doing this process, the cluster reaches the OS memory limit and dies. > Currently, we have a test cluster

Re: Off heap memory issue

2017-11-13 Thread Flavio Pompermaier
e article, setting -Djdk.nio.maxCachedBufferSize > > This variable is available for Java > 8u102 > > Best regards, > > Kien > > [1] http://www.evanjones.ca/java-bytebuffer-leak.html > > On 10/18/2017 4:06 PM, Flavio Pompermaier wrote: > > We also faced the same problem, but the nu

Re: Flink memory leak

2017-11-08 Thread Flavio Pompermaier
We also have the same problem in production. At the moment the solution is to restart the entire Flink cluster after every job.. We've tried to reproduce this problem with a test (see https://issues.apache.org/jira/browse/FLINK-7845) but we don't know whether the error produced by the test and the

Re: State snapshotting when source is finite

2017-10-26 Thread Flavio Pompermaier
eature. >> >> I'm looping in Till and Aljoscha who might have some thoughts on this as >> well. >> Depending on the discussion we should open a JIRA for this feature. >> >> Cheers, Fabian >> >> 2017-10-25 10:31 GMT+02:00 Flavio Pompermaier <pompe

State snapshotting when source is finite

2017-10-25 Thread Flavio Pompermaier
Hi to all, in my current use case I'd like to improve one step of our batch pipeline. There's one specific job that ingests a tabular dataset (of Rows) and explodes it into a set of RDF statements (as Tuples). The objects we output are containers of those Tuples (grouped by a field). Flink

Re: Off heap memory issue

2017-10-18 Thread Flavio Pompermaier
We also faced the same problem, but the number of jobs we can run before restarting the cluster depends on the volume of the data to shuffle around the network. We even had problems with a single job and in order to avoid OOM issues we had to put some configuration to limit Netty memory usage,

Re: Flink 1.3.2 Netty Exception

2017-10-13 Thread Flavio Pompermaier
.flink.runtime.io.network.NetworkEnvironment.registerTask(NetworkEnvironment.java:186) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:602) > at java.lang.Thread.run(Thread.java:745) > > > On 11.10.2017 12:48, Flavio Pompermaier wrote: > > Hi to all, >

Flink 1.3.2 Netty Exception

2017-10-11 Thread Flavio Pompermaier
Hi to all, we wrote a small JUnit test to reproduce a memory issue we have in a Flink job (that seems related to Netty). At some point, usually around the 28th loop, the job fails with the following exception (actually we never faced that in production, but maybe it is related to the memory issue

Re: Exception while running Table API job

2017-09-14 Thread Flavio Pompermaier
Sorry, the error was caused by the fact that I had moved the table jar only to the job manager machine. After copying the jar from opt to lib on all the other TM machines the job was able to continue! Best, Flavio On Thu, Sep 14, 2017 at 6:56 PM, Flavio Pompermaier <pomperma...@okkam.it>

Exception while running Table API job

2017-09-14 Thread Flavio Pompermaier
Hi to all, I've tested my Flink 1.3.1 job on my local machine and everything was fine. So I've tried to run it on the cluster but I got the following weird exception (I've already moved flink-table_2.10-1.3.1.jar from opt to lib): Caused by: java.lang.RuntimeException: The initialization of the

Re: Table API and registration of DataSet/DataStream

2017-09-14 Thread Flavio Pompermaier
way. > > Best, Fabian > > > > > 2017-09-14 16:12 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>: > >> Hi Fabian, >> basically these were my problems with Table API. >> >> 1 ) Table.sql() has a different where syntax than Table.where() ,

Re: Table API and registration of DataSet/DataStream

2017-09-14 Thread Flavio Pompermaier
017 at 3:43 PM, Fabian Hueske <fhue...@gmail.com> wrote: > Not sure what you mean by "translate a where clause to a filter function". > > Isn't that exactly what Table.filter(String condition) is doing? > It translates a SQL-like condition (represented as String) into an > opera

Re: BucketingSink never closed

2017-09-13 Thread Flavio Pompermaier
I've just looked at Robert's presentation at FF [1] and that's exactly what I was waiting for regarding streaming planning/training... Very useful ;) [1] https://www.youtube.com/watch?v=8l8dCKMMWkw On Wed, Sep 13, 2017 at 12:04 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote: > Hi Gordon,

Re: BucketingSink never closed

2017-09-13 Thread Flavio Pompermaier
racks > the number of successfully indexed elements, which can then be queried from > the REST API / Web UI. That wouldn’t be too much effort. What do you think, > would that be useful for your case? > Would be happy to hear your thoughts on this! > > Cheers, > Gordon > >

Re: BucketingSink never closed

2017-09-12 Thread Flavio Pompermaier
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Adding-a-dispose-method-in-the-RichFunction-td14466.html#a14468 > > Kostas > > On Sep 8, 2017, at 4:27 PM, Flavio Pompermaier <pomperma...@okkam.it> > wrote: > > Hi to all,

Re: Table API and registration of DataSet/DataStream

2017-09-09 Thread Flavio Pompermaier
, there is no way to change or override a registered table. We > had this functionality once, but had to remove it after a Calcite version > upgrade. > Can you use a new TableEnvironment and register the new table there? > > Best, Fabian > > 2017-09-08 17:55 GMT+02:00 Flavio Pompermaie

Table API and registration of DataSet/DataStream

2017-09-08 Thread Flavio Pompermaier
Hi to all, I have a doubt about Table API. Let's say my code is something like: StreamTableEnvironment te = ...; RowTypeInfo rtf = new RowTypeInfo(...); DataStream myDs = te.registerDataStream("test",myDs,columnNames); Table table = te.sql("SELECT *, (NAME = 'John') as VALID FROM test WHERE
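For reference, a self-contained sketch of the registration-plus-SQL pattern the snippet above describes, written against the 1.3.x Table API (later versions renamed sql() to sqlQuery()); the sample data and field names are illustrative, not from the thread.

```java
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.java.typeutils.RowTypeInfo;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class TableRegistrationExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment te = TableEnvironment.getTableEnvironment(env);

        // Rows need an explicit RowTypeInfo, otherwise they end up as GenericType.
        RowTypeInfo rtf = new RowTypeInfo(BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.INT_TYPE_INFO);
        DataStream<Row> myDs = env
                .fromElements(Row.of("John", 25), Row.of("Anna", 17))
                .returns(rtf);

        // Register the stream under a table name, then query it with SQL.
        te.registerDataStream("test", myDs, "NAME, AGE");
        Table table = te.sql("SELECT *, (NAME = 'John') AS VALID FROM test WHERE AGE > 18");

        te.toAppendStream(table, Row.class).print();
        env.execute("table-api-example");
    }
}
```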

BucketingSink never closed

2017-09-08 Thread Flavio Pompermaier
Hi to all, I'm trying to test a streaming job but the files written by the BucketingSink are never finalized (they remain in the pending state). Is this caused by the fact that the job finishes before the checkpoint? Shouldn't the sink close properly anyway? This is my code: @Test public void
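Below is a hedged sketch of how a BucketingSink is normally wired up: in the versions current at the time of this thread, pending part files are only moved to their final state when a checkpoint completes, so a short test job can easily finish before that ever happens. Paths and thresholds are placeholders.

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;

public class BucketingSinkExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Without checkpoints, files stay in the pending state forever.
        env.enableCheckpointing(1_000L);

        BucketingSink<String> sink = new BucketingSink<>("/tmp/bucketing-out");
        sink.setBucketer(new DateTimeBucketer<String>("yyyy-MM-dd--HH"));
        sink.setBatchSize(1024 * 1024);            // roll part files at ~1 MB
        sink.setInactiveBucketThreshold(60_000L);  // close buckets idle for 1 minute
        sink.setInactiveBucketCheckInterval(60_000L);

        env.fromElements("a", "b", "c").addSink(sink);
        env.execute("bucketing-sink-test");
    }
}
```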

Re: Apache Phoenix integration

2017-09-08 Thread Flavio Pompermaier
I opened an issue for this: https://issues.apache.org/jira/browse/FLINK-7605 On Wed, Sep 6, 2017 at 4:24 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote: > Maybe this should be well documented also...is there any dedicated page to > Flink and JDBC connectors? > > On Wed, S

Re: Apache Phoenix integration

2017-09-06 Thread Flavio Pompermaier
conn.setAutoCommit(true); > } > > to JdbcOutputFormat.open(). > > Cheers, Fabian > > > > 2017-09-06 15:55 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>: > >> Hi Fabian, >> thanks for the detailed answer. Obviously you are right :) >>

Re: Apache Phoenix integration

2017-09-06 Thread Flavio Pompermaier
gt; (or doesn't support it). Can you check that Flavio? > If the Phoenix connector supports but not activates auto commits by > default, we can enable it in JdbcOutputFormat.open(). > If auto commits are not supported, we can add a check after execute() and > call commit() only if Connec

Apache Phoenix integration

2017-09-06 Thread Flavio Pompermaier
Hi to all, I'm writing a job that uses Apache Phoenix. At first I used the PhoenixOutputFormat as (Hadoop) OutputFormat, but it's not well suited to work with the Table API because it cannot handle generic objects like Rows (it needs a DBWritable object that must already exist at compile time).
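Since the problem is that PhoenixOutputFormat wants compile-time DBWritable classes, one alternative (a sketch, not what the thread settled on) is Flink's generic JDBCOutputFormat from flink-jdbc, which consumes Rows directly. The Phoenix URL and UPSERT statement below are placeholders, and the replies in this thread point out that the connection's auto-commit must be enabled for the batched upserts to actually be committed.

```java
import java.util.Arrays;

import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.io.jdbc.JDBCOutputFormat;
import org.apache.flink.api.java.typeutils.RowTypeInfo;
import org.apache.flink.types.Row;

public class PhoenixJdbcExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        RowTypeInfo rowType = new RowTypeInfo(BasicTypeInfo.INT_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO);
        DataSet<Row> rows = env.fromCollection(
                Arrays.asList(Row.of(1, "John"), Row.of(2, "Anna")), rowType);

        // Generic JDBC output: works with Row, no compile-time DBWritable needed.
        // Phoenix does not auto-commit by default, so the batched UPSERTs are only
        // committed if auto-commit is enabled (e.g. by patching open()).
        rows.output(JDBCOutputFormat.buildJDBCOutputFormat()
                .setDrivername("org.apache.phoenix.jdbc.PhoenixDriver")
                .setDBUrl("jdbc:phoenix:zk-host:2181")
                .setQuery("UPSERT INTO MY_TABLE (ID, NAME) VALUES (?, ?)")
                .setBatchInterval(1000)
                .finish());

        env.execute("phoenix-upsert");
    }
}
```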

Re: ANN: TouK Nussknacker - designing Flink processes with GUI

2017-09-04 Thread Flavio Pompermaier
Thanks for sharing this nice tool, Maciek. Does it handle both batch and streaming? Is it also able to visualize an existing Flink program? Best, Flavio On Mon, Sep 4, 2017 at 3:03 PM, Maciek Próchniak wrote: > Hello, > > we would like to announce availability of TouK Nussknacker

Help with table UDF

2017-08-31 Thread Flavio Pompermaier
Hi all, I'm using Flink 1.3.1 and I'm trying to register a UDF but there's something wrong. I always get the following exception: java.lang.UnsupportedOperationException: org.apache.flink.table.expressions.TableFunctionCall cannot be transformed to RexNode at

Re: Elasticsearch Sink - Error

2017-08-30 Thread Flavio Pompermaier
erators.KeyedProcessOperator.processElement(KeyedProcessOperator.java:94) > at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:206) > at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:69) >

Re: HBase connection problems on Flink 1.3.1

2017-08-22 Thread Flavio Pompermaier
r .profile (not .bashrc, because that is only read when the shell is interactive). I don't know whether this is the best solution or not, but it works... Best, Flavio On Tue, Aug 22, 2017 at 12:06 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote: > Hi to all, > I'm trying to connect t

HBase connection problems on Flink 1.3.1

2017-08-22 Thread Flavio Pompermaier
Hi to all, I'm trying to connect to HBase on Flink 1.3.1 but it seems that HBaseConfiguration.create() doesn't work correctly (because the ZooKeeper properties are not read from hbase-site.xml). I've also tried to put the hbase-site.xml in the Flink conf folder but it didn't work... What should I do?
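The fix that worked in this thread was exporting the HBase conf directory in the shell profile so that hbase-site.xml ends up on the classpath; as an alternative sketch, the relevant ZooKeeper settings can also be supplied programmatically (hostnames and paths below are assumptions).

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class HBaseConfExample {
    public static Configuration hbaseConf() {
        Configuration conf = HBaseConfiguration.create();
        // Option 1: point explicitly at the cluster's hbase-site.xml ...
        conf.addResource(new Path("file:///etc/hbase/conf/hbase-site.xml"));
        // Option 2: ... or set the ZooKeeper quorum directly.
        conf.set("hbase.zookeeper.quorum", "zk-host-1,zk-host-2,zk-host-3");
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        return conf;
    }
}
```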

Re: Streaming use case: Row enrichment

2017-06-16 Thread Flavio Pompermaier
ads the > contents of the file at the split and sends it downstream. There is thus no > central point where we would know that a file was completely processed. > > Best, > Aljoscha > > On 16. Jun 2017, at 11:26, Flavio Pompermaier <pomperma...@okkam.it> > wrote: >

Re: Stateful streaming question

2017-06-16 Thread Flavio Pompermaier
t; > Best, > Aljoscha > > On 15. Jun 2017, at 20:11, Flavio Pompermaier <pomperma...@okkam.it> > wrote: > > Hi Aljoscha, > we're still investigating possible solutions here. Yes, as you correctly > said there are links between data of different keys so we can only pr

Re: Streaming use case: Row enrichment

2017-06-16 Thread Flavio Pompermaier
y of removing files from the > input directory once they have been processed because it’s not possible to > know when the contents of a given files have passed through the complete > pipeline. > > Best, > Aljoscha > > On 15. Jun 2017, at 20:00, Flavio Pompermaier <pomper

Re: Flink cluster : Client is not connected to any Elasticsearch nodes!

2017-06-16 Thread Flavio Pompermaier
When this connector was being improved to be resilient to ES problems we were still using Logstash to index into ES, and it was really cumbersome... this connector eases the work of indexing into ES a lot: it's much faster, it can index directly without persisting to a file, and it's much easier to filter

Re: Guava version conflict

2017-06-15 Thread Flavio Pompermaier
on-shaded Guava dependencies. > > Let me investigate a bit and get back to this! > > Cheers, > Gordon > > > On 8 June 2017 at 2:47:02 PM, Flavio Pompermaier (pomperma...@okkam.it) > wrote: > > On an empty machine (with Ubuntu 14.04.5 LTS) and an empty maven local >

Re: Stateful streaming question

2017-06-15 Thread Flavio Pompermaier
use case, does the data you would store in the Flink managed state > have links between data of different keys? This sounds like it could be a > problem when it comes to consistency when outputting to an external system. > > Best, > Aljoscha > > On 17. May 2017, at 14:12, Fla

Re: Streaming use case: Row enrichment

2017-06-15 Thread Flavio Pompermaier
> enriched.addSink(sink) > > In this case, the file source will close once all files are read and the > job will finish. If you don’t want this you can also use a different > readFile() method where you can specify that you want to continue > monitoring the directory for new fi

Re: Flink 1.3 REST API wrapper for Scala

2017-06-12 Thread Flavio Pompermaier
Nice lib! Is it available also on maven central? On 13 Jun 2017 4:36 am, "Michael Reid" wrote: > I'm currently working on a project where I need to manage jobs > programmatically without being tied to Flink, so I wrote a small, > asynchronous Scala wrapper around the

Re: Cannot write record to fresh sort buffer. Record too large.

2017-06-12 Thread Flavio Pompermaier
Try to see if, in the output of the dmesg command, there are some logs about an OOM. The OS logs such info there. I had a similar experience recently... see [1] Best, Flavio [1] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-and-swapping-question-td13284.html On 12 Jun 2017

Re: Streaming job monitoring

2017-06-08 Thread Flavio Pompermaier
systems for this > exact use-case. The WebFrontend is not meant as a replacement for these > systems. > > In general i would recommend to setup a dedicated metrics system like > graphite/ganglia to store metrics and use grafana > or something similar to actually monitor them. &g

Streaming job monitoring

2017-06-08 Thread Flavio Pompermaier
Hi to all, we've successfully run our first streaming job on a Flink cluster (with some problems with the shading of Guava...) and it really outperforms Logstash in terms of indexing speed and ease of use. However there's only one problem: when the job is running, in the Job

Re: Guava version conflict

2017-06-07 Thread Flavio Pompermaier
, make sure you’re not > using Maven 3.3+. There are shading problems when Flink is built with Maven > versions higher then that. > The flink-dist jar should not contain any non-shaded Guava dependencies, > could you also quickly check that? > > On 7 June 2017 at 5:42:28 PM, Flavio Pomper

Guava version conflict

2017-06-07 Thread Flavio Pompermaier
Hi to all, I'm trying to use the new ES connector to index data from Flink (with ES 2.4.1). When I try to run it from Eclipse everything is ok, when I run it from the cluster I get the following exception: java.lang.NoSuchMethodError: com.google.common.util.

Re: Flink and swapping question

2017-06-07 Thread Flavio Pompermaier
.opts: -Dio.netty.recycler.maxCapacity.default=1". Best, Flavio On Tue, Jun 6, 2017 at 7:42 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote: > Hi Stephan, > I also think that the error is more related to Netty. > The only suspicious libraries I use are Parquet and Thrift. > I

Re: Flink and swapping question

2017-06-06 Thread Flavio Pompermaier
y grew? > > Thanks, Fabian > > 2017-05-29 20:53 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>: > >> Hi to all, >> I'm still trying to understand what's going on our production Flink >> cluster. >> The facts are: >> >> 1. The Flink clust

Re: Streaming use case: Row enrichment

2017-06-06 Thread Flavio Pompermaier
s, I think you can. You would use this to fire of your > queries to solr/ES. > > Best, > Aljoscha > > On 11. May 2017, at 15:06, Flavio Pompermaier <pomperma...@okkam.it> > wrote: > > Hi to all, > we have a particular use case where we have a tabular dataset on

Re: Keys distribution insights

2017-06-06 Thread Flavio Pompermaier
how many elements > you had per key. With this you know which partition, i.e. which parallel > instance had which keys and how many they were. > > Best, > Aljoscha > > > On 5. Jun 2017, at 12:01, Flavio Pompermaier <pomperma...@okkam.it> > wrote: > > > > Hi

Batch Side Outputs

2017-06-06 Thread Flavio Pompermaier
Hi to all, will side outputs [FLINK-4460] eventually be available for the batch API as well? Best, Flavio

Keys distribution insights

2017-06-05 Thread Flavio Pompermaier
Hi everybody, in my job I have a groupReduce operator with parallelism 4 and one of the sub-tasks takes a huge amount of time (wrt the others). My guess is that the objects assigned to that slot have much more data to reduce (and thus are somehow computationally heavy within the groupReduce
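The reply in this thread suggests counting how many elements each key has; a tiny DataSet sketch of that diagnostic, where the (key, payload) tuples stand in for the real records.

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class KeyDistribution {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Stand-in for the real dataset: (key, payload) pairs.
        DataSet<Tuple2<String, String>> input = env.fromElements(
                Tuple2.of("a", "x"), Tuple2.of("a", "y"), Tuple2.of("b", "z"));

        // Count elements per key; heavily skewed keys show up immediately.
        input.map(new MapFunction<Tuple2<String, String>, Tuple2<String, Long>>() {
                    @Override
                    public Tuple2<String, Long> map(Tuple2<String, String> t) {
                        return Tuple2.of(t.f0, 1L);
                    }
                })
                .groupBy(0)
                .sum(1)
                .print();
    }
}
```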

Re: Flink and swapping question

2017-05-29 Thread Flavio Pompermaier
e > this > > insanely large parameter is coming from? > > > > Best, > > Aljoscha > > > > > On 25. May 2017, at 19:36, Flavio Pompermaier <pomperma...@okkam.it> > > > wrote: > > > > > > Hi to all, > > > I thi

Collapsible job plan visualization

2017-05-25 Thread Flavio Pompermaier
Hi to all, in our experience the Flink plan diagram is a nice feature but it is useless almost all the time and it has an annoying interaction with the mouse wheel... I suggest making it a collapsible div. IMHO that would be an easy change that would definitely improve the user experience

Re: Flink and swapping question

2017-05-25 Thread Flavio Pompermaier
s of Flink troubleshooting could be an added value sooner or later.. Best, Flavio On Thu, May 25, 2017 at 10:21 AM, Flavio Pompermaier <pomperma...@okkam.it> wrote: > I can confirm that after giving less memory to the Flink TM the job was > able to run successfully. > After almost 2 weeks o

Re: Flink and swapping question

2017-05-25 Thread Flavio Pompermaier
). I hope this tip could save someone else's day... Best, Flavio On Wed, May 24, 2017 at 4:28 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote: > Hi Greg, you were right! After typing dmesg I found "Out of memory: Kill > process 13574 (java)". > This is really strange bec

Re: Flink and swapping question

2017-05-24 Thread Flavio Pompermaier
available swap memory, the OS decides to kill the Flink TM :( Any idea of what's going on here? On Wed, May 24, 2017 at 2:32 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote: > Hi Greg, > I carefully monitored all TM memory with jstat -gcutil and there's no full > gc, only

Re: Flink and swapping question

2017-05-24 Thread Flavio Pompermaier
Linux's OOM killer. Are you seeing such a message in dmesg? > > Greg > > On Wed, May 24, 2017 at 3:18 AM, Flavio Pompermaier <pomperma...@okkam.it> > wrote: > >> Hi to all, >> I'd like to know whether memory swapping could cause a taskmanager crash. >> In my clust

Flink and swapping question

2017-05-24 Thread Flavio Pompermaier
Hi to all, I'd like to know whether memory swapping could cause a taskmanager crash. In my cluster of virtual machines I'm seeing this strange behavior: sometimes, if memory gets swapped, the taskmanager on that machine dies unexpectedly without any log about the error. Is that

Re: Stateful streaming question

2017-05-17 Thread Flavio Pompermaier
’t have a lot of expertise in that). I >> don’t see how that problem will go away with Flink – so still need to >> handle serialization. >> >> >> >> 3) Even if you do decide to move to Flink – I think you can do >> this with one job, two jobs are not nee

Re: Stateful streaming question

2017-05-16 Thread Flavio Pompermaier
l > > This is still an experimental feature, but let us know your opinion if you > use it. > > Finally, an alternative would be to keep state in Flink, and periodically > flush it to an external storage system, which you can > query at will. > > Thanks, > Kostas > > > On

Stateful streaming question

2017-05-16 Thread Flavio Pompermaier
Hi to all, we're still playing with the Flink streaming part in order to see whether it can improve our current batch pipeline. At the moment, we have a job that translates incoming data (as Row) into Tuple4, groups it by the first field and persists the result to disk (using a thrift
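One of the options discussed in the replies is keeping the state in Flink and periodically flushing it to an external storage system. A rough sketch of that pattern with a keyed process function and a processing-time timer (written against the newer KeyedProcessFunction; in the Flink version of this thread the equivalent was a ProcessFunction on a keyed stream). The external write is only indicated by a comment.

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

/** Accumulates a per-key count in Flink state and flushes it once per minute. */
public class PeriodicFlushFunction extends KeyedProcessFunction<String, String, String> {

    private transient ValueState<Long> count;

    @Override
    public void open(Configuration parameters) {
        count = getRuntimeContext().getState(new ValueStateDescriptor<>("count", Long.class));
    }

    @Override
    public void processElement(String value, Context ctx, Collector<String> out) throws Exception {
        Long current = count.value();
        if (current == null) {
            // first element for this key since the last flush: schedule the next flush
            ctx.timerService().registerProcessingTimeTimer(
                    ctx.timerService().currentProcessingTime() + 60_000L);
            count.update(1L);
        } else {
            count.update(current + 1);
        }
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) throws Exception {
        // here the accumulated value would be written to the external storage system;
        // this sketch just emits it downstream and clears the state
        out.collect(ctx.getCurrentKey() + "=" + count.value());
        count.clear();
    }
}
```

Applied as `stream.keyBy(...).process(new PeriodicFlushFunction())`.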

Re: Thrift object serialization

2017-05-16 Thread Flavio Pompermaier
rdon > > [1] https://github.com/twitter/chill/blob/develop/ > chill-thrift/src/main/java/com/twitter/chill/thrift/TBaseSerializer.java > > > On 16 May 2017 at 3:26:32 PM, Flavio Pompermaier (pomperma...@okkam.it) > wrote: > > Hi Gordon, > thanks for the link. Will th

Re: Thrift object serialization

2017-05-16 Thread Flavio Pompermaier
> Cheers, > Gordon > > [1] https://ci.apache.org/projects/flink/flink-docs- > release-1.3/dev/custom_serializers.html > > On 15 May 2017 at 9:08:33 PM, Flavio Pompermaier (pomperma...@okkam.it) > wrote: > > Hi to all, > in my Flink job I create a Dataset using Had

Thrift object serialization

2017-05-15 Thread Flavio Pompermaier
nd when I print the type of ds I see: - Java Tuple2<Void, GenericType> From my experience GenericType is a performance killer... what should I do to improve the reading/writing of MyThriftObj? Best, Flavio -- Flavio Pompermaier Development Department OKKAM S.r.l. Tel. +(39) 0461 1823908

Streaming use case: Row enrichment

2017-05-11 Thread Flavio Pompermaier
Hi to all, we have a particular use case where we have a tabular dataset on HDFS (e.g. a CSV) that we want to enrich filling some cells with the content returned by a query to a reverse index (e.g. solr/elasticsearch). Since we want to be able to make this process resilient and scalable we thought
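The replies to this thread point toward firing the Solr/ES queries from within the streaming job; a hedged sketch using Flink's Async I/O (API shown as in recent releases, where the callback type is ResultFuture; the lookup itself is a dummy placeholder).

```java
import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

public class EnrichmentJob {

    /** Looks up each row in an external index and emits the enriched value. */
    public static class EnrichFunction extends RichAsyncFunction<String, String> {
        @Override
        public void asyncInvoke(String row, ResultFuture<String> resultFuture) {
            // queryIndex() is a placeholder for a non-blocking Solr/ES client call.
            CompletableFuture
                    .supplyAsync(() -> queryIndex(row))
                    .thenAccept(enriched -> resultFuture.complete(Collections.singleton(enriched)));
        }

        private String queryIndex(String row) {
            return row + "|enriched"; // dummy enrichment
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> rows = env.fromElements("row-1", "row-2");

        DataStream<String> enriched = AsyncDataStream.unorderedWait(
                rows, new EnrichFunction(), 5, TimeUnit.SECONDS, 100 /* max in-flight requests */);

        enriched.print();
        env.execute("row-enrichment");
    }
}
```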

Re: Re: Re: ElasticsearchSink on DataSet

2017-05-11 Thread Flavio Pompermaier
o, I made a PR for this: https://github.com/apache/flink/pull/3869 > And it also supports ActionRequestFailureHandler in DataSet's > ElasticsearchSink > > Best > > On 2017-05-09 at 15:30, "Flavio Pompermaier" <pomperma...@okkam.it> wrote: > > > Just one note: I t

Re: Re: ElasticsearchSink on DataSet

2017-05-09 Thread Flavio Pompermaier
/commit/3743e898104d79a9813d444d38fa9f86617bb5ef Best, Flavio On Tue, May 9, 2017 at 8:17 AM, Flavio Pompermaier <pomperma...@okkam.it> wrote: > Thanks a lot for the support! > > On 9 May 2017 07:53, "Tzu-Li (Gordon) Tai" <tzuli...@apache.org> wrote: > >> H

Re:Re: ElasticsearchSink on DataSet

2017-05-09 Thread Flavio Pompermaier
i"<tzuli...@apache.org>写道: > > > Hi Flavio, > > I don’t think there is a bridge class for this. At the moment you’ll have > to implement your own OutputFormat. > The ElasticsearchSink is a SinkFunction which is part of the DataStream > API, which generally speak

Flink + Kafka + avro example

2017-05-05 Thread Flavio Pompermaier
Hi to all Flink users, we've just published on our Okkam public repository an example of using Flink 1.2.1 + Kafka 0.10 to exchange Avro objects [1]. We hope this could be helpful for new Flink users who want to play with Flink streaming. [1]
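Not the code from the linked repository, just a rough sketch of the general pattern: a DeserializationSchema that decodes Avro SpecificRecords, which can then be plugged into the Kafka 0.10 consumer. The DeserializationSchema import path has moved between Flink versions, and the record class plugged in at usage time is whatever Avro-generated class is being exchanged.

```java
import java.io.IOException;

import org.apache.avro.io.DecoderFactory;
import org.apache.avro.specific.SpecificDatumReader;
import org.apache.avro.specific.SpecificRecordBase;
import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;

/** Decodes Kafka byte[] payloads into Avro SpecificRecords of type T. */
public class AvroRecordSchema<T extends SpecificRecordBase> implements DeserializationSchema<T> {

    private final Class<T> type;

    public AvroRecordSchema(Class<T> type) {
        this.type = type;
    }

    @Override
    public T deserialize(byte[] message) throws IOException {
        SpecificDatumReader<T> reader = new SpecificDatumReader<>(type);
        return reader.read(null, DecoderFactory.get().binaryDecoder(message, null));
    }

    @Override
    public boolean isEndOfStream(T nextElement) {
        return false;
    }

    @Override
    public TypeInformation<T> getProducedType() {
        return TypeInformation.of(type);
    }
}
```

Used as `new FlinkKafkaConsumer010<>("my-topic", new AvroRecordSchema<>(MyAvroRecord.class), props)`, where MyAvroRecord is a hypothetical Avro-generated class.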

ElasticsearchSink on DataSet

2017-05-03 Thread Flavio Pompermaier
Hi to all, at the moment I have a Flink job that generates a DataSet that I write to a file, which is then read by Logstash to index the data in ES. I'd like to use the new ElasticsearchSink to index that JSON directly from Flink, but ElasticsearchSink only works with the streaming environment. Is there any
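The replies point to implementing a custom OutputFormat (and a PR for a DataSet ElasticsearchSink is linked later in the thread). A bare skeleton of what such an OutputFormat looks like, with the actual Elasticsearch client calls left as comments because they depend on the ES version in use.

```java
import java.io.IOException;

import org.apache.flink.api.common.io.RichOutputFormat;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.types.Row;

/** Skeleton of a DataSet OutputFormat that indexes Rows into Elasticsearch. */
public class EsOutputFormat extends RichOutputFormat<Row> {

    private transient StringBuilder batch;   // stands in for a real bulk request
    private int batchSize = 0;

    @Override
    public void configure(Configuration parameters) { }

    @Override
    public void open(int taskNumber, int numTasks) throws IOException {
        // open the Elasticsearch client here (TransportClient / REST client, depending on version)
        batch = new StringBuilder();
    }

    @Override
    public void writeRecord(Row record) throws IOException {
        batch.append(record.toString()).append('\n');   // build the JSON / bulk entry here
        if (++batchSize >= 1000) {
            flush();
        }
    }

    @Override
    public void close() throws IOException {
        flush();   // flush the last partial batch, then close the client here
    }

    private void flush() {
        // send the bulk request here; this sketch only resets the buffer
        batch.setLength(0);
        batchSize = 0;
    }
}
```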

Batch source improvement

2017-04-29 Thread Flavio Pompermaier
Hi to all, we're still using Flink as a batch processor and, despite not being heavily advertised, it is still doing great. However there's one thing I always wanted to ask: when reading data from a source, the job manager computes the splits and assigns a set of them to every instance of the InputFormat. This

Re: Problems reading Parquet input from HDFS

2017-04-28 Thread Flavio Pompermaier
"parquet.avro.projection", > "{\"type\":\"record\",\"name\":\"Customer\",\"fields\":[{\"name\":\"c_custkey\",\"type\":\"int\"}]}"); > env.createInput(hif).print(); > }

Re: hadoopcompatibility not in dist

2017-04-28 Thread Flavio Pompermaier
I faced this problem yesterday and putting flink-hadoop-compatibility under the flink/lib folder solved the problem for me. But what is the official recommendation? Should I put it into the lib or the opt folder? Is there any difference from a class-loading point of view? Best, Flavio On Fri, Apr 7, 2017 at

Re: UnilateralSortMerger error (again)

2017-04-27 Thread Flavio Pompermaier
Great!! Thanks a lot Kurt On Thu, Apr 27, 2017 at 5:31 PM, Kurt Young <ykt...@gmail.com> wrote: > Hi, i have found the bug: https://issues.apache.org/jira/browse/FLINK-6398, > will open a PR soon. > > Best, > Kurt > > On Thu, Apr 27, 2017 at 8:23 PM, Flavio Pomper

Re: UnilateralSortMerger error (again)

2017-04-27 Thread Flavio Pompermaier
Thanks a lot Kurt! On Thu, Apr 27, 2017 at 2:12 PM, Kurt Young <ykt...@gmail.com> wrote: > Thanks for the test case, i will take a look at it. > > Flavio Pompermaier <pomperma...@okkam.it>于2017年4月27日 周四03:55写道: > >> I've created a repository with a unit test to r

Re: UnilateralSortMerger error (again)

2017-04-26 Thread Flavio Pompermaier
tests of GroupReduceDriver use Record and > testing Rows is not very straightforward and I'm still trying to reproduce > the problem in a local env.. > > On Fri, Apr 21, 2017 at 9:53 PM, Flavio Pompermaier <pomperma...@okkam.it> > wrote: > >> Thanks for the explanation. Is th

Re: UnilateralSortMerger error (again)

2017-04-26 Thread Flavio Pompermaier
:53 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote: > Thanks for the explanation . Is there a way to force this behaviour in a > local environment (to try to debug the problem)? > > On 21 Apr 2017 21:49, "Fabian Hueske" <fhue...@gmail.com> wrote: > >

Re: Problems reading Parquet input from HDFS

2017-04-24 Thread Flavio Pompermaier
I started from this guide: https://github.com/FelixNeutatz/parquet-flinktacular Best, Flavio On 24 Apr 2017 6:36 pm, "Jörn Franke" wrote: > Why not use a Parquet-only format? Not sure why you need an > AvroParquetFormat. > > On 24. Apr 2017, at 18:19, Lukas Kircher

Re: UnilateralSortMerger error (again)

2017-04-21 Thread Flavio Pompermaier
sorted runs > of records. > Later all (up to a fanout threshold) these sorted runs are read and merged > to get a completely sorted record stream. > > 2017-04-21 14:09 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>: > >> The error appears as soon as some taskmanager

Re: UnilateralSortMerger error (again)

2017-04-21 Thread Flavio Pompermaier
The error appears as soon as some taskmanager generates some inputchannel file. What are those files used for? On Fri, Apr 21, 2017 at 11:53 AM, Flavio Pompermaier <pomperma...@okkam.it> wrote: > In another run of the job I had another Exception. Could it be helpful? > >

Re: UnilateralSortMerger error (again)

2017-04-21 Thread Flavio Pompermaier
> > > In the past, these errors were most often caused by bugs in the > serializers, not in the sorter. > > > > What types are you using at that point? The stack trace reveals Row and > StringValue, any other involved types? > > > > On Fri, Apr

Re: UnilateralSortMerger error (again)

2017-04-21 Thread Flavio Pompermaier
e, any other involved types? > > On Fri, Apr 21, 2017 at 9:36 AM, Flavio Pompermaier <pomperma...@okkam.it> > wrote: > >> As suggested by Fabian I set taskmanager.memory.size = 1024 (to force >> spilling to disk) and the job failed almost immediately.. >> >> On

Re: UnilateralSortMerger error (again)

2017-04-21 Thread Flavio Pompermaier
As suggested by Fabian I set taskmanager.memory.size = 1024 (to force spilling to disk) and the job failed almost immediately.. On Fri, Apr 21, 2017 at 12:33 AM, Flavio Pompermaier <pomperma...@okkam.it> wrote: > I debugged a bit the process repeating the job on a sub-slice of the >

Re: UnilateralSortMerger error (again)

2017-04-20 Thread Flavio Pompermaier
& id < 99.960.000.000 => 23.888.750 rows => OK id >= 99.960.000.000 => 32.936.422 rows => OK 4 TM with 8 GB and 4 slots each, parallelism 16 id >= 99.960.000.000 => 32.936.422 rows => OK id >= 99.945.000.000 => 56.825.172 rows => ERROR Any help is appreciated.

Re: UnilateralSortMerger error (again)

2017-04-19 Thread Flavio Pompermaier
ve time, maybe try with 1.2.1 RC and see if the error is > reproducible ? > > Cheers > > On Wed, Apr 19, 2017 at 11:22 AM, Flavio Pompermaier <pomperma...@okkam.it > > wrote: > >> Hi to all, >> I think I'm again on the weird Exception with the >> Spillin

Re: Flink slots, threads, task, etc

2017-04-19 Thread Flavio Pompermaier
t. > > Regarding 2.2, This is so to allow executing a pipeline of parallelism 8 > using a cluster that has 8 free slots. Basically, each slice fills one slot. > > Regarding 3, I don’t really have an answer. > > Regarding 4, Yes, this can get a bit out of hand if you have very l

Re: ANN: Euphoria API

2017-04-05 Thread Flavio Pompermaier
Isn't it the same thing as Apache Beam? What are the differences? On Wed, Apr 5, 2017 at 10:29 AM, Petr Novotnik <petr.novot...@firma.seznam.cz> wrote: > Hello Flink Community, > > we would like to announce the availability of the Euphoria API [1] — a > thin, declarative Java API for big-data

Flink UI and records sent

2017-04-05 Thread Flavio Pompermaier
Hi to all, I'm using Flink 1.2.0 and I have a job that (at some point) calls dataset.first(1M). Sometimes the "records sent" count displayed by the UI is less than 1M (like 999709). Is it possible that the UI (or the internal Flink counters) misses some records? Best, Flavio -- Flavio Pompermaier

Re: unclear exception when writing to elasticsearch

2017-03-01 Thread Flavio Pompermaier
Did you build Flink from source or are you using the packaged version? Because I had an annoying problem when compiling Flink with Maven > 3.3. From https://ci.apache.org/projects/flink/flink-docs-release-1.2/setup/building.html#dependency-shading: Maven 3.0.x, 3.1.x, and 3.2.x It is

Re: ElasticsearchSink Exception

2017-02-25 Thread Flavio Pompermaier
va_2.10', > version: '1.2.0' > compile group: 'org.apache.flink', name: 'flink-java', version: '1.2.0' > compile group: 'org.apache.flink', name: > 'flink-connector-kafka-0.10_2.10', version: '1.2.0' > compile group: 'org.apache.flink', name: 'flink-clients_2.10', version: > '

Re: ElasticsearchSink Exception

2017-02-25 Thread Flavio Pompermaier
Are you sure that in elasticsearch.yml you've enabled ES to listen to the http port 9300? On 25 Feb 2017 08:58, "Govindarajan Srinivasaraghavan" < govindragh...@gmail.com> wrote: Hi All, I'm getting the below exception when I start my flink job. I have verified the elastic search host and it

CSV sink partitioning and bucketing

2017-02-17 Thread Flavio Pompermaier
Hi to all, in my use case I need to output my Row objects into an output folder as CSV on HDFS, creating/overwriting subfolders based on an attribute (for example, creating a subfolder for each value of a specified column). Then it could be interesting to bucket the data inside those

Re: Missing test dependency in Eclipse with Flink 1.2.0

2017-02-14 Thread Flavio Pompermaier
org.apache.felix > maven-bundle-plugin > 3.0.1 > true > true > > > > Nico > > [1] https://issues.apache.org/jira/browse/FLINK-4813 > > On Tuesday, 14 February 2017 17:45:12 CET Flavio Pompermaier wrote: > > Hi Ni

Re: Missing test dependency in Eclipse with Flink 1.2.0

2017-02-14 Thread Flavio Pompermaier
-DarchetypeArtifactId=flink-quickstart-java \ > -DarchetypeVersion=1.2.0 > > for which I could not find any dependency issues using Eclipse. > > Regards, > Nico > > On Tuesday, 14 February 2017 14:17:10 CET Flavio Pompermaier wrote: > > Hi to all, > > I've tried to migrate

Missing test dependency in Eclipse with Flink 1.2.0

2017-02-14 Thread Flavio Pompermaier
Hi to all, I've tried to migrate to Flink 1.2.0 and now my Eclipse projects say that they can't find apacheds-jdbm1, which has packaging type bundle. Should I install some plugin? Best, Flavio
