sorted cogroup

2015-07-20 Thread Michele Bertoni
Hi everybody, I need to execute a cogroup on sorted groups. Let me explain it better: I have two datasets of (key, value) pairs. I want to cogroup on key and then have both iterators sorted by value. How can I get that? I know an iterator would have to be collected to be sorted, but I want to avoid that. what happens
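A minimal sketch of a sorted cogroup in the Java DataSet API, assuming a Flink release where group sorting on CoGroup (sortFirstGroup/sortSecondGroup) is available; Flink 0.9 did not have this, so there the values would still need to be collected and sorted inside the function:

  import org.apache.flink.api.common.functions.CoGroupFunction;
  import org.apache.flink.api.common.operators.Order;
  import org.apache.flink.api.java.DataSet;
  import org.apache.flink.api.java.ExecutionEnvironment;
  import org.apache.flink.api.java.tuple.Tuple2;
  import org.apache.flink.util.Collector;

  public class SortedCoGroup {
    public static void main(String[] args) throws Exception {
      ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
      DataSet<Tuple2<String, Integer>> left =
          env.fromElements(new Tuple2<>("a", 3), new Tuple2<>("a", 1));
      DataSet<Tuple2<String, Integer>> right =
          env.fromElements(new Tuple2<>("a", 2), new Tuple2<>("a", 4));

      left.coGroup(right)
          .where(0).equalTo(0)
          .sortFirstGroup(1, Order.ASCENDING)   // left iterator sorted by value
          .sortSecondGroup(1, Order.ASCENDING)  // right iterator sorted by value
          .with(new CoGroupFunction<Tuple2<String, Integer>,
                  Tuple2<String, Integer>, String>() {
            @Override
            public void coGroup(Iterable<Tuple2<String, Integer>> l,
                                Iterable<Tuple2<String, Integer>> r,
                                Collector<String> out) {
              // both sides arrive sorted; nothing is collected in user code
              for (Tuple2<String, Integer> t : l) out.collect("left: " + t);
              for (Tuple2<String, Integer> t : r) out.collect("right: " + t);
            }
          })
          .print();
    }
  }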

Re: Flink Kafka cannot find org/I0Itec/zkclient/serialize/ZkSerializer

2015-07-20 Thread Wendong
To be specific, the error occurs at: org.apache.flink.streaming.connectors.kafka.api.KafkaSource.initializeConnection (KafkaSource.java:175)

Flink Kafka cannot find org/I0Itec/zkclient/serialize/ZkSerializer

2015-07-20 Thread Wendong
Here is my simple program to use Kafka:
import org.apache.kafka.clients.producer.{ProducerConfig, KafkaProducer, ProducerRecord}
import org.apache.flink.streaming.api.environment._
import org.apache.flink.streaming.connectors.kafka
import org.apache.flink.streaming.connectors.kafka.api._
import or

Re: Flink Kafka example in Scala

2015-07-20 Thread Wendong
Hi Till, thanks for your suggestion! I built a fat jar and the runtime ClassNotFoundException was finally gone. I wish I had tried a fat jar earlier; it would have saved me four days. Wendong

I want to use the count() method from this link, but it is not found in DataSet. Why?

2015-07-20 Thread hagersaleh
I want to use the count() method from this link, but when I write
DataSet customers = getCustomerDataSet(env, mask, l, map);
Long i = customers.count();
count() is not found in DataSet. Why? https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/api/java/DataSet.html#count%28%29
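The linked docs are for the master branch, so the running Flink release may simply predate the method. A hedged fallback sketch (the helper name is made up) that counts via map + sum + collect, which were available earlier:

  import org.apache.flink.api.common.functions.MapFunction;
  import org.apache.flink.api.java.DataSet;
  import org.apache.flink.api.java.tuple.Tuple1;

  public class CountFallback {
    public static <T> long count(DataSet<T> data) throws Exception {
      return data
          .map(new MapFunction<T, Tuple1<Long>>() {
            @Override
            public Tuple1<Long> map(T value) {
              return new Tuple1<>(1L); // emit a one per element
            }
          })
          .sum(0)      // add the ones up into a single tuple
          .collect()   // eagerly execute and fetch the one-row result
          .get(0).f0;
    }
  }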

Re: How to handle EXISTS / NOT EXISTS queries in Flink

2015-07-20 Thread hagersaleh
When I write this code, it displays the error "no interface expected here":
public static class MyCoGrouper extends CoGroupFunction {
  ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
  DataSet customers = getCustomerDataSet(env, mask, l, map);
  DataSet orders = getOrdersDat
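For reference: CoGroupFunction is an interface in the Java API, which is why "extends" triggers "no interface expected here". A minimal sketch of an EXISTS-style cogroup, with made-up tuple types and field positions, and the environment/DataSet setup kept in the driver rather than inside the function:

  import org.apache.flink.api.common.functions.CoGroupFunction;
  import org.apache.flink.api.java.tuple.Tuple2;
  import org.apache.flink.util.Collector;

  public static class ExistsCoGrouper implements
      CoGroupFunction<Tuple2<Long, String>,   // customer: (id, name) - assumed
                      Tuple2<Long, Double>,   // order: (customerId, amount) - assumed
                      Tuple2<Long, String>> {
    @Override
    public void coGroup(Iterable<Tuple2<Long, String>> customers,
                        Iterable<Tuple2<Long, Double>> orders,
                        Collector<Tuple2<Long, String>> out) {
      // emit the customer only if at least one matching order EXISTS
      if (orders.iterator().hasNext()) {
        for (Tuple2<Long, String> c : customers) {
          out.collect(c);
        }
      }
    }
  }

  // usage in the driver, not inside the function:
  // customers.coGroup(orders).where(0).equalTo(0).with(new ExistsCoGrouper());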

Re: loop break operation

2015-07-20 Thread Pa Rö
I tried the following, but it always runs for maxIterations; maybe someone can give me a review?
private int benchmarkCounter;
private static int iterationCounter = 1;
private static DataSet oldCentroids;
FlinkMain(int benchmarkCounter) { this.benchmarkCounter = benchmarkCounte

Re: Nested Iterations Outlook

2015-07-20 Thread Maximilian Michels
You could do that but you might run into merge conflicts. Also keep in mind that it is work in progress :) On Mon, Jul 20, 2015 at 4:15 PM, Maximilian Alber < alber.maximil...@gmail.com> wrote: > Thanks! > > Ok, cool. If I would like to test it, I just need to merge those two pull > requests into

Re: Too few memory segments provided exception

2015-07-20 Thread Ufuk Celebi
BTW we should add an entry for this to the faq and point to the configuration or faq entry in the exception message. On 20 Jul 2015, at 15:15, Vasiliki Kalavri wrote: > Hi Shivani, > > why are you using a vertex-centric iteration to compute the approximate > Adamic-Adar? > It's not an iterati

Re: Nested Iterations Outlook

2015-07-20 Thread Maximilian Alber
Thanks! Ok, cool. If I would like to test it, I just need to merge those two pull requests into my current branch? Cheers, Max On Mon, Jul 20, 2015 at 4:02 PM, Maximilian Michels wrote: > Now that makes more sense :) I thought by "nested iterations" you meant > iterations in Flink that can be

Re: Too few memory segments provided exception

2015-07-20 Thread Vasiliki Kalavri
Hi Shivani, the Jaccard example is implemented in Giraph, and therefore uses iterations. However, in Gelly we are not forced to do that for non-iterative computations. I see that there is some confusion with the implementation specifics. Let me try to write down some skeleton code / detailed desc
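Until that skeleton arrives, a rough sketch of the non-iterative flavor (my own toy example, not Vasia's code): the per-vertex neighborhood sets come out of a single groupReduce over the edge list, with no vertex-centric iteration involved:

  import java.util.HashSet;
  import org.apache.flink.api.common.functions.GroupReduceFunction;
  import org.apache.flink.api.java.DataSet;
  import org.apache.flink.api.java.ExecutionEnvironment;
  import org.apache.flink.api.java.tuple.Tuple2;
  import org.apache.flink.util.Collector;

  public class NeighborhoodSets {
    public static void main(String[] args) throws Exception {
      ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

      // toy edge list (source, target); real code would read the graph's edges
      DataSet<Tuple2<Long, Long>> edges = env.fromElements(
          new Tuple2<>(1L, 2L), new Tuple2<>(1L, 3L), new Tuple2<>(2L, 3L));

      // one (vertex, neighbor set) record per source vertex, no iteration
      DataSet<Tuple2<Long, HashSet<Long>>> neighborhoods = edges
          .groupBy(0)
          .reduceGroup(new GroupReduceFunction<Tuple2<Long, Long>,
                                               Tuple2<Long, HashSet<Long>>>() {
            @Override
            public void reduce(Iterable<Tuple2<Long, Long>> group,
                               Collector<Tuple2<Long, HashSet<Long>>> out) {
              long vertex = -1L;
              HashSet<Long> neighbors = new HashSet<>();
              for (Tuple2<Long, Long> e : group) {
                vertex = e.f0;
                neighbors.add(e.f1);
              }
              out.collect(new Tuple2<>(vertex, neighbors));
            }
          });

      neighborhoods.print();
    }
  }

Joining these sets back onto the edge list twice gives both endpoint neighborhoods per edge; the Adamic-Adar weight is then the sum of 1/log(degree) over their intersection. (The HashSet field makes Flink fall back to generic serialization, which is fine for a sketch.)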

Re: Too few memory segments provided exception

2015-07-20 Thread Shivani Ghatge
But it will need to build BloomFilters for each vertex for each edge, so I don't know how efficient that would be. On Mon, Jul 20, 2015 at 4:02 PM, Shivani Ghatge wrote: > Hello Vasia, > > I will adapt the exact method for BloomFilter. (I think it can be done. > Sorry. My mistake). > > > On Mon, Jul 20, 2

Re: Nested Iterations Outlook

2015-07-20 Thread Maximilian Michels
Now that makes more sense :) I thought by "nested iterations" you meant iterations in Flink that can be nested, i.e. starting an iteration inside an iteration. The caching/pinning of intermediate results is still a work in progress in Flink. It is actually in a state where it could be merged but s

Re: Too few memory segments provided exception

2015-07-20 Thread Shivani Ghatge
Hello Vasia, I will adapt the exact method for BloomFilter. (I think it can be done. Sorry. My mistake). On Mon, Jul 20, 2015 at 3:45 PM, Shivani Ghatge wrote: > Also the example of Jaccard that you had linked me to used VertexCentric > configuration which I understand is because that api only

Re: Too few memory segments provided exception

2015-07-20 Thread Shivani Ghatge
Also, the Jaccard example that you had linked me to used the VertexCentric configuration, which I understand is because that API only uses VertexCentricIteration for all the operations? But I think that is the best way to know which neighbors belong to the BloomFilter? On Mon, Jul 20, 2015 at

Re: Too few memory segments provided exception

2015-07-20 Thread Shivani Ghatge
Hello Vasia, As I had mentioned before, I need a BloomFilter as well as a HashSet for the approximation to work. In the exact solution I am getting two HashSets and comparing them. In approximate version, if we get two BloomFilters then we have no way to compare the neighborhood sets. I thought w
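A minimal plain-Java sketch of that asymmetric comparison, using Guava's BloomFilter (the sizing parameters 1000 and 0.01 are arbitrary assumptions): keep one neighborhood as an exact HashSet and probe the other side's filter with it, since two Bloom filters cannot be intersected directly:

  import com.google.common.hash.BloomFilter;
  import com.google.common.hash.Funnels;
  import java.util.HashSet;

  public class ApproxOverlap {
    public static void main(String[] args) {
      // exact neighborhood of u
      HashSet<Long> neighborsOfU = new HashSet<>();
      neighborsOfU.add(1L);
      neighborsOfU.add(2L);

      // approximate neighborhood of v (expect 1000 entries, 1% false positives)
      BloomFilter<Long> neighborsOfV =
          BloomFilter.create(Funnels.longFunnel(), 1000, 0.01);
      neighborsOfV.put(2L);
      neighborsOfV.put(3L);

      // approximate intersection size: exact members the filter claims to hold;
      // may overcount slightly (false positives) but never undercounts
      int common = 0;
      for (Long n : neighborsOfU) {
        if (neighborsOfV.mightContain(n)) {
          common++;
        }
      }
      System.out.println("approx common neighbors: " + common);
    }
  }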

Re: Too few memory segments provided exception

2015-07-20 Thread Vasiliki Kalavri
I believe there was some work in progress to reduce memory fragmentation and solve similar problems. Anyone knows what's happening with that? On 20 July 2015 at 16:29, Andra Lungu wrote: > I also questioned the vertex-centric approach before. The exact > computation does not throw this exception

Re: Nested Iterations Outlook

2015-07-20 Thread Maximilian Alber
Oh sorry, my fault. When I wrote it, I had iterations in mind. What I actually wanted to ask is how "resuming from intermediate results" will work with (non-nested) "non-Flink" iterations. By iterations I mean something like this:
while(...):
  - change params
  - submit to cluster
where the

Re: Too few memory segments provided exception

2015-07-20 Thread Andra Lungu
I also questioned the vertex-centric approach before. The exact computation does not throw this exception so I guess adapting the approximate version will do the trick [I also suggested improving the algorithm to use less operators offline]. However, the issue still persists. We saw it in Affinity

Re: Too few memory segments provided exception

2015-07-20 Thread Vasiliki Kalavri
Hi Shivani, why are you using a vertex-centric iteration to compute the approximate Adamic-Adar? It's not an iterative computation :) In fact, it should be as complex (in terms of operators) as the exact Adamic-Adar, only more efficient because of the different neighborhood representation. Are yo

Re: Too few memory segments provided exception

2015-07-20 Thread Maximilian Michels
Hi Shivani, The issue is that by the time the Hash Join is executed, the MutableHashTable cannot allocate enough memory segments. That means that your other operators are occupying them. It is fine that this also occurs on Travis because the workers there have limited memory as well. Till suggest

Re: Too few memory segments provided exception

2015-07-20 Thread Andra Lungu
Hi, I am afraid this is a known issue: http://mail-archives.apache.org/mod_mbox/flink-dev/201503.mbox/%3CCAK5ODX7_-Wxg9pr7CkkkG4CzA+yNCNMvmea5L2i2iZZV=2c...@mail.gmail.com%3E The behavior back then seems to be exactly what Shivani is experiencing at the moment. At that point I remember Fabian sug

Re: Too few memory segments provided exception

2015-07-20 Thread Shivani Ghatge
Hello Maximilian, Thanks for the suggestion. I will use it to check the program. But when I am creating a PR for the same implementation with a Test, I am getting the same error even on Travis build. So for that what would be the solution? Here is my PR https://github.com/apache/flink/pull/923 An

Re: Too few memory segments provided exception

2015-07-20 Thread Till Rohrmann
The taskmanager.memory.fraction you can also set from within the IDE by giving the corresponding configuration object to the LocalEnvironment using the setConfiguration method. However, the taskmanager.heap.mb is basically the -Xmx value with which you start your JVM. Usually, you can set this in y
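A sketch of that local setup, using the config keys quoted in this thread (the 0.9 value is just an example; setConfiguration is the method Till mentions):

  import org.apache.flink.api.java.ExecutionEnvironment;
  import org.apache.flink.api.java.LocalEnvironment;
  import org.apache.flink.configuration.Configuration;

  public class LocalMemorySetup {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // give the managed-memory pool a larger share of the heap than the
      // default; per this thread, anything between 0.7 and 1.0
      conf.setFloat("taskmanager.memory.fraction", 0.9f);

      LocalEnvironment env = ExecutionEnvironment.createLocalEnvironment();
      env.setConfiguration(conf);
      // ... build and execute the job on env as usual
    }
  }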

Re: Too few memory segments provided exception

2015-07-20 Thread Maximilian Michels
Hi Shivani, Flink doesn't have enough memory to perform a hash join. You need to provide Flink with more memory. You can either increase the "taskmanager.heap.mb" config variable or set "taskmanager.memory.fraction" to some value greater than 0.7 and smaller than 1.0. The first config variable all

Too few memory segments provided exception

2015-07-20 Thread Shivani Ghatge
Hello, I am working on a problem which implements the Adamic-Adar algorithm using Gelly. I am running into this exception for all the joins (including the ones that are part of the reduceOnNeighbors function): Too few memory segments provided. Hash Join needs at least 33 memory segments. The problem

Re: Nested Iterations Outlook

2015-07-20 Thread Maximilian Michels
"So it is up to debate how the support for resuming from intermediate results will look like." -> What's the current state of that debate? Since there is no support for nested iterations that I know of, the debate how intermediate results are integrated has not started yet. > "Intermediate resu

Re: loop break operation

2015-07-20 Thread Fabian Hueske
Use a broadcast set to distribute the old centers to a map which has the new centers as regular input. Put the old centers in a HashMap in open() and check the distance to the new centers in map(). On Jul 20, 2015 12:55 PM, "Pa Rö" wrote: > okay, i have found it. how to compare my old and new cent
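A sketch of that pattern; the Centroid POJO, its fields, and the 1e-9 threshold are illustrative assumptions:

  import java.util.HashMap;
  import java.util.List;
  import java.util.Map;
  import org.apache.flink.api.common.functions.RichMapFunction;
  import org.apache.flink.configuration.Configuration;

  // hypothetical POJO for illustration
  public static class Centroid {
    public int id;
    public double x;
    public double y;
  }

  public static class CentroidMoved extends RichMapFunction<Centroid, Boolean> {
    private final Map<Integer, Centroid> oldCenters = new HashMap<>();

    @Override
    public void open(Configuration parameters) {
      // the broadcast set is materialized here, once per task
      List<Centroid> old = getRuntimeContext().getBroadcastVariable("old");
      for (Centroid c : old) {
        oldCenters.put(c.id, c);
      }
    }

    @Override
    public Boolean map(Centroid newCenter) {
      Centroid old = oldCenters.get(newCenter.id);
      double dx = old.x - newCenter.x;
      double dy = old.y - newCenter.y;
      return Math.sqrt(dx * dx + dy * dy) > 1e-9; // true while still moving
    }
  }

  // wiring in the driver:
  // newCentroids.map(new CentroidMoved()).withBroadcastSet(oldCentroids, "old");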

Re: loop break operation

2015-07-20 Thread Pa Rö
Okay, I have found it. How do I compare my old and new centers? 2015-07-20 12:16 GMT+02:00 Sachin Goel : > Gah. Sorry. > In the closeWith call, give a second argument which determines if the > iteration should be stopped. > > -- Sachin Goel > Computer Science, IIT Delhi > m. +91-9871457685 > On Jul

Re: loop break operation

2015-07-20 Thread Sachin Goel
Gah. Sorry. In the closeWith call, give a second argument which determines if the iteration should be stopped. -- Sachin Goel Computer Science, IIT Delhi m. +91-9871457685 On Jul 20, 2015 3:21 PM, "Pa Rö" wrote: > i not found the "iterateWithTermination" function, only "iterate" and > "iterateDe
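A toy but complete sketch of the two-argument closeWith: the loop stops before maxIterations as soon as the termination-criterion DataSet becomes empty (the halving job and its threshold are made up). For k-means, the criterion would be the set of centroids that moved more than some epsilon, e.g. computed with the broadcast-set comparison Fabian describes above:

  import org.apache.flink.api.common.functions.FilterFunction;
  import org.apache.flink.api.common.functions.MapFunction;
  import org.apache.flink.api.java.DataSet;
  import org.apache.flink.api.java.ExecutionEnvironment;
  import org.apache.flink.api.java.operators.IterativeDataSet;

  public class TerminationDemo {
    public static void main(String[] args) throws Exception {
      ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

      IterativeDataSet<Double> loop = env.fromElements(1000.0).iterate(100);

      DataSet<Double> halved = loop.map(new MapFunction<Double, Double>() {
        @Override
        public Double map(Double v) { return v / 2.0; }
      });

      // criterion: non-empty while the value is still "large"; the iteration
      // terminates before the 100 max rounds once this becomes empty
      DataSet<Double> criterion = halved.filter(new FilterFunction<Double>() {
        @Override
        public boolean filter(Double v) { return v > 1.0; }
      });

      loop.closeWith(halved, criterion).print();
    }
  }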

Re: Nested Iterations Outlook

2015-07-20 Thread Maximilian Alber
Thanks for the answer! But I need some clarification:
"So it is up to debate how the support for resuming from intermediate results will look like." -> What's the current state of that debate?
"Intermediate results are not produced within the iterations cycles." -> Ok, if there are none, what does

Re: loop break operation

2015-07-20 Thread Pa Rö
I did not find the "iterateWithTermination" function, only "iterate" and "iterateDelta". I use Flink 0.9.0 with Java. 2015-07-20 11:30 GMT+02:00 Sachin Goel : > Hi > You can use iterateWithTermination to terminate before max iterations. The > feedback for iteration then would be (next solution, isCo

Re: loop break operation

2015-07-20 Thread Sachin Goel
Hi You can use iterateWithTermination to terminate before max iterations. The feedback for iteration then would be (next solution, isConverged) where isConverged is an empty data set if you wish to terminate. However, this is something I have a pull request for: https://github.com/apache/flink/pull

loop break operation

2015-07-20 Thread Pa Rö
Hello community, I have written a k-means app in Flink. Now I want to change my termination condition from max iterations to checking whether the cluster centers changed, but I don't know how I can break the Flink loop. Here is my Flink execution code: public void run() { //load properties

Re: Nested Iterations Outlook

2015-07-20 Thread Maximilian Michels
Hi Max, You are right, there is no support for nested iterations yet. As far as I know, there are no concrete plans to add support for it. So it is up to debate how the support for resuming from intermediate results will look like. Intermediate results are not produced within the iterations cycles

Re: Flink Kafka example in Scala

2015-07-20 Thread Stephan Ewen
For other issues (hadoop versions), we used a shell script that did a search and replace on the variables. Maybe you can do the same trick here... On Mon, Jul 20, 2015 at 10:37 AM, Anwar Rizal wrote: > Coz I don't like it :-) > > No, seriously, sure, I can do it with maven. It worked indeed wi

Re: Flink Kafka example in Scala

2015-07-20 Thread Anwar Rizal
Coz I don't like it :-) No, seriously, sure, I can do it with maven. It worked indeed with maven. But the rest of our ecosystem uses sbt. That's why. -Anwar On Mon, Jul 20, 2015 at 10:28 AM, Till Rohrmann wrote: > Why not trying maven instead? > > On Mon, Jul 20, 2015 at 10:23 AM, Anwar Riz

Re: Flink Kafka example in Scala

2015-07-20 Thread Till Rohrmann
Why not try Maven instead? On Mon, Jul 20, 2015 at 10:23 AM, Anwar Rizal wrote: > I do the same trick as Wendong to avoid compilation error of sbt > (excluding kafka_$(scala.binary.version) ) > > I still don't manage to make sbt pass scala.binary.version to maven. > > Anwar. > > On Mon, Jul 2

Re: Flink Kafka example in Scala

2015-07-20 Thread Anwar Rizal
I do the same trick as Wendong to avoid compilation error of sbt (excluding kafka_$(scala.binary.version) ) I still don't manage to make sbt pass scala.binary.version to maven. Anwar. On Mon, Jul 20, 2015 at 9:42 AM, Till Rohrmann wrote: > Hi Wendong, > > why do you exclude the kafka dependenc

Re: Flink Kafka example in Scala

2015-07-20 Thread Till Rohrmann
Hi Wendong, why do you exclude the kafka dependency from the `flink-connector-kafka`? Do you want to use your own kafka version? I'd recommend you to build a fat jar instead of trying to put the right dependencies in `/lib`. Here [1] you can see how to build a fat jar with sbt. Cheers, Till [1]

Re: Flink deadLetters

2015-07-20 Thread Till Rohrmann
Without the logs it is hard to say. On Sat, Jul 18, 2015 at 11:41 AM, Flavio Pompermaier wrote: > The job is quite simple... it just reads 10 Parquet dirs, extracts some info > out of the Thrift objects, generates Tuple3, makes a project() and a > distinct() to call an external service only for s