Hi everybody, I need to execute a coGroup on sorted groups.
Let me explain it better: I have two datasets of (key, value) pairs. I want to
coGroup on the key and then have both iterators sorted by value.
How can I get that?
I know the iterators could be collected and then sorted, but I want to avoid
that. What happens
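Something like the following is what I am after; a minimal sketch, assuming a
Flink version where coGroup supports group sorting via sortFirstGroup /
sortSecondGroup (the tuple types and sample data are just examples):

import org.apache.flink.api.common.functions.CoGroupFunction;
import org.apache.flink.api.common.operators.Order;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
DataSet<Tuple2<String, Integer>> left =
    env.fromElements(new Tuple2<>("a", 3), new Tuple2<>("a", 1));
DataSet<Tuple2<String, Integer>> right =
    env.fromElements(new Tuple2<>("a", 2), new Tuple2<>("a", 4));

DataSet<String> result = left.coGroup(right)
    .where(0)                             // key field of the first input
    .equalTo(0)                           // key field of the second input
    .sortFirstGroup(1, Order.ASCENDING)   // first iterator sorted by value
    .sortSecondGroup(1, Order.ASCENDING)  // second iterator sorted by value
    .with(new CoGroupFunction<Tuple2<String, Integer>,
                              Tuple2<String, Integer>, String>() {
        @Override
        public void coGroup(Iterable<Tuple2<String, Integer>> first,
                            Iterable<Tuple2<String, Integer>> second,
                            Collector<String> out) {
            // Both iterables arrive sorted by value, so no
            // collect-and-sort is needed inside the function.
        }
    });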
to be specific, the error occurs at:
org.apache.flink.streaming.connectors.kafka.api.KafkaSource.initializeConnection(KafkaSource.java:175)
Here is my simple program to use Kafka:
import org.apache.kafka.clients.producer.{ProducerConfig, KafkaProducer, ProducerRecord}
import org.apache.flink.streaming.api.environment._
import org.apache.flink.streaming.connectors.kafka
import org.apache.flink.streaming.connectors.kafka.api._
import or
Hi Till,
Thanks for your suggestion! I built a fat jar, and the runtime
ClassNotFoundException was finally gone. I wish I had tried a fat jar earlier;
it would have saved me 4 days.
Wendong
I want to use the count() method from the link below, but when I write
DataSet customers = getCustomerDataSet(env, mask, l, map);
Long i = customers.count();
count() is not found in DataSet. Why?
https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/api/java/DataSet.html#count%28%29
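For reference, this is how I expect it to work (a sketch; count() triggers job
execution and returns a primitive long; if the method is not found, the
flink-java version on the classpath is probably older than the master javadoc
linked above):

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
// getCustomerDataSet and the Customer type are as in the program above.
DataSet<Customer> customers = getCustomerDataSet(env, mask, l, map);
long numCustomers = customers.count(); // runs the job and returns the count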
When I write this code, it displays the error "no interface expected here":
public static class MyCoGrouper extends CoGroupFunction {
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
DataSet customers = getCustomerDataSet(env,mask,l,map);
DataSet orders= getOrdersDat
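For what it's worth, the compiler error comes from using "extends" on an
interface: CoGroupFunction in the Java API is an interface, so it must be
implemented, with its three type parameters. A sketch of the corrected class
(Customer, Order, and the output type are placeholders for the actual types):

import org.apache.flink.api.common.functions.CoGroupFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

// CoGroupFunction is an interface, so "implements" is required, not "extends".
public static class MyCoGrouper
        implements CoGroupFunction<Customer, Order, Tuple2<Customer, Order>> {
    @Override
    public void coGroup(Iterable<Customer> customers,
                        Iterable<Order> orders,
                        Collector<Tuple2<Customer, Order>> out) {
        // Emit one pair per matching customer/order combination.
        for (Customer c : customers) {
            for (Order o : orders) {
                out.collect(new Tuple2<Customer, Order>(c, o));
            }
        }
    }
}

Alternatively, extend RichCoGroupFunction if open()/close() or the runtime
context are needed.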
I tried the following, but it always takes maxIterations. Maybe someone can
give me a review?
private int benchmarkCounter;
private static int iterationCounter = 1;
private static DataSet oldCentroids;
FlinkMain(int benchmarkCounter) {
this.benchmarkCounter = benchmarkCounte
You could do that but you might run into merge conflicts. Also keep in mind
that it is work in progress :)
On Mon, Jul 20, 2015 at 4:15 PM, Maximilian Alber <
alber.maximil...@gmail.com> wrote:
> Thanks!
>
> Ok, cool. If I would like to test it, I just need to merge those two pull
> requests into
BTW, we should add an entry for this to the FAQ and point to the configuration
or FAQ entry in the exception message.
On 20 Jul 2015, at 15:15, Vasiliki Kalavri wrote:
> Hi Shivani,
>
> why are you using a vertex-centric iteration to compute the approximate
> Adamic-Adar?
> It's not an iterati
Thanks!
Ok, cool. If I would like to test it, I just need to merge those two pull
requests into my current branch?
Cheers,
Max
On Mon, Jul 20, 2015 at 4:02 PM, Maximilian Michels wrote:
> Now that makes more sense :) I thought by "nested iterations" you meant
> iterations in Flink that can be
Hi Shivani,
the Jaccard example is implemented in Giraph, and therefore uses iterations.
However, in Gelly we are not forced to do that for non-iterative
computations.
I see that there is some confusion with the implementation specifics.
Let me try to write down some skeleton code / detailed desc
But it will need to build BloomFilters for each vertex for each edge, so I
don't know how efficient that would be.
On Mon, Jul 20, 2015 at 4:02 PM, Shivani Ghatge wrote:
> Hello Vasia,
>
> I will adapt the exact method for BloomFilter. (I think it can be done.
> Sorry. My mistake).
>
>
> On Mon, Jul 20, 2
Now that makes more sense :) I thought by "nested iterations" you meant
iterations in Flink that can be nested, i.e. starting an iteration inside
an iteration.
The caching/pinning of intermediate results is still a work in progress in
Flink. It is actually in a state where it could be merged but s
Hello Vasia,
I will adapt the exact method for BloomFilter. (I think it can be done.
Sorry. My mistake).
On Mon, Jul 20, 2015 at 3:45 PM, Shivani Ghatge wrote:
> Also the example of Jaccard that you had linked me to used VertexCentric
> configuration which I understand is because that api only
Also, the Jaccard example that you had linked me to used the VertexCentric
configuration, which I understand is because that API only uses
VertexCentricIteration for all the operations? But I think that is the best
way to know which neighbors belong to the BloomFilter?
On Mon, Jul 20, 2015 at
Hello Vasia,
As I had mentioned before, I need a BloomFilter as well as a HashSet for
the approximation to work. In the exact solution I am getting two HashSets
and comparing them. In the approximate version, if we get two BloomFilters then
we have no way to compare the neighborhood sets.
I thought w
I believe there was some work in progress to reduce memory fragmentation
and solve similar problems.
Does anyone know what's happening with that?
On 20 July 2015 at 16:29, Andra Lungu wrote:
> I also questioned the vertex-centric approach before. The exact
> computation does not throw this exception
Oh sorry, my fault. When I wrote it, I had iterations in mind.
What I actually wanted to say is: how will "resuming from intermediate
results" work with (non-nested) "non-Flink" iterations? By iterations I
mean something like this:
while(...):
- change params
- submit to cluster
where the
I also questioned the vertex-centric approach before. The exact computation
does not throw this exception so I guess adapting the approximate version
will do the trick [I also suggested improving the algorithm to use fewer
operators offline].
However, the issue still persists. We saw it in Affinity
Hi Shivani,
why are you using a vertex-centric iteration to compute the approximate
Adamic-Adar?
It's not an iterative computation :)
In fact, it should be as complex (in terms of operators) as the exact
Adamic-Adar, only more efficient because of the different neighborhood
representation. Are yo
Hi Shivani,
The issue is that by the time the Hash Join is executed, the
MutableHashTable cannot allocate enough memory segments. That means that
your other operators are occupying them. It is fine that this also occurs
on Travis because the workers there have limited memory as well.
Till suggest
Hi,
I am afraid this is a known issue:
http://mail-archives.apache.org/mod_mbox/flink-dev/201503.mbox/%3CCAK5ODX7_-Wxg9pr7CkkkG4CzA+yNCNMvmea5L2i2iZZV=2c...@mail.gmail.com%3E
The behavior back then seems to be exactly what Shivani is experiencing at
the moment. At that point I remember Fabian sug
Hello Maximilian,
Thanks for the suggestion. I will use it to check the program. But when I
create a PR for the same implementation with a test, I get the same error
even on the Travis build. What would be the solution for that?
Here is my PR https://github.com/apache/flink/pull/923
An
You can also set taskmanager.memory.fraction from within the IDE by giving
the corresponding configuration object to the LocalEnvironment using the
setConfiguration method. However, taskmanager.heap.mb is basically the -Xmx
value with which you start your JVM. Usually, you can set this in
y
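A sketch of that (here the configuration is passed when the local environment
is created, which should be equivalent; the fraction value is only an example):

import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;

Configuration conf = new Configuration();
// Give a larger share of the TaskManager heap to Flink's managed memory
// (the default fraction is 0.7).
conf.setFloat("taskmanager.memory.fraction", 0.9f);
ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment(conf);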
Hi Shivani,
Flink doesn't have enough memory to perform a hash join. You need to
provide Flink with more memory. You can either increase the
"taskmanager.heap.mb" config variable or set "taskmanager.memory.fraction"
to some value greater than 0.7 and smaller than 1.0. The first config
variable all
Hello,
I am working on a problem which implements the Adamic-Adar algorithm using
Gelly.
I am running into this exception in all the joins (including the ones that
are part of the reduceOnNeighbors function):
Too few memory segments provided. Hash Join needs at least 33 memory
segments.
The problem
"So it is up to debate how the support for resuming from intermediate
results will look like." -> What's the current state of that debate?
Since there is no support for nested iterations that I know of, the debate
about how intermediate results are integrated has not started yet.
> "Intermediate resu
Use a broadcast set to distribute the old centers to a map which has the new
centers as its regular input. Put the old centers in a hash map in open() and
check the distance to the new centers in map().
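A sketch of that pattern (Centroid, its id field, and distanceTo() are
placeholders for the actual centroid type):

import java.util.HashMap;
import java.util.Map;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;

public static class CenterMovement
        extends RichMapFunction<Centroid, Tuple2<Integer, Double>> {
    private Map<Integer, Centroid> oldCenters;

    @Override
    public void open(Configuration parameters) {
        // Build a hash map from the broadcast old centers in open().
        oldCenters = new HashMap<Integer, Centroid>();
        for (Centroid c :
                getRuntimeContext().<Centroid>getBroadcastVariable("oldCentroids")) {
            oldCenters.put(c.id, c);
        }
    }

    @Override
    public Tuple2<Integer, Double> map(Centroid newCenter) {
        // Compare each new center with its old counterpart in map().
        Centroid old = oldCenters.get(newCenter.id);
        return new Tuple2<Integer, Double>(newCenter.id, old.distanceTo(newCenter));
    }
}

// The old centers travel as a broadcast set; the new centers are the
// regular input of the map.
DataSet<Tuple2<Integer, Double>> movement = newCentroids
        .map(new CenterMovement())
        .withBroadcastSet(oldCentroids, "oldCentroids");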
On Jul 20, 2015 12:55 PM, "Pa Rö" wrote:
> okay, i have found it. how to compare my old and new cent
Okay, I have found it. How do I compare my old and new centers?
2015-07-20 12:16 GMT+02:00 Sachin Goel :
> Gah. Sorry.
> In the closeWith call, give a second argument which determines if the
> iteration should be stopped.
>
> -- Sachin Goel
> Computer Science, IIT Delhi
> m. +91-9871457685
> On Jul
Gah. Sorry.
In the closeWith call, give a second argument which determines if the
iteration should be stopped.
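A sketch of what that looks like (the Centroid type and the helper functions
are placeholders; the point is that the second argument of closeWith is a data
set whose emptiness stops the iteration):

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.operators.IterativeDataSet;

IterativeDataSet<Centroid> loop = initialCentroids.iterate(maxIterations);
// One k-means update step (hypothetical helper).
DataSet<Centroid> newCentroids = recomputeCentroids(loop, points);
// Termination criterion: centers that moved more than some epsilon.
// As soon as this data set is empty, the iteration stops early.
DataSet<Centroid> stillMoving = newCentroids
        .join(loop).where("id").equalTo("id")
        .with(new MovementFilter()); // hypothetical FlatJoinFunction that
                                     // only emits centers that moved
DataSet<Centroid> finalCentroids = loop.closeWith(newCentroids, stillMoving);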
-- Sachin Goel
Computer Science, IIT Delhi
m. +91-9871457685
On Jul 20, 2015 3:21 PM, "Pa Rö" wrote:
> i not found the "iterateWithTermination" function, only "iterate" and
> "iterateDe
Thanks for the answer! But I need some clarification:
"So it is up to debate how the support for resuming from intermediate
results will look like." -> What's the current state of that debate?
"Intermediate results are not produced within the iterations cycles." ->
Ok, if there are none, what does
I cannot find the "iterateWithTermination" function, only "iterate" and
"iterateDelta". I use Flink 0.9.0 with Java.
2015-07-20 11:30 GMT+02:00 Sachin Goel :
> Hi
> You can use iterateWithTermination to terminate before max iterations. The
> feedback for iteration then would be (next solution, isCo
Hi
You can use iterateWithTermination to terminate before max iterations. The
feedback for iteration then would be (next solution, isConverged) where
isConverged is an empty data set if you wish to terminate.
However, this is something I have a pull request for:
https://github.com/apache/flink/pull
Hello community,
I have written a k-means app in Flink. Now I want to change my termination
condition from max iterations to checking the change of the cluster centers,
but I don't know how I can break the Flink loop. Here is my execution
code:
public void run() {
//load properties
Hi Max,
You are right, there is no support for nested iterations yet. As far as I
know, there are no concrete plans to add support for it. So it is up to
debate how the support for resuming from intermediate results will look
like. Intermediate results are not produced within the iterations cycles
For other issues (Hadoop versions), we used a shell script that did a
search and replace on the variables.
Maybe you can do the same trick here...
On Mon, Jul 20, 2015 at 10:37 AM, Anwar Rizal wrote:
> Coz I don't like it :-)
>
> No, seriously, sure, I can do it with maven. It worked indeed wi
Coz I don't like it :-)
No, seriously, sure, I can do it with maven. It worked indeed with maven.
But the rest of our ecosystem uses sbt. That's why.
-Anwar
On Mon, Jul 20, 2015 at 10:28 AM, Till Rohrmann
wrote:
> Why not trying maven instead?
>
> On Mon, Jul 20, 2015 at 10:23 AM, Anwar Riz
Why not try Maven instead?
On Mon, Jul 20, 2015 at 10:23 AM, Anwar Rizal wrote:
> I do the same trick as Wendong to avoid compilation error of sbt
> (excluding kafka_$(scala.binary.version) )
>
> I still don't manage to make sbt pass scala.binary.version to maven.
>
> Anwar.
>
> On Mon, Jul 2
I do the same trick as Wendong to avoid the sbt compilation error (excluding
kafka_$(scala.binary.version)).
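For reference, the exclusion in our build.sbt looks roughly like this (a
sketch; the Flink version and the hard-coded _2.10 suffix are examples, and
the suffix is exactly what I would like sbt to derive from
scala.binary.version):

libraryDependencies += "org.apache.flink" % "flink-connector-kafka" % "0.9.0" exclude("org.apache.kafka", "kafka_2.10")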
I still don't manage to make sbt pass scala.binary.version to Maven.
Anwar.
On Mon, Jul 20, 2015 at 9:42 AM, Till Rohrmann wrote:
> Hi Wendong,
>
> why do you exclude the kafka dependenc
Hi Wendong,
why do you exclude the kafka dependency from the `flink-connector-kafka`?
Do you want to use your own kafka version?
I'd recommend building a fat jar instead of trying to put the right
dependencies in `/lib`. Here [1] you can see how to build a fat jar with
sbt.
Cheers,
Till
[1]
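In case the link gets lost: the rough shape of an sbt-assembly setup (a
sketch; the plugin version and merge strategy are only examples):

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.13.0")

// build.sbt: deduplicate conflicting files pulled in by the dependencies
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case _                             => MergeStrategy.first
}

Running `sbt assembly` then produces a single fat jar that contains the Kafka
and zkclient classes which were missing at runtime.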
Without the logs it is hard to say.
On Sat, Jul 18, 2015 at 11:41 AM, Flavio Pompermaier
wrote:
> The job is quite simple... it just reads 10 Parquet dirs, extracts some
> info out of the Thrift objects, generates Tuple3s, and applies a project()
> and a distinct() to call an external service only for s