[akka-user] Re: Akka-stream - aggregate record counts while writing to a sink, and update an object in the middle of the flow process with the aggregated data.

2016-11-09 Thread matheuslima
I guess you need to return a source built with the builder. Another problem 
is that you build a ClosedGraph in *in ~> flow ~> sink* - once every port is 
wired, there is no open outlet left to expose as a SourceShape.
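For what it's worth, the shape the thread is after may not need GraphDSL at all. A hedged sketch (assuming the 2.4-era Akka Streams API, and borrowing FullProfile, Profile, base and a hypothetical likes field from the snippets below): alsoTo sends each element to the per-profile file sink as a side branch, while the main line folds a count and emits the updated profile, so the result really is a Source.

```scala
import java.io.File
import akka.NotUsed
import akka.stream.scaladsl.{FileIO, Flow, Source}
import akka.util.ByteString

// Sketch only - the types and the `likes` field are assumptions, not from the thread.
def likesSource(profile: FullProfile, stream: Stream[Profile], base: File): Source[FullProfile, NotUsed] =
  Source(stream)
    .alsoTo(Flow[Profile]
      .map(p ⇒ ByteString(s"${p.id}:${p.username}\n"))
      .to(FileIO.toPath(new File(base, s"likes_${profile.id}").toPath)))  // side branch: write the likes file
    .fold(0)((acc, _) ⇒ acc + 1)                                          // main line: count the elements
    .map(n ⇒ profile.copy(likes = n))                                     // hypothetical `likes` field
```

Used inside flatMapConcat, this would replace the GraphDSL.create block entirely.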

Em terça-feira, 8 de novembro de 2016 22:49:31 UTC-3, Eugene Dzhurinsky 
escreveu:
>
> Got some time to experiment: with this definition
>
> val fetchLikesAndUpdateProfile = Flow[(OptProfile, LikesAndCount)].flatMapConcat {
>   case (Some(profile), (likesExpected, stream)) ⇒ GraphDSL.create() {
>     implicit builder ⇒
>       import GraphDSL.Implicits._
>
>       val in = builder.add(Source(stream))
>       val profileLikesSink = builder.add(FileIO.toPath(new File(base, s"likes_${profile.id}").toPath))
>       in ~> Flow.fromFunction[Profile.Profile, ByteString](p ⇒ ByteString(s"${p.id}:${p.username}\n")) ~> profileLikesSink
>       // update profile here with the number of records and emit it
>       Source.single(profile).shape
>   }
> }
>
>
> The graph fails at runtime:
>
>
java.lang.IllegalArgumentException: requirement failed: The inlets [] and outlets [single.out] must correspond to the inlets [] and outlets []
 at scala.Predef$.require(Predef.scala:219)
 at akka.stream.Shape.requireSamePortsAs(Shape.scala:168)
 at akka.stream.impl.StreamLayout$CompositeModule.replaceShape(StreamLayout.scala:426)
 at akka.stream.scaladsl.GraphApply$class.create(GraphApply.scala:19)
 at akka.stream.scaladsl.GraphDSL$.create(Graph.scala:993)
 at sample.Aggregate$$anonfun$6.apply(Aggregate.scala:112)
 at sample.Aggregate$$anonfun$6.apply(Aggregate.scala:111)
>
>
> Looks like it doesn't like *Source.single(profile).shape*, but the type of 
> the result should be *Graph[SourceShape[T], M]*, and I assume that the 
> SourceShape should actually emit the updated profile object. Can somebody 
> please explain how this is supposed to work, because I'm lost? 
>
> Thanks!
>

-- 
>>  Read the docs: http://akka.io/docs/
>>  Check the FAQ: 
>> http://doc.akka.io/docs/akka/current/additional/faq.html
>>  Search the archives: https://groups.google.com/group/akka-user
--- 
You received this message because you are subscribed to the Google Groups "Akka 
User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to akka-user+unsubscr...@googlegroups.com.
To post to this group, send email to akka-user@googlegroups.com.
Visit this group at https://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.


[akka-user] Re: Akka-stream - aggregate record counts while writing to a sink, and update an object in the middle of the flow process with the aggregated data.

2016-11-08 Thread Eugene Dzhurinsky
Got some time to experiment: with this definition

val fetchLikesAndUpdateProfile = Flow[(OptProfile, LikesAndCount)].flatMapConcat {
  case (Some(profile), (likesExpected, stream)) ⇒ GraphDSL.create() {
    implicit builder ⇒
      import GraphDSL.Implicits._

      val in = builder.add(Source(stream))
      val profileLikesSink = builder.add(FileIO.toPath(new File(base, s"likes_${profile.id}").toPath))
      in ~> Flow.fromFunction[Profile.Profile, ByteString](p ⇒ ByteString(s"${p.id}:${p.username}\n")) ~> profileLikesSink
      // update profile here with the number of records and emit it
      Source.single(profile).shape
  }
}


The graph fails at runtime:


java.lang.IllegalArgumentException: requirement failed: The inlets [] and outlets [single.out] must correspond to the inlets [] and outlets []
 at scala.Predef$.require(Predef.scala:219)
 at akka.stream.Shape.requireSamePortsAs(Shape.scala:168)
 at akka.stream.impl.StreamLayout$CompositeModule.replaceShape(StreamLayout.scala:426)
 at akka.stream.scaladsl.GraphApply$class.create(GraphApply.scala:19)
 at akka.stream.scaladsl.GraphDSL$.create(Graph.scala:993)
 at sample.Aggregate$$anonfun$6.apply(Aggregate.scala:112)
 at sample.Aggregate$$anonfun$6.apply(Aggregate.scala:111)


Looks like it doesn't like *Source.single(profile).shape*, but the type of 
the result should be *Graph[SourceShape[T], M]*, and I assume that the 
SourceShape should actually emit the updated profile object. Can somebody 
please explain how this is supposed to work, because I'm lost? 

Thanks!



[akka-user] Re: Akka-stream - aggregate record counts while writing to a sink, and update an object in the middle of the flow process with the aggregated data.

2016-11-06 Thread matheuslimaufc
In fetchLikesAndUpdateProfile, you can broadcast the input to 
profileLikesSink and also to a flow that performs a fold to count all 
elements passed through and update the profile.
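As a plain-Scala illustration of that broadcast-and-fold idea (outside Akka, with a ListBuffer standing in for profileLikesSink): each element is pushed to the "sink" branch while the fold branch counts it as it passes.

```scala
import scala.collection.mutable.ListBuffer

// Plain-collections analogue of broadcasting to a sink while folding a count.
// The ListBuffer stands in for the per-profile file sink.
val likes = List("1:alice", "2:bob", "3:carol")
val sink = ListBuffer.empty[String]
val count = likes.foldLeft(0) { (acc, line) =>
  sink += line // "broadcast" branch: write the element out
  acc + 1      // "fold" branch: count it as it passes through
}
println(count)     // 3
println(sink.size) // 3
```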

On Saturday, November 5, 2016 at 11:43:36 PM UTC-3, Eugene Dzhurinsky wrote:
>
> Okay, perhaps the simple source snippet will make it clear:
>
> type OptProfile = Option[FullProfile]
> type LikesAndCount = (Int, Stream[Profile])
>
> val src: Source[Int, NotUsed] = Source[Int](Conf.startId() to Conf.endId())
> val fetchProfileFlow: Flow[Int, OptProfile, NotUsed] = Flow.fromFunction(Profile.extractFullProfile)
> val fetchLikesFlow: Flow[Int, LikesAndCount, NotUsed] = Flow.fromFunction(Likes.extractUserList)
> val profileDataSink: Sink[ByteString, Future[IOResult]] = FileIO.toPath(new File(base, "profiles").toPath)
>
> val fetchLikesAndUpdateProfile = Flow[(OptProfile, LikesAndCount)].flatMapConcat {
>   case (Some(profile), (likesExpected, stream)) ⇒ GraphDSL.create() {
>     implicit builder ⇒
>       import GraphDSL.Implicits._
>
>       val in = builder.add(Source(stream))
>       val profileLikesSink = builder.add(FileIO.toPath(new File(base, s"likes_${profile.id}").toPath))
>       in ~> Flow.fromFunction[Profile.Profile, ByteString](p ⇒ ByteString(s"${p.id}:${p.username}\n")) ~> profileLikesSink
>       // update profile here with the number of records and emit it
>       Source.single(profile).shape
>   }
> }
>
> RunnableGraph.fromGraph(
>   GraphDSL.create() {
>     implicit builder ⇒
>       import GraphDSL.Implicits._
>
>       val inlet = builder.add(Broadcast[Int](2))
>       val merge = builder.add(Zip[OptProfile, LikesAndCount])
>
>       src ~> inlet.in
>       inlet.out(0) ~> fetchProfileFlow ~> merge.in0
>       inlet.out(1) ~> fetchLikesFlow ~> merge.in1
>       merge.out ~> fetchLikesAndUpdateProfile ~>
>         Flow.fromFunction[Profile.FullProfile, ByteString](p ⇒ ByteString(s"$p\n")) ~>
>         profileDataSink
>       ClosedShape
>   }
> ).run()
>
>
> So far it is not clear how I would write *fetchLikesAndUpdateProfile* in 
> a way that it will
> - create a sink for storing the list of fetched data (every profile has an 
> associated file named after the profile ID)
> - retrieve the number of stored records in *fetchLikesAndUpdateProfile* 
> and update the property in the *FullProfile* object.
>
> Thanks!
>



[akka-user] Re: Akka-stream - aggregate record counts while writing to a sink, and update an object in the middle of the flow process with the aggregated data.

2016-11-05 Thread Eugene Dzhurinsky
Okay, perhaps the simple source snippet will make it clear:

type OptProfile = Option[FullProfile]
type LikesAndCount = (Int, Stream[Profile])

val src: Source[Int, NotUsed] = Source[Int](Conf.startId() to Conf.endId())
val fetchProfileFlow: Flow[Int, OptProfile, NotUsed] = Flow.fromFunction(Profile.extractFullProfile)
val fetchLikesFlow: Flow[Int, LikesAndCount, NotUsed] = Flow.fromFunction(Likes.extractUserList)
val profileDataSink: Sink[ByteString, Future[IOResult]] = FileIO.toPath(new File(base, "profiles").toPath)

val fetchLikesAndUpdateProfile = Flow[(OptProfile, LikesAndCount)].flatMapConcat {
  case (Some(profile), (likesExpected, stream)) ⇒ GraphDSL.create() {
    implicit builder ⇒
      import GraphDSL.Implicits._

      val in = builder.add(Source(stream))
      val profileLikesSink = builder.add(FileIO.toPath(new File(base, s"likes_${profile.id}").toPath))
      in ~> Flow.fromFunction[Profile.Profile, ByteString](p ⇒ ByteString(s"${p.id}:${p.username}\n")) ~> profileLikesSink
      // update profile here with the number of records and emit it
      Source.single(profile).shape
  }
}

RunnableGraph.fromGraph(
  GraphDSL.create() {
    implicit builder ⇒
      import GraphDSL.Implicits._

      val inlet = builder.add(Broadcast[Int](2))
      val merge = builder.add(Zip[OptProfile, LikesAndCount])

      src ~> inlet.in
      inlet.out(0) ~> fetchProfileFlow ~> merge.in0
      inlet.out(1) ~> fetchLikesFlow ~> merge.in1
      merge.out ~> fetchLikesAndUpdateProfile ~>
        Flow.fromFunction[Profile.FullProfile, ByteString](p ⇒ ByteString(s"$p\n")) ~>
        profileDataSink
      ClosedShape
  }
).run()


So far it is not clear how I would write *fetchLikesAndUpdateProfile* in a 
way that it will
- create a sink for storing the list of fetched data (every profile has an 
associated file named after the profile ID)
- retrieve the number of stored records in *fetchLikesAndUpdateProfile* 
and update the property in the *FullProfile* object.

Thanks!



[akka-user] Re: Akka-stream - aggregate record counts while writing to a sink, and update an object in the middle of the flow process with the aggregated data.

2016-11-05 Thread matheuslimaufc
If I understood correctly, it's enough to replace map(persistDoc) with 
via(persistDoc) if persistDoc is a flow. The fold stage aggregates the 
number of persistence operations performed by the persistDoc stage. I 
understood that you want to fetch a list of docs related to a profile, 
persist them, and then update the profile with the number of docs persisted. 
This is the logic executed by the flatMapConcat stage I described previously.
Best regards.

On Saturday, November 5, 2016 at 9:41:55 AM UTC-3, Eugene Dzhurinsky wrote:
>
> So far I understood that *persistDoc* is another function that should 
> persist something into the appropriate file. However, I believe that in my 
> case it is a flow that has some source and sink attached, and the sink is 
> the *FileIO*.
>
> Basically, I don't see how that is supposed to work out - *FileIO* 
> doesn't return the number of performed operations or anything like 
> that.
>
> Thanks!
>



[akka-user] Re: Akka-stream - aggregate record counts while writing to a sink, and update an object in the middle of the flow process with the aggregated data.

2016-11-05 Thread Eugene Dzhurinsky
So far I understood that *persistDoc* is another function that should 
persist something into the appropriate file. However, I believe that in my 
case it is a flow that has some source and sink attached, and the sink is 
the *FileIO*.

Basically, I don't see how that is supposed to work out - *FileIO* doesn't 
return the number of performed operations or anything like that.
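For what it's worth, the file layer does report *something*: in Akka Streams, FileIO.toPath materializes a Future[IOResult] whose count is the number of bytes written, not records. A plain-JVM sketch of that distinction, using java.nio in place of FileIO (the record count has to be folded separately while writing):

```scala
import java.nio.file.{Files, StandardOpenOption}

// Writing records to a file: the file layer knows about bytes, not records,
// so the record count is tracked separately while writing.
val records = List("1:alice", "2:bob")
val tmp = Files.createTempFile("likes_", ".txt")
val (recordCount, byteCount) = records.foldLeft((0, 0L)) { case ((n, b), rec) =>
  val bytes = (rec + "\n").getBytes("UTF-8")
  Files.write(tmp, bytes, StandardOpenOption.APPEND)
  (n + 1, b + bytes.length)
}
println(recordCount)                  // 2
println(byteCount == Files.size(tmp)) // true
```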

Thanks!



[akka-user] Re: Akka-stream - aggregate record counts while writing to a sink, and update an object in the middle of the flow process with the aggregated data.

2016-11-04 Thread matheuslimaufc
This can easily be modeled as:

val source = Source(ids)
source
  .mapAsync(4)(fetchDocumentById)
  .map(_.profile)
  .flatMapConcat { prof =>
    sourceOfRelatedDocs
      .mapAsync(4)(persistDoc)
      .fold(0)((acc, _) => acc + 1) // count the persisted docs
      .map(count => (count, prof))
  }
  .mapAsync(4) { case (count, prof) => updateProfile(count, prof) }
  .to(Sink.ignore)
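The flatMapConcat stage above has a simple collections analogue worth keeping in mind: for each outer element, aggregate the inner sequence down to a count and pair it with the element. (Names below are illustrative, not from the thread.)

```scala
// Plain-collections analogue of the flatMapConcat stage: per profile,
// fold the related docs to a count and pair it with the profile.
val relatedDocs = Map("alice" -> List("d1", "d2"), "bob" -> List("d3"))
val profiles = List("alice", "bob")
val counted: List[(Int, String)] = profiles.flatMap { prof =>
  val count = relatedDocs(prof).foldLeft(0)((acc, _) => acc + 1)
  List((count, prof))
}
println(counted) // List((2,alice), (1,bob))
```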

On Thursday, November 3, 2016 at 11:57:59 PM UTC-3, Eugene Dzhurinsky wrote:
>
> Hello, I want to implement the following workflow:
>
> - a source has the sequence of IDs to process
> - initial flow *INIT* fetches the document by ID and extracts the 
> *profile*
> - another flow *FETCH* is spawned, it fetches one or more of the 
> associated documents and store them in some sink *DATA (file)*
> - the flow must also calculate the number of records saved to the sink 
> *DATA*
> - once the flow *FETCH *is complete - the *profile* is updated with the 
> count of fetched documents
> - then the *profile* is written into the sink *METADATA (file)*
> - optionally, if the number of records in the *FETCH* phase doesn't match 
> the actual number set in *metadata* - then it should write some 
> key into yet another sink *ERROR*
>
>
> 
>
> So far it's not clear how I would
>
> - aggregate the records produced by a certain flow to calculate the number 
> of records processed for a certain input
> - keep the intermediate profile object somewhere until the records are 
> fetched and saved in another flow
>
> Please advise.
>
> Thanks!
>



Re: [akka-user] Re: Akka-stream - aggregate record counts while writing to a sink, and update an object in the middle of the flow process with the aggregated data.

2016-11-04 Thread Eugene Dzhurinsky
On Friday, November 4, 2016 at 8:43:43 AM UTC-4, √ wrote:
>
> Why would it need its own materializer?
>

I have a stream of IDs, coming from the database.
There is a flow that maps the ID into the profile (fetching the details 
from some external storage). So far it is simple enough - 

val profileFetcher: Flow[ID, Profile, NotUsed] = Flow.fromFunction(id ⇒ )


Now I have to take the *profile* and run another flow, that will
- query 1 .. N different resources (depending on the content of Profile)
- transform the content of the resources and save them into a separate file 
(*sink*)
- aggregate results of that processing (for now - calculate the number of 
records fetched from external resources) and update the *profile* object
- stream down the *profile* object into another file (*sink)*.

So far I can see that the function that goes downstream

val profileSaver: Flow[Profile, ByteString, NotUsed] = Flow.fromFunction(profile ⇒ )

must take the profile, create its own flow, *wait until that flow 
completes*, and then update the profile object before converting it into a 
*ByteString*.

And that's why I need to materialize the inner flow.
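The "wait until the inner flow completes" step can be seen without any Akka machinery: the per-profile work returns a Future of the aggregate, and the profile is only updated and passed on once that future completes (which is essentially what materializing the inner stream and mapping over its Future gives you). All names below are illustrative:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Plain-Future analogue of materializing an inner flow per element:
// the inner pipeline completes with a count, and only then is the
// profile updated and sent downstream.
case class Profile(id: Int, likes: Int = 0)
def fetchAndPersistLikes(p: Profile): Future[Int] =
  Future(List("l1", "l2", "l3").length) // stand-in for the inner pipeline

val updated: Future[Profile] =
  fetchAndPersistLikes(Profile(42)).map(n => Profile(42, likes = n))

println(Await.result(updated, 5.seconds)) // Profile(42,3)
```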


Makes sense?



Re: [akka-user] Re: Akka-stream - aggregate record counts while writing to a sink, and update an object in the middle of the flow process with the aggregated data.

2016-11-04 Thread Viktor Klang
Why would it need its own materializer?

On Fri, Nov 4, 2016 at 1:26 PM, Eugene Dzhurinsky 
wrote:

> The more I think about it, the more I am convinced that this is something
> that must be implemented via nested flows - the step to extract the
> initial user profile and then extract all associated records must be
> implemented as a separate Graph that is invoked as part of the
> transformation of the input stream.
>
> So I have a flow nested in another flow with its own materializer - for
> every user ID from the outer flow I spawn another instance of the inner
> flow, wait until it completes, and then send the data down to the
> appropriate sinks.
>
> Am I missing something?
>



-- 
Cheers,
√



[akka-user] Re: Akka-stream - aggregate record counts while writing to a sink, and update an object in the middle of the flow process with the aggregated data.

2016-11-04 Thread Eugene Dzhurinsky
The more I think about it, the more I am convinced that this is something 
that must be implemented via nested flows - the step to extract the 
initial user profile and then extract all associated records must be 
implemented as a separate Graph that is invoked as part of the 
transformation of the input stream.

So I have a flow nested in another flow with its own materializer - for 
every user ID from the outer flow I spawn another instance of the inner 
flow, wait until it completes, and then send the data down to the 
appropriate sinks.

Am I missing something?
