Re: [akka-user] Line count of each file in a given directory efficiently using akka

2016-11-23 Thread kk k
Thanks, Justin. Yes, this design is not meant for a large number of files, only for a 
very limited set.
It took me some time to get the actor model, but once I understood it I 
really liked it. Thanks for your points, which make me confident 
that the current code is OK for my task at work. On "Premature 
optimization ...": yes, I will validate first and then improve if needed. Thanks 
again!



Re: [akka-user] Line count of each file in a given directory efficiently using akka

2016-11-23 Thread kk k

Mike, even critical advice is more than welcome, since it is aimed at 
improving the code.
Thanks for the detailed explanation of routers and the other 
points; they helped me understand a lot of the finer aspects.

I will explore FSMs further. Thanks again.


Re: [akka-user] Line count of each file in a given directory efficiently using akka

2016-11-23 Thread Justin du coeur
Your architecture sounds entirely reasonable as you describe it.  Whether
it's *optimal* really depends on the details of the problem: how many of
these files you're scanning simultaneously, for example.  It ought to work
fine for dozens of files.  It might start to bottleneck if there are
thousands -- but you're likely to encounter I/O bottlenecks first at that
point, I suspect.

Basically, introducing multiple Aggregators, or routers, *might* be helpful
if you're trying to scale this thing really big -- or they might just
introduce complexity for no practical benefit.  It really isn't obvious.

The first rule of the game industry (which is often true elsewhere in
programming) is, "Premature optimization is the root of all evil".  Start
out by getting things working, and worry about optimization only after
measuring your hotspots...
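
One way to act on "measure first" is to time a version with no actors at all and compare the Akka program against it. The following is a minimal sketch under stated assumptions, not part of the posted program: the class name `LineCountBaseline` and its methods are hypothetical, and it uses only the JDK.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical single-threaded baseline: no actors, JDK only.
final class LineCountBaseline {

    // Counts the lines of one file. ISO-8859-1 decodes any byte sequence, so
    // files that are not valid UTF-8 do not throw; line breaks are still
    // recognised for ASCII-compatible encodings.
    static long countLines(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.ISO_8859_1)) {
            return lines.count();
        }
    }

    // Counts lines for every regular file directly inside dir (no recursion),
    // keyed by file name.
    static Map<String, Long> countDirectory(Path dir) throws IOException {
        List<Path> files;
        try (Stream<Path> entries = Files.list(dir)) {
            files = entries.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        Map<String, Long> counts = new TreeMap<>();
        for (Path file : files) {
            counts.put(file.getFileName().toString(), countLines(file));
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        long start = System.nanoTime();
        Map<String, Long> counts = countDirectory(dir);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        counts.forEach((name, n) -> System.out.println(name + ": " + n));
        System.out.println("Elapsed: " + elapsedMs + " ms");
    }
}
```

If the actor version is not clearly faster than this on realistic input, the extra routers or aggregators are probably not worth their complexity.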


Re: [akka-user] Line count of each file in a given directory efficiently using akka

2016-11-23 Thread kk k
Thanks for that. Yes, Streams might be the right fit.
But I wanted to know whether my current implementation is fine, or whether I can 
tweak it further, based on the few questions I mentioned.


Re: [akka-user] Line count of each file in a given directory efficiently using akka

2016-11-23 Thread Mike Nielsen
While I'm far from an expert, and have never deployed Akka in production
(so take my advice with a grain of salt, and please don't hear this as any
criticism of what you've done) I use routers for a few reasons:

   1. I can control the minimum and maximum numbers of them via
   configuration, and there are some pool configurations that are sensitive to
   workload on the node, so you won't whack the compute node(s) too hard;
   2. You can configure worker instances to be run remotely, which means
   that it's easier to scale horizontally if need be;
   3. In my view, it's worthwhile to have explicit management of actors (as
   opposed to just spinning them up willy-nilly): I can foresee a risk in
   writing a large system where execution contexts become a mixed-bag of
   actors with a broad spectrum of performance characteristics.  Trolling
   through the code later to try to tune performance by partitioning execution
   contexts is harder than having these designed up-front.  In your
   application, this may not be such a big deal, but, for reasons of personal
   preference, I try to avoid coding in a way that boxes me in scale-wise.
   Admittedly it's more work, but my thinking is that while I'm learning, I
   may as well learn it to the max.  Your mileage, of course, may vary.
   4. In a larger application, it also seems to me to be a risk that
   business logic might become sprayed over actors in a way that makes the
   code a pain to maintain.  I'm not sure I have a complete answer on managing
   that, but it seems that a central manifest of actors (in the form of router
   configurations) in your configuration files might be of benefit.
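
The first point, a fixed upper bound on the number of workers, can be illustrated without Akka. The sketch below is a plain-JDK stand-in and not a router: a fixed thread pool plays the role of the pool of parsers, so at most `workers` files are read at once no matter how many files are submitted. The class name `BoundedParserPool` and its methods are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Stream;

// Hypothetical plain-JDK analogue of a bounded pool of file parsers.
final class BoundedParserPool {

    // Counts the lines of one file (ISO-8859-1 never fails on odd bytes).
    static long countLines(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.ISO_8859_1)) {
            return lines.count();
        }
    }

    // Counts lines for all files using at most `workers` concurrent readers.
    static Map<Path, Long> countAll(List<Path> files, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // Submit every file; the pool queues whatever exceeds `workers`.
            Map<Path, Future<Long>> pending = new LinkedHashMap<>();
            for (Path file : files) {
                pending.put(file, pool.submit(() -> countLines(file)));
            }
            // Collect results in submission order.
            Map<Path, Long> result = new LinkedHashMap<>();
            for (Map.Entry<Path, Future<Long>> entry : pending.entrySet()) {
                result.put(entry.getKey(), entry.getValue().get());
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }
}
```

The design choice is the same one a pool makes: the number of workers is a tuning knob that is independent of the number of files, rather than one worker per file.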

In terms of whether it's OK to have a single actor per file, that seems to
be a tacit design decision that says I/O throughput is going to be the
factor that limits your application throughput.  If that's always going to
be the case, then fine, but if you are going to want to get a high degree
of scalability, that may not be a correct assumption: if you have massive
files and a distributed filesystem in which I/O throughput scales with the
number of compute nodes, then it may be better to work out how to partition
work on a single file over multiple nodes.
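
As a rough illustration of partitioning work on a single file, here is a minimal sketch that splits one file into byte ranges and counts newline bytes in each range in parallel. It is an assumption-laden stand-in: it runs on threads in one JVM rather than on multiple nodes, the class name `ChunkedNewlineCounter` is hypothetical, and it counts `\n` bytes, so a last line without a trailing newline is not counted. Splitting at arbitrary byte offsets is safe for UTF-8 and other ASCII-compatible encodings, but not for UTF-16.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: count newline bytes of one file in parallel chunks.
final class ChunkedNewlineCounter {

    // Counts '\n' bytes in the byte range [start, end) of the file.
    static long countNewlines(Path file, long start, long end) throws IOException {
        long count = 0;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(64 * 1024);
            long position = start;
            while (position < end) {
                buffer.clear();
                // Never read past the end of this chunk.
                buffer.limit((int) Math.min(buffer.capacity(), end - position));
                int read = channel.read(buffer, position);
                if (read <= 0) {
                    break; // end of file reached early
                }
                for (int i = 0; i < read; i++) {
                    if (buffer.get(i) == '\n') {
                        count++;
                    }
                }
                position += read;
            }
        }
        return count;
    }

    // Splits the file into up to `chunks` ranges (chunks >= 1) and sums the counts.
    static long countNewlines(Path file, int chunks) throws Exception {
        long size = Files.size(file);
        long chunkSize = Math.max(1, (size + chunks - 1) / chunks);
        ExecutorService pool = Executors.newFixedThreadPool(chunks);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            for (long start = 0; start < size; start += chunkSize) {
                final long from = start;
                final long to = Math.min(size, start + chunkSize);
                parts.add(pool.submit(() -> countNewlines(file, from, to)));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

Whether this helps depends on the storage: on a single local disk the chunks compete for the same I/O bandwidth, which is the assumption the paragraph above questions.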

As one beginner to another, I have found a couple of features of Akka that
are worthwhile:

   1. While not directly related to your questions, FSMs are amazingly
   helpful in writing correct code, and therein may lie the answer to your
   third query;
   2. I found streams a little trickier to get the hang of than vanilla
   actors, but worth the investment of time.

I hope you find these comments useful.


Re: [akka-user] Line count of each file in a given directory efficiently using akka

2016-11-23 Thread Viktor Klang
Your use-case sounds like a perfect example of something which would
benefit quite a bit from being based on Akka Streams.


[akka-user] Line count of each file in a given directory efficiently using akka

2016-11-23 Thread kk k

This is my first program in Akka, so I wanted to know whether the program below 
is efficient and makes good use of the actor model.

--

The program's purpose is to scan a given directory for any files and print 
the number of lines in each file.

1. The main `Application` class will create the actor system and send a 
Scan message to a `FileScanner` actor.
2. The `FileScanner` actor will scan the given directory, and for each file 
it will create a new `FileParser` actor and send it a Parse message. All 
the `FileParser` actors are passed the same `Aggregator` actor reference.
3. The `FileParser` actor will parse the given file, and for each line it 
will send a Line message to the `Aggregator` actor.
4. The `Aggregator` actor will maintain a count of the number of lines for 
each file in an instance `HashMap` and will print the line count for each 
"End" message it receives. Once all files are processed, it will shut down 
the actor system.

--

A few points which I need help on:

1. There is a separate `FileParser` actor for each file. Is this fine? What 
is the benefit of using a router which routes to `FileParser` actors? Would it 
only help in controlling the number of `FileParser` actors and how 
load is distributed among them?

2. There is a single `Aggregator` actor which counts the number of lines for 
each file. It uses an instance `HashMap`, and I hope this is fine. Or 
would a separate aggregator actor for each file improve performance?

3. Also, I am passing the number of files to the `Aggregator` actor when it is 
created so that I can shut down the actor system once all files are 
processed. If I have a separate `Aggregator` for each file, I'm not sure how 
to shut down.

4. Each file is processed only sequentially, i.e. a `FileParser` actor 
reads the file sequentially and then invokes the aggregator for each 
line. Is this fine, or can it be improved?
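
For questions 3 and 4, one possible variation is for each parser to count its lines locally and send a single summary message per file, with the aggregator tracking how many files have reported. The sketch below shows only that bookkeeping in plain Java, outside any actor. `FileCounted` and `AggregatorState` are hypothetical names that do not appear in the posted program.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical immutable message: one per file, instead of one per line.
final class FileCounted {
    final String path;
    final long lines;

    FileCounted(String path, long lines) {
        this.path = path;
        this.lines = lines;
    }
}

// Hypothetical stand-in for the Aggregator's state; an actor would hold one
// instance of this and call record() from its message handler.
final class AggregatorState {
    private final int expectedFiles;
    private final Map<String, Long> counts = new HashMap<>();

    AggregatorState(int expectedFiles) {
        this.expectedFiles = expectedFiles;
    }

    // Records one file's result. Returns true once every expected file has
    // reported, which is the point at which the system could be shut down.
    boolean record(FileCounted message) {
        counts.put(message.path, message.lines);
        return counts.size() >= expectedFiles;
    }

    // Read-only view of the counts collected so far.
    Map<String, Long> counts() {
        return Collections.unmodifiableMap(counts);
    }
}
```

With this shape the aggregator receives one message per file rather than one per line, so a single aggregator is much less likely to become the bottleneck.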

--

**Application**

import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;

/**
 * The Application program bootstraps the actor system for parsing files in a
 * given directory and finding their line count.
 *
 * @author
 * @version 1.0
 */
public class Application {
    public void start(String directoryPath) {
        ActorSystem actorSystem = ActorSystem.create("logProcessor");
        ActorRef fileScanner = actorSystem.actorOf(
                Props.create(FileScanner.class), "fileScanner");
        fileScanner.tell(new Scan(directoryPath), ActorRef.noSender());
    }

    public static void main(String[] args) {

        if (args.length < 1) {
            System.out
                    .println("Usage: java -jar log-process-1.0-SNAPSHOT.jar ");
            // Exit with a non-zero status, since a missing argument is an error
            System.exit(1);
        }
        String path = args[0];
        Application application = new Application();
        application.start(path);
    }
}

**FileScanner**

import java.io.File;

import akka.actor.ActorRef;
import akka.actor.Props;
import akka.actor.UntypedActor;

/**
 * The FileScanner actor scans for files in a given directory.
 *
 * @author
 * @version 1.0
 */
public class FileScanner extends UntypedActor {

    public FileScanner() {
    }

    /**
     * Invoked by the actor system to scan a given directory.
     *
     * @param message
     *            The message to process
     */
    @Override
    public void onReceive(Object message) {
        ActorRef parser;
        if (message instanceof Scan) {
            Scan scan = (Scan) message;
            System.out.println("Scan directory: " + scan.getDirectory());

            // Only top-level files in the directory are read. No recursion is
            // done.
            File directory = new File(scan.getDirectory());
            // In case of a large number of files, we need to optimize the call below.
            File[] files = directory.listFiles();
            if (files == null) {
                // listFiles() returns null when the path is not a readable directory
                System.err.println("Not a readable directory: " + scan.getDirectory());
                getContext().stop(getSelf());
                return;
            }
            // Required to shut down the actor system after all files are processed
            int numberOfFiles = 0;

            /*
             * Count only the files and ignore any folders.
             */
            for (File file : files) {
                if (file.isFile())
                    numberOfFiles++;
            }
            ActorRef aggregator = getContext()
                    .actorOf(Props.create(Aggregator.class, numberOfFiles),
                            "aggregator");
            File file;
            for (int i = 0; i < files.length; i++) {
                file = files[i];
                if (!file.isFile())
                    continue;
                System.out.println(file.getName());

                /*
                 * Use a unique identifier (counter) for actor names, as file
                 * names can have special characters (e.g. readme (copy).md) and
                 * hence cannot be used directly as actor names.
                 *
                 * Docs: actor paths MUST not start with `$`, must include only
                 * ASCII letters, and can only contain these special
                 * characters: -_.*$+:@&=,!~';.
                 */
                parser = getContext().actorOf(
                        Props.create(FileParser.class, aggregator),
                        "parser-" + i);
                parser.tell(new Parse(file.getAbsolutePath()), getSelf());
            }
        } else {
            unhandled(message);
        }
    }
}

**FileParser**

/**
 * The FileParser actor parses a given file and reports its lines to the
 * aggregator.
 *
 * @author
 * @version 1.0
 */
public class FileParser extends UntypedActor {

    /**
     * An aggregator actor reference to send file events to.
     */
    private ActorRef aggregator;

    public FileParser(ActorRef aggregator) {
        this.aggregator = aggregator;
    }

    /**
     * Invoked by the mailbox when it receives a thread timeslice and a message
     * is available to it from FileScanner. It reads only text files; any
     * other files are not handled.
     *
     * @param message
     *            The message to process
     */
    public void onReceive(Object message) {
        if (message instanceof Parse) {
            Parse parseMessage = (Parse)