At 09:47 AM 14/03/02 -0800, srinivas mohan wrote:
>Now as the performance is  low..we wanted to redevelop
>our spider..in a language like c or perl...and use
>it with our existing product..
>
>I will be thankful if any one can help me choosing 
>the better language..where i can get better
>performance..

You'll never get better performance until you understand why you
had lousy performance before.  It's not obvious to me why Java
should get in the way.

I've written two very large robots and used perl both times.
There were two good reasons to choose perl:

- A robot fetches pages, analyzes them, and manages a database
  of been-processed and to-process URLs.  The fetching waits on the
  network and takes next to no CPU.  The database is probably the
  same in whatever language you use.
  Thus the leftover computation is picking apart pages looking
  for URLs and BASE values and so on... perl is hard to beat
  for that type of code.
- Time-to-market was critical.  Using perl means you have to write
  much less code than in java or C or whatever, so you get done
  quicker.
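
To make the "picking apart pages" step concrete, here is a minimal
sketch in Python.  It is not the code from either of my robots: it
swaps the regexp approach for the standard library's html.parser,
and the names LinkExtractor and extract_links are made up for
illustration.  It assumes only <a href> links matter and that the
first <base href> on the page wins.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collect <a href> values and the first <base href> seen in a page."""

    def __init__(self):
        super().__init__()
        self.base = None
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if not href:
            return
        if tag == "base" and self.base is None:
            self.base = href
        elif tag == "a":
            self.hrefs.append(href)


def extract_links(page_url, html):
    """Return absolute http(s) URLs found in html, in order, without duplicates."""
    parser = LinkExtractor()
    parser.feed(html)
    # A BASE value, when present, replaces the page URL for resolving links.
    base = urljoin(page_url, parser.base) if parser.base else page_url
    links = []
    for href in parser.hrefs:
        # Resolve against the base and drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(base, href))
        if absolute.startswith(("http://", "https://")) and absolute not in links:
            links.append(absolute)
    return links
```

Relative links are resolved against the BASE value when there is
one, which is the detail that is easiest to get wrong.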

It's not clear that you can write a robot to run faster than a
well-done perl one.  It is clear you can write one that's much
more maintainable; perl makes it too easy to write obfuscated code.
Another disadvantage of perl is the large memory footprint - since
a robot needs to be highly parallel, you probably can't afford to
have a perl process per execution thread.
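
One way around the process-per-thread cost is to keep the
been-processed and to-process bookkeeping in a single process and
hand only the fetching to a pool of threads.  A rough sketch in
Python, under loud assumptions: crawl and fetch_links are
hypothetical names, fetch_links(url) stands in for "fetch the page
and return the URLs on it", and the in-memory set and deque stand in
for the real database.  Error handling is left out.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def crawl(seeds, fetch_links, max_pages=100, workers=8):
    """Crawl breadth-first; return the URLs processed, in order."""
    to_process = deque(seeds)   # stand-in for the to-process table
    seen = set(seeds)           # everything ever queued
    processed = []              # stand-in for the been-processed table
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while to_process and len(processed) < max_pages:
            # Take up to `workers` URLs without exceeding max_pages.
            batch = []
            while (to_process and len(batch) < workers
                   and len(processed) + len(batch) < max_pages):
                batch.append(to_process.popleft())
            # Fetches run in parallel; results come back in batch order.
            for url, links in zip(batch, pool.map(fetch_links, batch)):
                processed.append(url)
                for link in links:
                    if link not in seen:
                        seen.add(link)
                        to_process.append(link)
    return processed
```

Only the main thread touches the queue and the seen-set, so no
locking is needed; the worker threads do nothing but wait on the
network.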

Next time I might go with python.  Its regexp engine isn't quite 
as fast, but the maintainability is better.  -Tim


--
This message was sent by the Internet robots and spiders discussion list 
([EMAIL PROTECTED]).  For list server commands, send "help" in the body of a message 
to "[EMAIL PROTECTED]".