At 09:47 AM 14/03/02 -0800, srinivas mohan wrote:
>Now as the performance is low..we wanted to redevelop
>our spider..in a language like c or perl...and use
>it with our existing product..
>
>I will be thankful if any one can help me choosing
>the better language..where i can get better
>performance..
You'll never get better performance until you understand why you had lousy performance before. It's not obvious to me why Java should get in the way.

I've written two very large robots and used perl both times. There were two good reasons to choose perl:

- A robot fetches pages, analyzes them, and manages a database of pages already processed and pages still to process. The fetching is I/O-bound and uses next to no CPU. The database work is probably the same in whatever language you use. Thus the leftover computation is picking apart pages looking for URLs, BASE values, and so on, and perl is hard to beat for that type of code.

- Time-to-market was critical. Using perl means you write much less code than in Java or C or whatever, so you finish sooner.

It's not clear that you can write a robot that runs faster than a well-done perl one. It is clear, though, that you can write one that's much more maintainable; perl makes it too easy to write obfuscated code. Another disadvantage of perl is its large memory footprint: since a robot needs to be highly parallel, you probably can't afford a perl process per execution thread.

Next time I might go with python. Its regexp engine isn't quite as fast, but the maintainability is better.

-Tim

-- 
This message was sent by the Internet robots and spiders discussion list ([EMAIL PROTECTED]).
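For illustration, here is a minimal sketch of the two CPU-side pieces Tim describes: picking apart a page for URLs and BASE values, and keeping track of what has been processed and what is still to process. It is written in Python because that is the language mentioned as the next candidate. It is not code from either of the robots discussed, and it rests on several assumptions. It uses the standard library's html.parser instead of hand-written regular expressions. The "database" is kept in memory as a set plus a deque. Fetching is left out entirely. The class names LinkExtractor and Frontier are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Hypothetical helper: collects link targets from one page, honoring <base href=...>."""

    def __init__(self, page_url):
        super().__init__()
        self.base = page_url      # replaced by the first <base href=...>, if any
        self._base_seen = False
        self._hrefs = []          # raw href values, resolved later in links()

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if not href:
            return
        if tag == "base" and not self._base_seen:
            # The first BASE element sets the base URL for the whole document.
            self.base = urljoin(self.base, href)
            self._base_seen = True
        elif tag in ("a", "area"):
            self._hrefs.append(href)

    def links(self):
        """Absolute http(s) URLs with #fragments stripped, in page order."""
        result = []
        for href in self._hrefs:
            url, _fragment = urldefrag(urljoin(self.base, href.strip()))
            if urlparse(url).scheme in ("http", "https"):
                result.append(url)
        return result


class Frontier:
    """Hypothetical in-memory stand-in for the to-process / been-processed database."""

    def __init__(self, seeds):
        self.seen = set()         # every URL ever queued
        self.queue = deque()      # URLs still to process, FIFO order
        for url in seeds:
            self.add(url)

    def add(self, url):
        # Mark a URL as seen when it is queued, so it is never queued twice.
        if url not in self.seen:
            self.seen.add(url)
            self.queue.append(url)

    def pop_next(self):
        return self.queue.popleft() if self.queue else None


if __name__ == "__main__":
    frontier = Frontier(["http://example.com/start.html"])
    page = ('<html><head><base href="http://example.com/docs/"></head><body>'
            '<a href="intro.html">Intro</a> <a href="../index.html#top">Home</a>'
            ' <a href="mailto:webmaster@example.com">Mail</a></body></html>')
    url = frontier.pop_next()
    parser = LinkExtractor(url)
    parser.feed(page)   # in a real robot, this would be the fetched page body
    parser.close()
    for link in parser.links():
        frontier.add(link)
    print(list(frontier.queue))
    # prints ['http://example.com/docs/intro.html', 'http://example.com/index.html']
```

The links are resolved only after the whole page has been parsed. That way a BASE element applies to every link on the page, wherever it appears. Marking URLs as seen when they are queued, rather than when they are fetched, keeps duplicates out of the queue. A production robot would replace the set and deque with a persistent store.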
