[Boston.pm] software puzzle - extracting longest alphabetical list of phrases from a list of words
The following is just a problem in computer science. It is not directly related to Perl, or to my work. I am looking for insights into how to think about this.

The input: a list of words.
The output: a partitioning of the input list into a longest list of phrases, such that the phrases are in alphabetical order. (Each phrase is one or more consecutive words, and a word is a maximum-length sequence of non-space characters.)

The following example shows that maximizing the number of phrases may not produce the answer a person would, but it makes the problem solvable by an algorithm that does not have a set of allowed phrases. If there are two or more lists of the same length, assume any one will do as the answer.

Example input 1: atta boy catch as catch can
Example output 1: atta boy catch as catch can

I presume this problem is already known to software engineering. What is its name? (For example, other problems are solved by connected components, or topological sort, etc.)

Here are a few things I know about solving this problem:

It has complexity at most O(2^n) because there are at most 2^n partitions. A brute force algorithm would start with the case of having n partitions, where each word is its own phrase. If this is in alphabetical order we are done. Otherwise try all the cases where there are n-1 partitions, then n-2 partitions, etc. (This algorithm would probably be OK for lists with a reasonable number of words. I cannot estimate the maximum number of words or phrases it could handle on a PC.)

Is there a deterministic algorithm in a lower complexity class?

I would be happy with a heuristic approach that did pretty well. One possible score for determining whether to start a phrase at a word has two components:

1. Its position in the list. A lower position is better, because we want many phrases. This is easily precomputed by an O(n) pass over the list.
2. Its alphabetical order in the list. Again a lower number is better. This can be computed one time in advance by an O(n log n) sort.

Then maybe something like alpha-beta pruning (a la chess) could be used to evaluate the best position to introduce a phrase.

Once we have a phrase starting with some string $s, then all words $w to the right of $s such that $w le $s cannot start a phrase. The first phrase always starts with the first word. So we can immediately mark words alphabetically lt this as not able to start a phrase, e.g. "as" in the example above is lt "atta". A heuristic approach might take advantage of this.

Is there a greedy approach (one that never backtracks) that emits a reasonable output?

P.S. In case anyone is interested in actually writing code to solve this, the alphabetical order is case insensitive. The origin of the problem was doing a View Source on a web page that had a large drop-down list, and wanting to reconstruct the list of phrases.

P.P.S. Congratulations to Ronald. I predict that in 17 or 18 years he will be helping (or nagging) Tobias with getting his college application material done. The time does fly by.

Steve Tolkin
___
Boston-pm mailing list
Boston-pm@mail.pm.org
http://mail.pm.org/mailman/listinfo/boston-pm
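On the question of a lower complexity class: the following is only a sketch, not a named textbook algorithm, but dynamic programming over "the last phrase is words i..j" appears to avoid the 2^n search, needing roughly n^3 phrase comparisons. It assumes phrases are compared case-insensitively as single-space-joined strings and must be strictly increasing (following the `le` remark above). The name `longest_alpha_phrases` is made up for illustration.

```perl
use strict;
use warnings;

# Returns one longest list of phrases (hypothetical helper, see lead-in).
sub longest_alpha_phrases {
    my @w = @_;
    my $n = @w;
    return () unless $n;

    # $best[$i][$j]: most phrases in a valid partition of words 0..$j
    #                whose last phrase is words $i..$j (undef if impossible).
    # $prev[$i][$j]: start index of the phrase before it.
    my (@best, @prev);
    for my $j (0 .. $n - 1) {
        for my $i (0 .. $j) {
            if ($i == 0) { $best[0][$j] = 1; next; }
            my $cur = lc join ' ', @w[$i .. $j];
            for my $k (0 .. $i - 1) {
                next unless defined $best[$k][$i - 1];
                my $before = lc join ' ', @w[$k .. $i - 1];
                next unless $before lt $cur;        # strict order (assumption)
                my $count = $best[$k][$i - 1] + 1;
                if (!defined $best[$i][$j] or $count > $best[$i][$j]) {
                    $best[$i][$j] = $count;
                    $prev[$i][$j] = $k;
                }
            }
        }
    }

    # Pick the best final phrase, then walk the back-pointers.
    my ($i) = sort { $best[$b][$n - 1] <=> $best[$a][$n - 1] }
              grep { defined $best[$_][$n - 1] } 0 .. $n - 1;
    my $j = $n - 1;
    my @phrases;
    while (1) {
        unshift @phrases, join ' ', @w[$i .. $j];
        last if $i == 0;
        ($i, $j) = ($prev[$i][$j], $i - 1);
    }
    return @phrases;
}
```

For the example input this sketch returns four phrases: "atta", "boy", "catch as", "catch can". That matches the remark that the maximal answer is not the one a person would pick.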
Re: [Boston.pm] merging lists that are ordered but not sorted
I am replying to myself to thank all the Perl mongers who replied with help.

Indeed, my problem is topological sort, as stated by Alex Vandiver and Gyepi SAM. I did not see that because the input format is different from that required by the Unix tsort program.

A search for: tsort perl power tools
found this
http://search.cpan.org/src/CWEST/ppt-0.14/html/commands/tsort/index.html
which leads directly to the perl code. I used this to solve the problem. Note the strange name -- tcsort, not tsort. (Perhaps in homage to Tom Christiansen, the prime mover of the very useful but moribund ppt project.)

I earlier found tsort.exe (port to Windows) inside coreutils-5.3.0-bin.zip at
http://sourceforge.net/project/showfiles.php?group_id=23617&package_id=142775
Unfortunately this tsort.exe depends on libintl3.dll, which was not in the *.zip file and which I could not find anywhere. Aside: Does anyone know where I can get a libintl3.dll?

Both versions of tsort require that the 2 values on each input row be separated by one space. Fortunately I was able to transform my data into this format.

Major kudos to Ben Tilly! He wrote from scratch a perl program that solved the problem. (Since he put in the effort to write this I took some extra time to test it. It produced the same output as tsort, because the lists overlapped enough to overcome the fact that the output order is in general not deterministic.)

I think the problem statement I gave was clear enough. Any cycle in the input is an error. The tsort program in perl simply reports "cycle detected" without any information as to which elements are on the cycle.

My use was not related to alignment of DNA. It was part of a personal mashup to combine data about cars that I scraped from e.g.
http://autos.yahoo.com/toyota_camry_se_v6-specs/?p=all
The actual values in the list are strings such as these:

Cylinders
Horsepower @ RPM
Fuel Economy Cty/Hwy

As another aside, if people are interested I can send 77 lines of data for each of these 2008 model year cars: Camry, Accord, Infiniti_G35, Impreza, Altima, Audi_A4, Volvo_S40, Saab_9_3

I would not mind off-list opinions on any of these cars. In general I want a car with width <= 70.7 inches (the Accord at 71.7 is probably too wide to fit in my garage), and would like AWD.

Thanks,
Steve

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tolkin, Steve
Sent: Tuesday, January 29, 2008 12:12 PM
To: Boston Perl Mongers
Subject: [Boston.pm] merging lists that are ordered but not sorted

I am looking for a perl program that will solve the following problem. Suppose I have 2 or more lists that are (conceptually) sublists of the same underlying list. I want to reconstruct the underlying list. In other words the order of the elements agrees in all the lists, but there is no sort condition.

Example:
List 1: dog, cat, mouse
List 2: dog, shark, mouse, elephant

There are 2 possible outputs, and I do not care which one I get.

The reason that I have not just coded this up is that it seems to require an unbounded amount of look ahead. Also, when there are more than 2 lists, I think I need to read from all of them before making a decision about which element can be safely output.

Thanks,
Steve
--
Steven Tolkin [EMAIL PROTECTED] 508-787-9006
Fidelity Investments 400 Puritan Way M3B Marlborough MA 01752
There is nothing so practical as a good theory. Comments are by me, not Fidelity Investments, its subsidiaries or affiliates.
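The transformation into tsort's two-values-per-row format is not shown in the message, so the following is only a guess at what it could look like. Each list is turned into adjacent "before after" pairs, and spaces inside a value (such as "Horsepower @ RPM") are replaced with underscores so that each row still has exactly one separating space. The function name and the underscore convention are assumptions.

```perl
use strict;
use warnings;

# Turn ordered lists into "before after" rows suitable for tsort input.
# Hypothetical helper; the underscore substitution is an assumption.
sub lists_to_tsort_pairs {
    my @lists = @_;                      # each argument is an array ref
    my @rows;
    for my $list (@lists) {
        # tsort splits on whitespace, so values must not contain any.
        my @tok = map { (my $t = $_) =~ s/\s+/_/g; $t } @$list;
        push @rows, "$tok[$_ - 1] $tok[$_]" for 1 .. $#tok;
    }
    return @rows;
}

# Example: print rows that could be piped into tsort.
print "$_\n" for lists_to_tsort_pairs(
    [ 'Cylinders', 'Horsepower @ RPM' ],
    [ 'Horsepower @ RPM', 'Fuel Economy Cty/Hwy' ],
);
```

The underscores would need to be turned back into spaces in tsort's output, which only works if no original value contains an underscore of its own.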
[Boston.pm] merging lists that are ordered but not sorted
I am looking for a perl program that will solve the following problem. Suppose I have 2 or more lists that are (conceptually) sublists of the same underlying list. I want to reconstruct the underlying list. In other words the order of the elements agrees in all the lists, but there is no sort condition.

Example:
List 1: dog, cat, mouse
List 2: dog, shark, mouse, elephant

There are 2 possible outputs, and I do not care which one I get.

The reason that I have not just coded this up is that it seems to require an unbounded amount of look ahead. Also, when there are more than 2 lists, I think I need to read from all of them before making a decision about which element can be safely output.

Thanks,
Steve
--
Steven Tolkin [EMAIL PROTECTED] 508-787-9006
Fidelity Investments 400 Puritan Way M3B Marlborough MA 01752
There is nothing so practical as a good theory. Comments are by me, not Fidelity Investments, its subsidiaries or affiliates.
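One way to avoid the look-ahead worry is to treat every adjacent pair in every list as a "must come before" constraint and run a topological sort (Kahn's algorithm) over all of them at once. The sketch below is an illustration under that assumption, not the program anyone on the list actually wrote; `merge_ordered_lists` is a made-up name, and ties are broken by first appearance so the output is repeatable.

```perl
use strict;
use warnings;

# Merge lists that agree on relative order into one combined list.
sub merge_ordered_lists {
    my @lists = @_;                      # each argument is an array ref
    my (%after, %edge, %indeg, @seen);
    for my $list (@lists) {
        for my $i (0 .. $#$list) {
            my $item = $list->[$i];
            unless (exists $indeg{$item}) {
                $indeg{$item} = 0;
                push @seen, $item;       # remember first-seen order
            }
            next if $i == 0;
            my $before = $list->[$i - 1];
            next if $edge{"$before\0$item"}++;   # count each edge once
            push @{ $after{$before} }, $item;
            $indeg{$item}++;
        }
    }

    # Kahn's algorithm: repeatedly emit an item with no unmet predecessor.
    my @ready = grep { $indeg{$_} == 0 } @seen;
    my @out;
    while (@ready) {
        my $item = shift @ready;
        push @out, $item;
        for my $next (@{ $after{$item} || [] }) {
            push @ready, $next if --$indeg{$next} == 0;
        }
    }
    die "cycle detected\n" if @out < @seen;
    return @out;
}

print join(', ', merge_ordered_lists(
    [qw(dog cat mouse)],
    [qw(dog shark mouse elephant)],
)), "\n";    # prints: dog, cat, shark, mouse, elephant
```

Because all lists are read before anything is emitted, no per-list look-ahead is needed, at the price of holding every distinct element in memory.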
[Boston.pm] which very large US bank uses Perl for their integration strategy
In the middle of the long list of replies to a posting about why ESBs are bad (and REST is good) at
http://steve.vinoski.net/blog/2007/10/04/the-esb-question/
I find this reply:

30. John Davies says: October 7th, 2007 at 6:42 pm
"... your best option is shell scripts (awk, grep, cut, tail etc.) and PERL, one very large US bank famously implemented their entire integration strategy on this just a few years ago and it's already out-lived a good half-dozen Java based efforts since."

Does anyone have the details about this? He says "famously", but I am not aware of even which bank it is.

Thanks,
Steve
--
Steven Tolkin [EMAIL PROTECTED] 508-787-9006
Fidelity Investments 400 Puritan Way M3B Marlborough MA 01752
There is nothing so practical as a good theory. Comments are by me, not Fidelity Investments, its subsidiaries or affiliates.
Re: [Boston.pm] merge and compare help
This can be easily extended to be a general purpose match/merge program. Suppose we call the two inputs A and B. Each ID is in one of three possible cases, and so we want three subroutines, named e.g. just_in_a, just_in_b, and in_both. (In the original example just_in_a would do the same thing as just_in_b, but that is not always desired.)

I am looking for perl code that does this, in a configurable way, e.g. let the user specify the ID column(s), sort the two inputs (if not already sorted), read them both, call the subs, etc. Please send a link or the code itself.

Thanks,
Steve
--
Steven Tolkin Steve-d0t-Tolkin-at-fmr-d0t-com 508-787-9006
Fidelity Investments 400 Puritan Way M3B Marlborough MA 01752
There is nothing so practical as a good theory. Comments are by me, not Fidelity Investments, its subsidiaries or affiliates.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of John Macdonald
Sent: Monday, August 27, 2007 4:01 PM
To: Alex Brelsfoard
Cc: boston-pm@mail.pm.org
Subject: Re: [Boston.pm] merge and compare help

Your solution is the right one. The final trick is to make sure you keep going with one file after the other file reaches the end.

I usually have the file read routine return a fake record for EOF, one that has a key guaranteed to be higher than any real key. (That requires knowing what the keys look like, but it will often be something like "\255\255\255\255".) The merge subroutine checks for that EOF key and exits. If a merge is done for a different key, then neither file can be at EOF. If a record is written without needing a merge, then that file at least is not at EOF. This trick gets rid of a lot of code that checks whether either or both files are at EOF when you are deciding whether to read from a file, and comparing the current records.

On Mon, Aug 27, 2007 at 02:04:57PM -0400, Alex Brelsfoard wrote:
> Hi All,
>
> I'm back and with a new algorithm/solution I need help with. I have two csv files, sorted by the first column (ID). Each file may have all the same, none of the same, or some of the same IDs. I would like to take these two files, and make one out of them.
>
> Two tricks:
> - When I come across the same ID in each file I need to merge those two lines (don't worry about the merge, I can handle that).
> - I want to be looking at the least number of lines from each file as possible at any one time (optimally I would like to only be looking at one of each file at the same time).
>
> Basically we are dealing with large files here and I don't want to kill my RAM by storing all the data from both files into a hash or some other object.
>
> I have an algorithm I like, I'm just not certain how to implement it:
> 1. Examine the ID of the first line of each file.
> 2. If they are the same, then merge and print the merge to the final output file.
> 3. If they are not the same, find the lesser one and have it print its contents to the final output file until its ID is the same or greater than the other file's.
> 4. Repeat.
>
> Any advice on how to do this?
> Thanks.
> --Alex
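The pieces described in this thread (three callbacks, one record held per input, and a fake EOF record with a key higher than any real key) can be combined roughly as follows. This is a sketch under several assumptions: records are array refs whose first element is the ID, IDs are compared as strings and are already sorted that way, and no real ID sorts at or above the sentinel "\xFF\xFF\xFF\xFF". The names `match_merge` and `list_reader` are invented here.

```perl
use strict;
use warnings;

# Sentinel key assumed to sort after every real ID (string comparison).
my $EOF_KEY = "\xFF\xFF\xFF\xFF";

# $next_a and $next_b are iterators returning [id, ...] or undef at EOF.
# %on holds the callbacks just_in_a, just_in_b and in_both.
sub match_merge {
    my ($next_a, $next_b, %on) = @_;
    my $read = sub {
        my $rec = $_[0]->();
        return defined $rec ? $rec : [$EOF_KEY];   # fake record at EOF
    };
    my ($ra, $rb) = ($read->($next_a), $read->($next_b));
    until ($ra->[0] eq $EOF_KEY and $rb->[0] eq $EOF_KEY) {
        my $order = $ra->[0] cmp $rb->[0];
        if ($order < 0) {                 # ID only in A
            $on{just_in_a}->($ra);
            $ra = $read->($next_a);
        }
        elsif ($order > 0) {              # ID only in B
            $on{just_in_b}->($rb);
            $rb = $read->($next_b);
        }
        else {                            # same ID in both inputs
            $on{in_both}->($ra, $rb);
            $ra = $read->($next_a);
            $rb = $read->($next_b);
        }
    }
    return;
}

# Stand-in for a file reader: hands out records from a list one at a time.
sub list_reader {
    my @recs = @_;
    return sub { shift @recs };
}
```

With real files, each iterator would read one line and split it on the ID column, so only one record per file is held in memory at any time.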
Re: [Boston.pm] Extract text from html preserving newlines
Thanks Jerrad,

I actually tried lynx first. However, the html files are on a server that needs authentication. Even adding
-auth my-user-id:my-pw
to lynx was not enough. Here is the lynx output (I added the # as these are comments in the perl program):

# Looking up [my proxy]
# Making HTTP connection to [my proxy]
# Sending HTTP request.
# HTTP request sent; waiting for response.
# Alert!: Invalid header 'WWW-Authenticate: NTLM'
# Alert!: Can't retry with authorization! Contact the server's WebMaster.
# Can't Access [the url I wanted]
# Alert!: Unable to access document.
#
# lynx: Can't access startfile

I am not sure what I really need to do. I looked at the headers using a Mozilla Firefox add-on and decided that generating the proper values for WWW-Authenticate was too complex for lynx, and for Mechanize too. But maybe I am missing something.

Steve

-----Original Message-----
From: Jerrad Pierce [mailto:[EMAIL PROTECTED]
Sent: Wednesday, May 02, 2007 1:45 PM
To: Tolkin, Steve
Cc: Boston Perl Mongers
Subject: Re: [Boston.pm] Extract text from html preserving newlines

lynx -dump
--
Free map of local environmental resources: http://CambridgeMA.GreenMap.org
--
MOTD on Boomtime, the 49th of Discord, in the YOLD 3173:
It is useless for sheep to pass resolutions in favor of vegetarianism while wolves remain of a different opinion.
Re: [Boston.pm] Extract text from html preserving newlines
That worked. Thanks! Running lynx on my local copies of the *.html files works reasonably well, although the output is not what IE produces, and is harder for me to parse.

A minor follow-up question. Currently I have to run lynx from its own directory. Otherwise I get:

\lynx_w32\lynx.bat foo.htm
LINES value must be >= 2: got 1
initscr(): LINES=1 COLS=1: too small.

Is there a way to set up lynx to let me run it from elsewhere?

Steve Tolkin
VP, Architecture
FESCo Architecture Strategy Group
Fidelity Employer Services Company
400 Puritan Way M3B Marlborough MA 01752
508-787-9006 [EMAIL PROTECTED]

The information in this email and subsequent attachments may contain confidential information that is intended solely for the attention and use of the named addressee(s). This message or any part thereof must not be disclosed, copied, distributed or retained by any person without authorization from Fidelity Investments.

-----Original Message-----
From: Chris Devers [mailto:[EMAIL PROTECTED]
Sent: Wednesday, May 02, 2007 1:53 PM
To: Tolkin, Steve
Cc: Boston Perl Mongers
Subject: Re: [Boston.pm] Extract text from html preserving newlines

On Wed, 2 May 2007, Tolkin, Steve wrote:
> Q1. Is there a way to automate IE or Mozilla Firefox to save 100's of files as text?

Probably, but might it be easier to automate using `lynx -dump` (or better still, `links -dump`)? If those produce output the way you want them, automating it should be a snap to do, even with just a simple shell script.

$ for f in *.html; do links -dump "$f" > "${f}.txt"; done

Etc.

--
Chris Devers
DO NOT LEAVE IT IS NOT REAL
Re: [Boston.pm] Program wanted to recover text that has spaces inserted or deleted
As you suggest, the easiest way is to just ignore all the blanks, and then try to find words, probably by a greedy approach, and then backing off. However, the original email explained that extra spaces are much more likely than missing spaces. This information could be used to get better results.

Thanks to Richard Barbalace for sending his program. I can run it, and now I need to look at how to revise it.

Thanks,
Steve

-----Original Message-----
From: Chris Devers [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 05, 2007 10:44 PM
To: Tolkin, Steve
Cc: boston perl mongers
Subject: Re: [Boston.pm] Program wanted to recover text that has spaces inserted or deleted

On Apr 5, 2007, at 6:42 PM, Tolkin, Steve wrote:
> Also, this is somewhat more complicated because sometimes spaces can be removed, although occasionally with much lower frequency. For example "Arti factrefers" ought to be "Artifact refers"

How is the program supposed to select from variants such as

Artifact refers
Art I fact refers

documents and
document sand

? It almost seems like you can't trust the spaces at all, so you might as well just throw them all out and then look for valid word chains in the remaining text. If nothing else, that would also solve the ancillary problem of a space before punctuation marks...

--
Chris Devers
[Boston.pm] Program wanted to recover text that has spaces inserted or deleted
I am looking for a program that can recover the original text from text that has spaces inserted or deleted. Ideally in perl of course.

The following text has many places where an extra space is inserted. Given a dictionary it would be possible to reconstruct the original text, with only a few errors remaining. I probably could write a program like that, but I suspect this has been done before. Also, this is somewhat more complicated because sometimes spaces can be removed, although occasionally with much lower frequency. For example "Arti factrefers" ought to be "Artifact refers".

Arti factrefers t o an appl i cat i on-l evel uni t of i nformat i on t hat i s subj ect t o anal ysi s by some appl i cat i on. Exampl es i ncl ude a t ext document , a segment of speech or vi deo, a col l ect i on of document s and a st ream of any of t he above.

Other notes:
One source of errors might be proper nouns, but a sophisticated program could improve its handling of these, if it kept in memory the fragments seen.
Nice to have: the space before a comma etc. removed.

Thanks,
Steve
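A dictionary-driven sketch of this idea follows. It is not Richard Barbalace's program (which is not shown in the thread), only one possible approach: drop all spaces, then use dynamic programming to pick the cheapest split into dictionary words, where keeping a word boundary that had no space in the input (a "missing" space) costs more than deleting a space the input did have (an "extra" space). The cost values 1, 3 and 10, the tiny dictionary, and the name `respace` are all assumptions, and punctuation is not handled.

```perl
use strict;
use warnings;

# $dict is a hash ref of lowercase words. Returns the re-spaced text.
sub respace {
    my ($text, $dict) = @_;
    my @tokens = split ' ', $text;
    my $chars  = join '', @tokens;

    # $gap{$p} is true if the input had a space just before character $p.
    my (%gap, $pos);
    $pos = 0;
    for my $t (@tokens) { $gap{$pos} = 1; $pos += length $t; }

    my $n    = length $chars;
    my @cost = (0);          # $cost[$e]: cheapest split of the first $e chars
    my @from;                # $from[$e]: start of the last word in that split
    for my $end (1 .. $n) {
        for my $start (0 .. $end - 1) {
            my $word  = substr $chars, $start, $end - $start;
            my $known = exists $dict->{ lc $word };
            next unless $known or length($word) == 1;
            my $c = $cost[$start];
            $c += 10 unless $known;                     # unknown single char
            $c += 3 if $start > 0 and not $gap{$start}; # missing space (rare)
            $c += grep { $gap{$_} } $start + 1 .. $end - 1;  # extra spaces
            if (!defined $cost[$end] or $c < $cost[$end]) {
                $cost[$end] = $c;
                $from[$end] = $start;
            }
        }
    }

    my @words;
    for (my $end = $n; $end > 0; $end = $from[$end]) {
        unshift @words, substr $chars, $from[$end], $end - $from[$end];
    }
    return join ' ', @words;
}

my %dict = map { $_ => 1 } qw(artifact art i fact refers to an application);
print respace('Arti factrefers t o an appl i cat i on', \%dict), "\n";
# prints: Artifact refers to an application
```

With these weights "Artifact refers" beats "Art i fact refers" because the latter needs an additional boundary where the input had no space.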
Re: [Boston.pm] emma's pizza
Irish coffee contains all four required food groups: sugar, fat, caffeine, and alcohol.

P.S. Obligatory comment about Perl -- the Chilean pianist Alfredo Perl has recorded all of Beethoven's sonatas, and much else, and I recommend them.

Hopefully helpfully yours,
Steve
--
Steve Tolkin Steve . Tolkin at FMR dot COM 508-787-9006
Fidelity Investments 82 Devonshire St. M3L Boston MA 02109
There is nothing so practical as a good theory. Comments are by me, not Fidelity Investments, its subsidiaries or affiliates.
Re: [Boston.pm] Update to job posting policy?
I think the current policy is fine as is. Location is just one of many factors to be discussed before applying for, or accepting, a job. If this is a pressing concern the applicant should ask about it in an early phone call. Other people will care more about salary, benefits, the nature of the work, etc.

Steve

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ronald J Kimball
Sent: Tuesday, December 05, 2006 11:20 AM
To: Boston Perl Mongers
Subject: [Boston.pm] Update to job posting policy?

I received an off-list comment from a Perl monger, in response to the recent job posting for a Senior Perl Developer in Waltham, MA. The monger had spent some time talking with the recruiter, only to learn that the location was too far from a commuter rail station to be worthwhile. The monger suggested that job postings from recruiters not be allowed unless the employer's address is clearly stated.

Personally, I am not inclined to make this change to our policy. I think that not identifying the employer is a reasonable position for recruiters to take, to protect their business, even though it can be frustrating for potential applicants. I am worried that recruiters might choose not to send job postings to our list at all.

I thought of an alternate suggestion, which is that job postings without a street address must indicate accessibility to mass transit.

Our job posting policy is a result of a consensus reached on the list a few years ago, so I decided to open this up to the whole list for comments. Would you all like to see either of these suggested changes made? Other feedback on the policy is also welcome.

Here is our current job posting policy:

---
Job postings may not be posted directly to the list. Instead, job postings should be sent to [EMAIL PROTECTED] I will review each posting, and either post it to the list or return it to the sender for editing. When I send a job posting to the list, the Subject header will include the string [job].

Guidelines for job postings:
1. Perl must be a primary aspect of the job.
2. The job must be located in the greater Boston area.
3. The following information should be included in the job posting:
   Required skill-set
   Contract or permanent position?
   Pay range, for contract positions
   Incentives, for permanent positions
   Placement through a recruiter, or directly with the company?
   Location, and whether telecommuting is available
   Company's product or service
---

Ronald
Re: [Boston.pm] teaching kids Perl
Perl has at least one advantage over other languages -- it is easy to see variables, because they start with a dollar sign (or other sigil). In my brief experience teaching programming to children this has proven to be helpful, because getting the difference between a variable and a string is important.

Hopefully helpfully yours,
Steve
--
Steve Tolkin Steve . Tolkin at FMR dot COM 508-787-9006
Fidelity Investments 82 Devonshire St. M3L Boston MA 02109
There is nothing so practical as a good theory. Comments are by me, not Fidelity Investments, its subsidiaries or affiliates.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Kate Wood
Sent: Friday, December 01, 2006 10:30 AM
To: Boston Perl Mongers
Subject: [Boston.pm] teaching kids Perl

Hi all,

So... say you were going to teach a child (or several children) of about ten, reasonable technical aptitude, to program using Perl. How would you go about it? I'm doing some lessons for my daughter and her friends for the spring, and need some further input. They're not quite of an age where handing them the camel book and saying "go for it" is realistic, but they're pretty self-motivated.

Kate
Re: [Boston.pm] Python
Dear Ben, Bob et al.,

Thanks for this thread. (It has a very high signal to noise ratio, compared with many others.)

Dear Everyone,

Since this started about Python, in a Perl discussion list, I am wondering whether Perl facilitates the kind of experimentation that led to stackless Python.
http://www.stackless.com/
"An experimental implementation that supports continuations, generators, microthreads, and coroutines."
See also http://www.onlamp.com/pub/a/python/2000/10/04/stackless-intro.html

Perhaps not, because this will be built into Perl 6.
Perhaps not, because the Python community is different than the Perl community in some fundamental way, e.g., there is only one version of Perl.
Perhaps not, because Continuations are a Bad Thing.

I believe some disciplined way of doing concurrency is clearly needed, and I do not think any of our current abstractions are good enough. (They may work in theory, but not in practice, e.g. they are too hard to reason about, or to debug.) I can think of no better path than for Perl to get this right, and run well on the multi-core CPU systems of the future.

Steve

[rest of thread snipped]
Re: [Boston.pm] Loop index in foreach?
Are you serious? $.., $..., $.... etc?! "Aii!!!" he screams and runs away. Please stop this thread.

Hopefully helpfully yours,
Steve
--
Steve Tolkin Steve . Tolkin at FMR dot COM 508-787-9006
Fidelity Investments 82 Devonshire St. M3L Boston MA 02109
There is nothing so practical as a good theory. Comments are by me, not Fidelity Investments, its subsidiaries or affiliates.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Duane Bronson
Sent: Thursday, September 21, 2006 7:03 PM
To: Ronald J Kimball
Cc: boston-pm@mail.pm.org; Palit, Nilanjan
Subject: Re: [Boston.pm] Loop index in foreach?

$.. should be the iterator count in the parent loop, $... should be the iterator count in the grandparent loop, ...

my @fruits = ('apple','banana','cantaloupe');
foreach my $fruit (@fruits) {
  foreach my $minusone (0..1000) {
    foreach my $plusone (2..1000) {
      die "inner loop count wrong" unless $plusone == $.+1;
      die "outer loop count wrong" unless $minusone == $..-1;
      die "way outer loop index wrong" unless $fruit eq $fruits[$...];
    }
  }
}

Ronald J Kimball wrote:
> On Thu, Sep 21, 2006 at 09:34:43AM -0700, Palit, Nilanjan wrote:
>> I think it'd be fairly easy for Perl to auto initialize & increment a loop index in all loops & provide that to the user in a special variable. $. is an excellent example. I think it'd be a great addition to Perl's excellent (& long) list of special vars, making for yet more elegant & concise code.
>
> What would you have Perl do in the case of nested loops?
>
> Ronald

--
Sincerely
*Duane Bronson*
[EMAIL PROTECTED] mailto:[EMAIL PROTECTED]
http://www.nerdlogic.com/
453 Washington St. #4A, Boston, MA 02111
617.515.2909
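For comparison with the proposed $.. and $... variables (which are not real Perl), the usual way to have a loop index available at every nesting level is to loop over indices explicitly and give each level its own named counter. The following is a small sketch of that; `index_pairs` is a made-up name.

```perl
use strict;
use warnings;

# Walk two arrays in nested loops, with an explicit index at each level.
sub index_pairs {
    my ($outer, $inner) = @_;            # two array refs
    my @pairs;
    for my $i (0 .. $#$outer) {          # $i: index in the outer loop
        for my $j (0 .. $#$inner) {      # $j: index in the inner loop
            push @pairs, "$i:$outer->[$i] $j:$inner->[$j]";
        }
    }
    return @pairs;
}

print "$_\n" for index_pairs([ 'apple', 'banana' ], [ 'x', 'y' ]);
```

Naming the counters ($i, $j) answers the nested-loop question directly: each loop's index stays visible to every loop inside it.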
Re: [Boston.pm] Short time in Boston
Two in Cambridge are well worth seeing (especially for people who live here! :)

MIT Museum -- great permanent collection on robots, MIT hacks, holograms, mechanical sculptures by Arthur Ganson, and usually also a variable show.
http://web.mit.edu/museum/

Harvard Museum -- the world famous (and deservedly so) glass flowers.

Hopefully helpfully yours,
Steve
--
Steve Tolkin Steve . Tolkin at FMR dot COM 508-787-9006
Fidelity Investments 82 Devonshire St. M3L Boston MA 02109
There is nothing so practical as a good theory. Comments are by me, not Fidelity Investments, its subsidiaries or affiliates.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of David H. Adler
Sent: Friday, September 15, 2006 1:01 AM
To: boston-pm@mail.pm.org
Subject: Re: [Boston.pm] Short time in Boston

On Thu, Sep 14, 2006 at 06:42:48PM -0400, Uri Guttman wrote:
> "JA" == John Abreau [EMAIL PROTECTED] writes:
> JA> David H. Adler wrote:
> >> So. Mom and I are taking a cruise next month up the east coast and into Canada. We've got a day (22 Oct, if I've got this all right) in Boston. What should we do in the... 10 hours we're there?
>
> i assume that is a day stop here? what hours?

Yep. I believe we dock at 8am and set sail (motor?) at 6pm.

[snip suggestions]

> another possible idea is an emergency pm social lunch.

This, of course, is a definite possibility.

dha
--
David H. Adler - [EMAIL PROTECTED] - http://www.panix.com/~dha/
"It's about hoodwinking the viewer in the cheapest and easiest manner possible" - Markku Pätilä
[Boston.pm] Is there any security issue with *.pmc files?
I read Audrey Tang's blog and some things it linked to. Great stuff. I learned that *.pmc files have precedence over *.pm files. Does this introduce a security issue, i.e. anything new beyond the existing risks? I wonder if an evil *.pmc file might not even be noticed when searching for a problem, due to its unusual extension. Specifically, can the *.pmc file be in a different directory than the *.pm file that was intended to be used?

Hopefully helpfully yours,
Steve
--
Steve Tolkin Steve . Tolkin at FMR dot COM 508-787-9006
Fidelity Investments 82 Devonshire St. M3L Boston MA 02109
There is nothing so practical as a good theory. Comments are by me, not Fidelity Investments, its subsidiaries or affiliates.
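This does not answer the precedence question, but the worry about *.pmc files going unnoticed can at least be checked mechanically. The sketch below lists every *.pmc file under a set of directories using the core File::Find module; `find_pmc` is a made-up name, and passing it @INC is just one plausible use.

```perl
use strict;
use warnings;
use File::Find;

# Return the sorted paths of all *.pmc files under the given directories.
sub find_pmc {
    # @INC can contain code refs as well as paths, so keep real dirs only.
    my @dirs = grep { !ref($_) && -d $_ } @_;
    return () unless @dirs;
    my @found;
    find(
        {
            no_chdir => 1,     # with no_chdir, $_ holds the full path
            wanted   => sub {
                push @found, $File::Find::name if -f $_ && /\.pmc\z/;
            },
        },
        @dirs,
    );
    return sort @found;
}

# Possible use: print "$_\n" for find_pmc(@INC);
```

Running it over @INC on a given machine would show whether any compiled module files are present at all, which is a reasonable first step before asking which one perl would load.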
[Boston.pm] Perl and utf16 e.g. for Windows Registry file
Summary: How to use Perl 5.8.0 to handle files encoded using utf-16 on Windows?

Details: I have read that perl 5.8 ought to handle utf-16 without me needing to tell it anything. But I am not getting the behavior I expect.

Specifically, I want to find what changed in a Registry after I install a program. So I export the whole Windows Registry to a *.txt file. This file is written using utf16 (technically UTF-16LE because Intel is little endian). Then I install the program, and export the Registry again to a second file.

These files are very large, over 100 MB. So the port of diff.exe to Windows quickly dies, saying
diff: memory exhausted
I then tried diff.pl (which uses diff.pm) and watched the memory usage slowly grow to over 100 MB; I never got any output.

So I decided to reduce the number of lines in the file by removing all the binary data (which in the text file is plain text, matching this pattern: ^\d{8}). However the following command line perl program fails, in that it emits every input line to the output. I suspect this problem is caused by the fact that the file is UTF16.

perl -ne "print if ! m/^\d{8}/" reg1.txt > reg1_reduced.txt

Note: \d is equivalent to [0-9] -- using that failed also.

I then tried to include the NUL bytes and used this:

perl -ne "print if ! m/^[0-9\000]{8}/" reg1.txt > reg1_reduced.txt

But that somehow caused the newlines to disappear.

So I am asking for help.

Thanks,
Steve
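A likely explanation is that perl reads the file as raw bytes unless told otherwise, so every character arrives with a NUL byte next to it and the pattern never matches. One possible fix, sketched below, is to open both files with an explicit PerlIO encoding layer. This assumes the export starts with a byte order mark (which the UTF-16 decoder needs to pick the byte order) and reuses the ^\d{8} pattern from the message; the function name `drop_binary_lines` is invented.

```perl
use strict;
use warnings;

# Copy a UTF-16 text file, skipping lines that start with 8 digits.
# Returns the number of lines kept.
sub drop_binary_lines {
    my ($in_path, $out_path) = @_;

    # :raw first, so no CRLF translation happens on the encoded bytes.
    open my $in, '<:raw:encoding(UTF-16)', $in_path
        or die "cannot read $in_path: $!";
    open my $out, '>:raw:encoding(UTF-16LE)', $out_path
        or die "cannot write $out_path: $!";

    print {$out} "\x{FEFF}";             # write the byte order mark back
    my $kept = 0;
    while (my $line = <$in>) {
        next if $line =~ /^[0-9]{8}/;    # the "binary data" lines
        print {$out} $line;              # line endings pass through as read
        $kept++;
    }
    close $in;
    close $out or die "cannot finish $out_path: $!";
    return $kept;
}

# Possible use: drop_binary_lines('reg1.txt', 'reg1_reduced.txt');
```

Since the file is processed one line at a time, memory use should stay small even for exports over 100 MB.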
Re: [Boston.pm] Put similarities in code and differences in data
I understand Uri's point, and can almost understand the silliness, but I think there really is more often a benefit to putting similarities in code and differences in data rather than vice versa.

The following quote makes a similar point, but it is not exactly the same point.

Eric S. Raymond, The Art of Unix Programming, p 47, online at http://www.faqs.org/docs/artu/ch01s06.html and many other places:

"Rule of Representation: Fold knowledge into data, so program logic can be stupid and robust. Even the simplest procedural logic is hard for humans to verify, but quite complex data structures are fairly easy to model and reason about. ... Data is more tractable than program logic. It follows that where you see a choice between complexity in data structures and complexity in code, choose the former. More: in evolving a design, you should actively seek ways to shift complexity from code to data."

Another related idea is this: "To reuse code you have to change the data" (my paraphrase of a quote in
http://groups.google.com/group/comp.object/browse_frm/thread/2ebcb9c6cf86bf9f/318ede5cf4a01220?tvc=1&q=%22in+data%22+%22in+code%22+invariant+OR+invariants+OR+mellor&hl=en#318ede5cf4a01220 )

The difference is that I am trying to find a quote that focuses on the benefits of using data in a special way, as "control data", to determine the specific execution path taken by the code.

Thanks,
Steve

-----Original Message-----
Tolkin, Steve wrote:
> I am looking for the best and/or original wording of this programming maxim: Put similarities in code and differences in data
>
> Google found this in a perl discussion: "capture similarities in code, differences in data"
> http://blog.gmane.org/gmane.comp.lang.perl.fun/month=20031001
> So I am posting to this list.
>
> Here is a hit on a similar quote: "putting invariants in code and differences in data."
> http://groups.google.com/group/comp.object/browse_thread/thread/1dc6f6dddb34dc18/cdfb5eae936861f2?lnk=st&q=%22differences+in+data%22+%22in+code%22&rnum=3&hl=en#cdfb5eae936861f2
> This mentions Mellor in passing -- Is he the original person behind this?
>
> Hopefully helpfully yours,
> Steve
Re: [Boston.pm] Put similarities in code and differences in data
Thank you Charlie. That is the idea I am trying to get across.

Do you have any suggestions about how to get developers to see the benefits of writing programs this way? Any specific books, techniques, etc.? Any pitfalls to be aware of?

Thanks,
Steve
--
Steve Tolkin    Steve . Tolkin at FMR dot COM    508-787-9006
Fidelity Investments   82 Devonshire St. M3L   Boston MA 02109
There is nothing so practical as a good theory. Comments are by me, not Fidelity Investments, its subsidiaries or affiliates.

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Charlie Reitzel
Sent: Tuesday, April 04, 2006 9:18 AM
To: boston-pm@mail.pm.org
Subject: Re: [Boston.pm] Put similarities in code and differences in data

Not really. I believe it is intended to mean data driven programming as Jeremy mentioned earlier.

To me, data driven programming means use lotsa lookup tables, the contents of which are user tweakable. As simple as it sounds, it can be an effective technique to let you quickly adapt a system as requirements evolve - without code changes.

Having found this hammer early in my programming career, I find a great many nails. Early days in any new design are spent setting up a lookup table table, along with utility routines for reporting, validation, UI picking values (one or several), etc.

It may be a use case, but I don't think this is quite the same thing as the subject of this thread which, as Uri says, is a general approach to analysis.

At 09:00 AM 4/4/2006 -0400, [EMAIL PROTECTED] wrote:

hi

( 06.04.04 08:46 -0400 ) Tolkin, Steve:
The difference is that I am trying to find a quote that focuses on the benefits of using data in a special way, as control data, to determine the specific execution path taken by the code.

um, isn't this the scientific method?

--
\js oblique strategy: how would you have done it?
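As a small illustration of the lookup-table style Charlie describes, here is a sketch in which the differences between cases live in a hash and one routine handles them all. The table contents and the name apply_txn are invented for this example and come from nowhere in the thread.

```perl
use strict;
use warnings;

# Differences live in data: one row per transaction type (hypothetical values).
my %RULES = (
    deposit    => { sign =>  1, fee => 0 },
    withdrawal => { sign => -1, fee => 0 },
    wire       => { sign => -1, fee => 5 },
);

# Similarities live in code: a single routine handles every type.
sub apply_txn {
    my ($balance, $type, $amount) = @_;
    my $rule = $RULES{$type} or die "unknown transaction type '$type'\n";
    return $balance + $rule->{sign} * $amount - $rule->{fee};
}
```

Adding a new transaction type then means adding a row to %RULES, with no change to apply_txn. If the table were loaded from a file or database instead, it would become user tweakable in the sense described above.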
[Boston.pm] Changing compiler from VC98 to Visual C++ Toolkit 2003
I used to use the Visual C compiler from 1998, aka VC98. I have compiled some Perl XS modules with it. When I got a new PC it did not have that old compiler on it. I copied my Perl directories over, and they seem to work.

I just downloaded the free (as in beer) Visual C++ Toolkit 2003 from Microsoft.

Q1. Can I use it to compile new XS modules without problems?
Q2. Should I recompile all the existing XS modules?

perl -v

This is perl, v5.8.7 built for MSWin32-x86-multi-thread
(with 7 registered patches, see perl -V for more detail)

Copyright 1987-2005, Larry Wall

Binary build 813 [148120] provided by ActiveState http://www.ActiveState.com
ActiveState is a division of Sophos.
Built Jun 6 2005 13:36:37

Thanks,
Steve
Re: [Boston.pm] script to normalize output of Windows dir command
Ben Tilly asked: Are you reinventing the rsync wheel?

No. I actually use the freeware version of the program syncback at http://www.2brightsparks.com/downloads.html to do backup, and I think it uses rsync (or similar) internally.

But I do not just want to do a full restore. I want to see what will be happening first. I think I can run syncback in a quiet mode that shows what would happen, but does not actually do it.

I still want to be able to see the differences between (portions of) two file systems, based on various criteria, including date, size, directory, etc.

Steve

-Original Message-
From: Ben Tilly [mailto:[EMAIL PROTECTED]
Sent: Friday, September 23, 2005 5:52 PM
To: Tolkin, Steve
Cc: Jeremy Muhlich; boston-pm@mail.pm.org
Subject: Re: [Boston.pm] script to normalize output of Windows dir command

On 9/23/05, Tolkin, Steve [EMAIL PROTECTED] wrote:

I do have a port of Unix find on my current Windows machine. But I do not have that on the machine I back up to (my wife's), so I would need to install that, and its dependencies, which makes me reluctant to take that approach.

Are you reinventing the rsync wheel? (Yeah, I know. Getting the flags right can be a pain.)

Cheers,
Ben
Re: [Boston.pm] Quotes and such [was] RE: script to normalize output of Windows dir command
Actually, the original poster (me) was trying to solve a different problem. I clearly specified that what was wanted is a perl program to convert the output of the Windows dir command into a structured text format suitable for use with sort and/or loading into a database.

This would let me see what will be impacted by a partial restore. It also has the benefit of not needing anything installed on my wife's machine (which is the target of the backup) -- not rsync, not find, not even perl.

Having a canonical format for file information also allows comparison with the list produced by many other programs, e.g. ls, find, Sequoia View, Wilbur, any other backup program, etc.

My suggested format was:

  Path|file|extension|Dir_or_File|bytes|date|time

e.g.

  C:\_from_laptop\AAA BBB_files|empty.jpg|txt|Dir|0|2003-04-14|23:00

So the natural sort works as desired, and it is also easy to do a timestamp-based sort.

I continue to think about that original problem. I realize that I should probably force bytes to 0 if type is Dir. The program probably should have an option to change between slash and backslash, and possibly suppress the drive letter.

I might actually write this program one day; if I do I'll post it here for feedback.

P.S. I tried to find a version of rsync for Windows that does not require cygwin. Is there one?

Hopefully helpfully yours,
Steve
--
Steve Tolkin    Steve . Tolkin at FMR dot COM    617-563-0516
Fidelity Investments   82 Devonshire St. V13C   Boston MA 02109
There is nothing so practical as a good theory. Comments are by me, not Fidelity Investments, its subsidiaries or affiliates.
-Original Message-
From: Chris Devers [mailto:[EMAIL PROTECTED]
Sent: Monday, September 26, 2005 2:16 PM
To: John Macdonald
Cc: boston-pm@mail.pm.org; Ricker, William; Tolkin, Steve
Subject: Re: [Boston.pm] Quotes and such [was] RE: script to normalize output of Windows dir command

On Mon, 26 Sep 2005, John Macdonald wrote:

On Mon, Sep 26, 2005 at 12:48:08PM -0400, Ricker, William wrote:

Chris Devers was however obviously looking for this rather specific elaboration of Santayana's, as it captures the inevitableness.

[ Any sufficiently complicated c or fortran program contains an ad hoc informally-
[ specified, bug-ridden, slow implementation of half of Common Lisp.
[ -Greenspun's 10th law of programming
[ http://philip.greenspun.com/bboard/q-and-a-fetch-msg?msg_id=000tgU

Note - there are no laws (1..9).

Actually, I think he was looking for Henry Spencer's old quote:

  Those who do not understand Unix are doomed to reinvent it - badly.

Either of those, actually :-)

As I say, I'm sure there's some witty nugget of a reformulation of those lines based around this thread and rsync -- the Unix variant is nice and succinct, while the Lisp one gets more specific -- but I can't be bothered to tease it out.

In any case, the point stands: the original poster was looking for a way to solve a problem in Perl that rsync already has tackled. Perl is a nice tool and suitable for many purposes, but there are limits beyond which even the roundest of reinvented wheels can get no rounder, and rsync is clearly the roundest wheel for this job :-)

--
Chris Devers
[Boston.pm] script to normalize output of Windows dir command
Summary: I would like a perl script that converts the output of the Windows dir command so that each line has the same format, including the directory it is in, and its extension. The date and time should use a format that can be sorted as a string, e.g. yyyy-mm-dd and a 24 hour clock. I think pipe delimited would work best, as the pipe character | cannot appear in a file name, and that would let me sort the output, and/or load it into a database.

Details: I could probably write this in an hour, but laziness is a virtue, and if someone has got one already that will probably be better anyway.

I want to translate lines like this:

 Directory of C:\_from_laptop\AAA BBB_files
04/14/2003 10:21 AM 123 abc
04/14/2003 11:00 PM 0 empty.jpg.txt

To lines something like this. Note that I moved the file name and extension sooner, so that the natural sort is by directory and file name, and a sort on the last two fields is by time. (I have a port of Unix sort in my c:\bin\ directory that I can use.)

C:\_from_laptop\AAA BBB_files|abc||File|123|2003-04-14|10:21
C:\_from_laptop\AAA BBB_files|empty.jpg|txt|Dir|0|2003-04-14|23:00

None of it is tricky. You just need to remember what Directory line you saw last, convert the date and time fields, insert either File or Dir depending on its type, and write out each line that comes from a file or dir (except skip all the . and .. dirs). Note that a file named foo.bar.txt has a name of foo.bar and an extension of txt. Some files can have no extension, and some directories do have an extension.

Here is an excerpt of the output. (Because it is an excerpt, the totals for Files and Bytes are not right.) Note that there are a few lines of boilerplate at the beginning which can be ignored, and a few lines at the end which can be ignored (or used as a sanity check on the totals). Note that a file might not have an extension, and that a file or directory can be empty and can have white space and strange characters in its name.

 Volume in drive C has no label.
 Volume Serial Number is A898-B50D

 Directory of C:\_from_laptop

01/23/2005 08:37 AM <DIR> .
01/23/2005 08:37 AM <DIR> ..
04/14/2003 01:46 PM <DIR> _from_c
02/06/2001 01:34 PM 15618 0101.txt
02/06/2001 01:34 PM 15618 abc
04/14/2003 10:22 AM 32451 AAA BBB.htm
01/17/2005 09:53 AM <DIR> AAA BBB_files
04/04/2000 06:14 PM 27648 acm_pubform.doc
01/17/2005 09:53 AM <DIR> acrobat
01/17/2005 09:54 AM <DIR> address
08/17/2004 10:04 AM 0 zzz
 650 File(s) 92010877 bytes

 Directory of C:\_from_laptop\AAA BBB_files

01/17/2005 09:53 AM <DIR> .
01/17/2005 09:53 AM <DIR> ..
04/14/2003 10:21 AM 1045 abc
04/14/2003 10:21 AM 0 empty.jpg.txt
04/14/2003 10:22 AM 32451 AAA BBB CCC.htm
01/17/2005 09:53 AM <DIR> AAA BBB_CCC_files
04/14/2003 10:21 AM 43 spacer.gif
 11 File(s) 37476 bytes

 Directory of C:\_from_laptop\AAA BBB CCC_files

01/17/2005 09:53 AM <DIR> .
01/17/2005 09:53 AM <DIR> ..
 0 File(s) 0 bytes

 Total Files Listed:
 245909 File(s) 28969650933 bytes
 154376 Dir(s) 31272304640 bytes free

Background: My laptop died a few days ago. The process to recover files and directories from it seems to have lots of missing files. I have a directory on another machine that I have been backing up to. I want to find out which files are missing. I have run dir on the backed up machine, and will run dir on the new machine, and then diff the outputs. The diff will work best if each line in the file has the same format, and includes the full directory path.

P.S. Here is the command I ran in a DOS box (aka command prompt window etc.) from my Windows XP machine.

  dir > dir.txt c:\_from_laptop /-C /ON /S /TW /4

The /-C means suppress the thousands separator in the size, /ON means order by name, /S means recurse into subdirectories, /TW means show the last time it was written, and /4 means show 4 digit years.

Thanks,
Steve
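The conversion described in this message could be sketched roughly as follows. This is only one possible reading of the requirements: the sub name normalize_dir is invented, and the sketch assumes US-style mm/dd/yyyy dates, an AM/PM clock, and the <DIR> marker that dir prints for directories. It forces bytes to 0 for directories and skips the . and .. entries and all summary lines.

```perl
use strict;
use warnings;

# Sketch: turn lines of `dir /-C /S /4` output into pipe-delimited records
#   path|name|extension|Dir_or_File|bytes|yyyy-mm-dd|hh:mm
sub normalize_dir {
    my @lines = @_;
    my $dir = '';
    my @out;
    for my $line (@lines) {
        if ($line =~ /^\s*Directory of (.+?)\s*$/) {
            $dir = $1;                        # remember the current directory
            next;
        }
        # Fields: date, time, AM or PM, then <DIR> or a size, then the name.
        next unless $line =~
            m{^(\d\d)/(\d\d)/(\d{4})\s+(\d\d):(\d\d)\s+([AP])M\s+(<DIR>|\d+)\s+(.+?)\s*$};
        my ($mon, $day, $year, $hour, $min, $ampm, $size, $name)
            = ($1, $2, $3, $4, $5, $6, $7, $8);
        next if $name eq '.' or $name eq '..';
        $hour = $hour % 12 + ($ampm eq 'P' ? 12 : 0);    # 12 AM -> 0, 12 PM -> 12
        my ($type, $bytes) = $size eq '<DIR>' ? ('Dir', 0) : ('File', $size);
        # Split at the last dot: foo.bar.txt -> name "foo.bar", extension "txt".
        my ($base, $ext) = $name =~ /^(.+)\.([^.]*)$/ ? ($1, $2) : ($name, '');
        push @out, join('|', $dir, $base, $ext, $type, $bytes,
                        "$year-$mon-$day", sprintf('%02d:%02d', $hour, $min));
    }
    return @out;
}
```

A driver could be as small as `print "$_\n" for normalize_dir(<>);` run against dir.txt, with the result piped to sort.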
Re: [Boston.pm] script to normalize output of Windows dir command
I do have a port of Unix find on my current Windows machine. But I do not have that on the machine I back up to (my wife's), so I would need to install that, and its dependencies, which makes me reluctant to take that approach.

I, like many people, have had problems with find, but I thought I would try your suggestion. There are quirks with the time reporting, and probably other issues I have forgotten. I do not know exactly how to set the argument to -printf, and it is not explained in the help (shown below). If you send an example I would try that.

Here are a few lines from the output of \bin\find -print -ls

945730 drwxr-xr-x 6 a071046 Administ0 Sep 21 15:05 ./ant
./ant/bin
951240 drwxr-xr-x 2 a071046 Administ0 Sep 21 15:05 ./ant/bin
./ant/bin/ant
951283 -rwxr-xr-x 1 a071046 Administ 5140 Apr 16 2003 ./ant/bin/ant
./ant/bin/ant.bat

Note each file is on two lines. Probably that is the default for -ls. Also date and time are combined into three fields, but the third is either time or year. This makes it harder to process. I would actually prefer time in seconds since the start of the Unix epoch. Also there is no easy way to distinguish Files from Directories except by further parsing of the permissions string, e.g. drwxr-xr-x.

Here is the help. I cannot figure out how to suppress certain useless fields, e.g. inode and owner, nor put output on one line, etc.

C:\foo>\bin\find -help
Usage: /bin/find [path...] [expression]
default path is the current directory; default expression is -print
expression may consist of:
operators (decreasing precedence; -and is implicit where no others are given):
      ( EXPR ) ! EXPR -not EXPR EXPR1 -a EXPR2 EXPR1 -and EXPR2
      EXPR1 -o EXPR2 EXPR1 -or EXPR2 EXPR1 , EXPR2
options (always true): -daystart -depth -follow --help
      -maxdepth LEVELS -mindepth LEVELS -mount -noleaf --version -xdev
tests (N can be +N or -N or N): -amin N -anewer FILE -atime N -cmin N
      -cnewer FILE -ctime N -empty -false -fstype TYPE -gid N -group NAME
      -ilname PATTERN -iname PATTERN -inum N -ipath PATTERN -iregex PATTERN
      -links N -lname PATTERN -mmin N -mtime N -name PATTERN -newer FILE
      -nouser -nogroup -path PATTERN -perm [+-]MODE -regex PATTERN
      -size N[bckw] -true -type [bcdpfls] -uid N -used N -user NAME
      -xtype [bcdpfls]
actions: -exec COMMAND ; -fprint FILE -fprint0 FILE -fprintf FILE FORMAT
      -ok COMMAND ; -print -print0 -printf FORMAT -prune -ls

Thanks for the suggestion, but it is probably faster to write the perl than to use find.

Steve

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jeremy Muhlich
Sent: Friday, September 23, 2005 12:19 PM
To: boston-pm@mail.pm.org
Subject: Re: [Boston.pm] script to normalize output of Windows dir command

How about the unix find command, with the -printf option? You can get it through cygwin. Taking find's output (even without -printf) from two directories and diffing it has gotten me through most of these sorts of problems.

Also, diff -r might be helpful. (possibly with the --brief option as well)

-- Jeremy

On Fri, 2005-09-23 at 11:55 -0400, Tolkin, Steve wrote:

Summary: I would like a perl script that converts the output of the Windows dir command so that each line has the same format, including the directory

C:\_from_laptop\AAA BBB_files|abc||File|123|2003-04-14|10:21
C:\_from_laptop\AAA BBB_files|empty.jpg|txt|Dir|0|2003-04-14|23:00
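Where Perl is available on the machine being listed, the complaints above about find's output (two lines per entry, awkward dates, no explicit file/directory field) can be sidestepped by walking the tree with the core File::Find module. The sketch below is an illustration only, not something proposed in the thread; the sub name list_tree and the four-field layout are assumptions.

```perl
use strict;
use warnings;
use File::Find;

# Sketch: list everything under $root, one record per entry, as
#   path|Dir_or_File|bytes|mtime_in_epoch_seconds
sub list_tree {
    my ($root) = @_;
    my @rows;
    find({
        no_chdir => 1,                       # keep full paths in $File::Find::name
        wanted   => sub {
            my $path = $File::Find::name;
            return if $path eq $root;        # skip the starting directory itself
            my @st = lstat $path or return;  # size is $st[7], mtime is $st[9]
            my $type = -d _ ? 'Dir' : 'File';
            push @rows, join('|', $path, $type,
                             ($type eq 'Dir' ? 0 : $st[7]), $st[9]);
        },
    }, $root);
    return @rows;
}
```

Something like `print "$_\n" for sort(list_tree('c:/_from_laptop'));` would give one sortable line per file, with the time already in seconds since the epoch.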
Re: [Boston.pm] Combining the nodes reachable in n steps from a web page into one printable file
Summary:
1. What might cause IO::Socket::INET->new to fail?
2. Is there a bundle for WWW-Mechanize?

Details: I went to http://search.cpan.org/dist/WWW-Mechanize/ and read the doc and it looks promising. I downloaded the tar.gz file, extracted all its files, and started the usual install process. Unfortunately I hit a variety of problems. Here is the output:

C:\perl_install\WWW-Mechanize-1.14>perl makefile.pl

It seems that you are not directly connected to the Internet. Some of the WWW::Mechanize tests interact with websites such as Google, in addition to its own internal tests.

Do you want to skip these tests? [y] y
Do you want to install the mech-dump utility? [y] y

It looks like you don't have SSL capability (like IO::Socket::SSL) installed. You will not be able to process https:// URLs correctly.

WWW::Mechanize likes to have a lot of test modules for some of its tests. The following are modules that would be nice to have, but not required.

    Test::Pod
    Test::Memory::Cycle
    Test::Warn

Checking if your kit is complete...
Looks good
Warning: prerequisite LWP::UserAgent 2.024 not found. We have 1.004.
Warning: prerequisite Test::LongString 0 not found.
Warning: prerequisite URI 1.25 not found. We have 1.19.
Writing Makefile for WWW::Mechanize

// I *am* directly connected to the Internet, so the first warning is probably caused by a proxy problem. Looking inside the Makefile.PL I think the specific test that failed is:

    if ( !$skiplive ) {
        require IO::Socket;
        my $s = IO::Socket::INET->new(
            PeerAddr => "www.google.com:80",
            Timeout  => 10,
        );

I think my proxy is set up correctly.

C:\perl_install\WWW-Mechanize-1.14>env | grep -i proxy
FTP_PROXY=http://proxbos1.fmr.com:8000
HTTP_PROXY=http://proxbos1.fmr.com:8000

How can I learn more about why IO::Socket::INET->new failed?

The other errors are dependencies on other modules, or newer versions of modules. Is there a bundle for WWW-Mechanize?

Thanks,
Steve

-Original Message-
From: Ricker, William
Sent: Wednesday, September 14, 2005 5:05 PM
To: Tolkin, Steve; L-boston-pm
Subject: RE: [Boston.pm] Combining the nodes reachable in n steps from a web page into one printable file

Is this to implement the missing PRINTABLE PAGE button for just yourself or as part of the website?

This sounds a lot like one of the examples in MDJ's new Higher Order Perl book.

Outside of HOP, WWW::Mechanize is the new wrapper around LWP::Simple for this sort of thing. Makes my old LWP-wielding cache-and-smash implementation look lumpy ...

Bill
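On the question of learning why IO::Socket::INET->new failed: the constructor returns undef on failure and leaves a message in $@, so printing that is usually the first step. The sketch below wraps this in a helper whose name, try_connect, is invented for illustration. Note also that IO::Socket::INET opens a direct TCP connection and does not consult HTTP_PROXY, so a machine that reaches the web only through a proxy would be expected to fail this particular test.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical helper: attempt a direct TCP connection and report why it failed.
# Returns (socket, undef) on success or (undef, reason) on failure.
sub try_connect {
    my ($host, $port, $timeout) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => $timeout,
    );
    return ($sock, undef) if $sock;
    # IO::Socket::INET leaves its error text in $@; fall back to $! if empty.
    return (undef, $@ || "$!" || 'unknown error');
}

# Possible use, mirroring the check in Makefile.PL:
#   my ($s, $why) = try_connect('www.google.com', 80, 10);
#   print $s ? "connected\n" : "failed: $why\n";
```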
[Boston.pm] Combining the nodes reachable in n steps from a web page into one printable file
This seems like a problem that would be easily solved with a small perl script.

Many web pages have a large list of links. I would like to follow all the links, to some small depth (typically just 1), and put their output into one file, in some format suitable for printing. I am flexible about the order of the links, the details of the format, etc.

This has probably been written already. Having it in perl would let me modify it, which might be useful. (If there is a reliable freeware or shareware program, I would also be interested in that.)

Thanks,
Steve

P.S. perl -v says:

This is perl, v5.8.0 built for MSWin32-x86-multi-thread
(with 1 registered patch, see perl -V for more detail)

Copyright 1987-2002, Larry Wall

Binary build 805 provided by ActiveState Corp. http://www.ActiveState.com
Built 18:08:02 Feb 4 2003
Re: [Boston.pm] Geo::Coder::US RE: GoogleGeoCoder
This is a true story. About 15 years ago I moved to Newton, and my zip code was 02148. All was well. Then one day about 10 years ago, the USPS decided that the preferred name of my town (Post Office) was Waban. It changed Newton to the alternate name. Unfortunately on that very day they also decided to change the zip code! (This was part of a wholesale renumbering of many towns.)

Every data analyst knows it is not a good idea to change the identifier of an entity. It is extremely bad to change all of its identifiers at once.

In theory companies are supposed to subscribe to the USPS lists, which do mark the changes. In theory companies are supposed to allow either the preferred or alternate name. In practice some only allow the primary. In practice some do not bother to subscribe, or do not have a reliable system to process the updates.

At many web sites (big ones including, I recall, CNN, eBay, Amazon, NY Times, etc.) I got a wide variety of problems. Some said Newton did not exist, or that Waban did not exist, or that my zip code did not match my city, etc. On one site that I really wanted to use I actually tried all 4 combinations without success. I can only conjecture that their system had some subtle flaw -- perhaps it had not been coded to handle a town whose name and zip code changed simultaneously, and so it just deleted it from the database.

I have now gone more than a year since this problem last occurred, and so I think the various web sites may have all caught up.

P.S. The USPS did this for several, but not all, of the villages of Newton. This is of no benefit to the people who live there. It is a service that the U.S. Post Office provides at the request of marketers, who want to be able to easily distinguish prestige addresses by the city name.

Hopefully helpfully yours,
Steve
--
Steve Tolkin    Steve . Tolkin at FMR dot COM    617-563-0516
Fidelity Investments   82 Devonshire St. V4D   Boston MA 02109
There is nothing so practical as a good theory. Comments are by me, not Fidelity Investments, its subsidiaries or affiliates.

-Original Message-
From: Chris Devers [mailto:[EMAIL PROTECTED]
Sent: Friday, June 17, 2005 11:34 AM
To: Joel Gwynn
Cc: Boston.PM; [EMAIL PROTECTED]
Subject: Re: [Boston.pm] Geo::Coder::US RE: GoogleGeoCoder

On Thu, 16 Jun 2005, Joel Gwynn wrote:

When you get right down to it, this Boston neighborhood thing is just confusing. I work in Dorchester but management likes to put Boston on the stationery, which is confusing because there's an identical address in Boston proper, just with a different zip code. Are there any other cities that have similar naming schizophrenia?

Sure, I imagine it happens all over the place. As has been noted in other comments in this thread, big towns assimilate smaller towns all the time, so current neighborhood names are often the names of formerly independent political entities.

But then, it's not even always assimilation. People all over the world know that Harvard Square is in Cambridge, Massachusetts, but it isn't, as far as I know, a formal geographic boundary in any useful sense -- it's just a district in that part of Cambridge.

But then maybe I'm revealing some ignorance here, as I've lived in the Boston area since I was a kid and yet I still don't actually know what square is really meant by the term Harvard Square -- I've always assumed that it's centered on the T station, but that's not actually on Harvard's campus, hence the ambiguity.

At $past_job, some of my coworkers were working on a real estate site. For this, they had to be able to handle all kinds of random input from people that, whether or not it was on any formal map, did in fact denote a perfectly well understood geographic area.

Harvard Square. Union Square. Mark Sandman Square. Financial District. Theatre District. Leather District. Back Bay. Fort Point. South End. World's End. Greenbush. Queen Anne's Corner. Four Corners. Assinippi. Minot. Humarock. Silver Lake. Cedarville. Just to name a few.

All of these are definite places in or around Boston or southeastern Massachusetts, but none of them is an actual town or city. But if you put any of them on an envelope, the mail will very probably get to its intended destination, and if you put any of them into a search string on a real estate site, it has to return results for that area.

My impression is that dealing with all these varying names for the same places was the main impetus for setting up the ZIP code system in the first place. As long as you have the right ZIP code on an envelope, you can call your neighborhood Fatty Arbuckle for all the post office cares.

Heh. Come to think of it, I might start calling my street that... :-)

--
Chris Devers
RE: [Boston.pm] THE NAZIS HAD A CERTIFICATION FOR PERL
Right. The horse is dead. Please stop beating it.

Dear Ronald, as our fearless leader will you please ask everyone to stop all these threads on certification and advocacy.

Now I know why there are literally millions of matches in Google. This topic draws in people like flies to

Hopefully helpfully yours,
Steve
--
Steve Tolkin    Steve . Tolkin at FMR dot COM    617-563-0516
Fidelity Investments   82 Devonshire St. V4D   Boston MA 02109
There is nothing so practical as a good theory. Comments are by me, not Fidelity Investments, its subsidiaries or affiliates.

-Original Message-
From: Chris Devers [mailto:[EMAIL PROTECTED]
Sent: Tuesday, March 01, 2005 2:50 PM
To: Boston Perl Mongers
Subject: [Boston.pm] THE NAZIS HAD A CERTIFICATION FOR PERL

But then, you can't invoke Godwin deliberately, can you? Wasn't mentioning [implicitly, national] socialism close enough? No? Damn.

--
Chris Devers, fascinated just how many thousands of words this thread has produced, and yet managed to clarify exactly nothing while doing so
RE: [Boston.pm] short-listing languages for applications software development
I think this is the best point that has been advanced in favor of using perl:

  Amazon, Google, Yahoo, Morgan Stanley all use Perl in production ...

Does anyone have additional details, e.g. the names of the projects, number of servers, number of users, estimated cost, estimated savings by using perl, etc.? This is basic information that should be available to Perl advocates, i.e. easily findable at http://www.perl.org/advocacy/ which unfortunately does not have anything of the sort.

Hopefully helpfully yours,
Steve
--
Steve Tolkin    Steve . Tolkin at FMR dot COM    617-563-0516
Fidelity Investments   82 Devonshire St. V4D   Boston MA 02109
There is nothing so practical as a good theory. Comments are by me, not Fidelity Investments, its subsidiaries or affiliates.

-Original Message-
From: Ranga Nathan [mailto:[EMAIL PROTECTED]
Sent: Thursday, February 24, 2005 9:06 PM
To: boston-pm@pm.org
Subject: Re: [Boston.pm] short-listing languages for applications software development

I met that person and discussed the richness of perl data structures. He was adamant that perl did not have strong typing. I told him that perl is intelligent and would guess the data type. What the heck? In business applications I have hardly come across anything more than a = b + c ! 95% of what we handle are strings. Which is the most preferred language for strings?

Also, he said that perl code looked confusing! Well, everything requires some getting used to. But I know a lot of COBOL programs that are utterly confusing. Requiring 'system.out.println' could be confusing for someone not used to objects at all.

It went on for some time but neither of us convinced the other. But I did tell him that Amazon, Google, Yahoo, Morgan Stanley all use Perl in production and in fact we are using perl in mission-critical production. We had problems but it had nothing to do with perl or the architecture!

__
Ranga Nathan / CSG
Systems Programmer - Specialist; Technical Services; BAX Global Inc.
Irvine-California
Tel: 714-442-7591 Fax: 714-442-2840
RE: [Boston.pm] (also) Perl
Well, just about everything that can be said on this thread has been said, except for this.

Google for:

  "perl (certification OR certificate)"

produces 2170 matches. This matches two phrases. If you remove the quotes, i.e. Google for:

  perl (certification OR certificate)

produces 1.2 million hits. Among them is this, from the Perl Journal

http://www.tpj.com/documents/s=1131/sam05040001/letters.htm?temp=NJykmWtEip

which says in part:

  I was wondering if you knew of anyone that offers a Perl Certification Program? ...

  At the second O'Reilly Perl conference, Mark-Jason Dominus, Nathan Torkington, and I sold Perl Certificates. You named a title (Perl Monger, Perl Studmuffin, and Perl Sultan were all chosen), and an Official Perl Certification was immediately printed for you to take home and frame. To receive a certificate, you needed to show no qualifications other than the ability to open up your wallet and fork over $2. (This is like other certification programs, but cheaper.)

[You can read the rest if you want.]

Hopefully helpfully yours,
Steve
--
Steve Tolkin    Steve . Tolkin at FMR dot COM    617-563-0516
Fidelity Investments   82 Devonshire St. V4D   Boston MA 02109
There is nothing so practical as a good theory. Comments are by me, not Fidelity Investments, its subsidiaries or affiliates.
RE: [Boston.pm] a car talk puzzle
See my answer after the original message. It uses Perl, but the minimum amount. It only took a few seconds to do it the natural way (natural if you are used to grep, comm and other Unix utilities). I used these utilities in part because Chris suggested their use, and in part because I think this is the quickest way to solve the problem in programmer time.

Steve

-Original Message-
From: Chris Devers [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 08, 2004 2:29 PM
To: Boston Perl Mongers
Subject: [Boston.pm] a car talk puzzle

This seems like something that would be fun to solve with Perl:

RAY: I have, written on a piece of paper in front of me, a word that is plural and also masculine. Now, I know we don't have masculine and feminine words in English the way we do in Italian or French. But, we do have words that connote masculinity. For example, the word boys is a plural word that connotes masculinity.

The word I have written here is like boys. It's masculine, and ends in s. Not only that, but you change this word from plural to singular and from masculine to feminine, all by adding an s to it!

I spent last night reading the entire Oxford English Dictionary, and I only found one example for which this works.

Ok, so I've got a word list, how many words can there be that end in S?

  $ grep -ic 's$' /usr/share/dict/words
  25998

Oy, way too many. But how many end in a double S?

  $ grep -ic 'ss$' /usr/share/dict/words
  9552

Better, but not much better.

If the word in question is in /usr/share/dict/words, then it should be one of the (hopefully) rare words that is a -ss word that, when the last -s is dropped, is also in the larger -s list. With luck, there will be only one; realistically, this should shorten the list enough that the answer can be found manually.

Can anyone think of a clever way to do this ?

--
Chris Devers

Here is my deliberately non-clever solution. Note that I am running these Unix utilities in my DOS box; I got them from http://unxutils.sourceforge.net/

C:\wordlists>grep ss$ words.txt > o1
C:\wordlists>grep [^s]s words.txt > o2
C:\wordlists>perl -ne "chomp; print $_ . qq(s\n)" o2 | sort > o3
C:\wordlists>comm -12 o1 o3 > o4
C:\wordlists>wc o4
5 5 30 o4
C:\wordlists>cat o4
ass
buss
canvass
discuss
hiss

Oh well. It looks like my version of /usr/dict/words (which I named words.txt) did not have the answer. So I ran the same sequence of steps with a bigger word list, yawl.lst (yet another word list), which is very large. It can be downloaded from http://personal.riverusers.com/~thegrendel/software.html and other places.

C:\wordlists>grep ss$ yawl.lst > o1
...
C:\wordlists>wc o4
127 127 1007 o4

Eyeballing the list I come up with the following answer.

Warning!! Spoiler below; do not hit page down unless you want to see it.

millionairess

Clearly millionairess is feminine and singular, and I think that millionaires does have a masculine connotation.

Hopefully helpfully yours,
Steve
--
Steve Tolkin    Steve . Tolkin at FMR dot COM    617-563-0516
Fidelity Investments   82 Devonshire St. V4D   Boston MA 02109
There is nothing so practical as a good theory. Comments are by me, not Fidelity Investments, its subsidiaries or affiliates.
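For comparison with the grep/comm pipeline above, the same filter can be sketched in Perl alone with a hash lookup: keep each word ending in "ss" whose form without the final "s" is also in the list. The sub name ss_pairs is invented here, and lowercasing every word is an assumption about how the word list should be compared.

```perl
use strict;
use warnings;

# Return the words ending in "ss" whose form without the final "s"
# also appears in the word list (compared in lower case).
sub ss_pairs {
    my %seen;
    $seen{ lc $_ } = 1 for @_;
    my @hits = grep { /ss$/ && $seen{ substr($_, 0, -1) } } keys %seen;
    return sort @hits;
}

# Possible use against a word list with one word per line:
#   open my $fh, '<', 'words.txt' or die $!;
#   chomp(my @words = <$fh>);
#   print "$_\n" for ss_pairs(@words);
```

As with the shell version, the output still has to be read by eye to pick the one word that also satisfies the masculine/feminine part of the puzzle.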
RE: [Boston.pm] I want a compile time check on missing parens in regex
Summary: What is the scope of $1 and when does it get reset?
Details: Thanks for the reply, Ron. It indicates that I understand this even less than I thought. What are the rules for remembering a previous value of $1 (and the other numeric variables set by pattern matching)? In the program where I discovered the problem I have a bunch of regexes, and so there could have been a value for $1 in effect. But I got a warning message anyway. Why wasn't that earlier value of $1 used? Or was it used, and I only got the warning where there wasn't a value for $1? Does the zero length string (aka null string) act as a previous value of $1?
Thanks, Steve
-Original Message-
From: Ron Newman [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 21, 2004 11:52 AM
To: Tolkin, Steve
Cc: [EMAIL PROTECTED]
Subject: Re: [Boston.pm] I want a compile time check on missing parens in regex
> If I intend to write something like
>     s/([ab])c/$1c/;
> but accidentally omit the parentheses and write
>     s/[ab]c/$1c/;
> I get a run time error message -- assuming the pattern matches the input data. But if the test data does not expose this bug I might not find out about it until later. Is there any way to get a compile time check?
That's not possible in general, because there could legitimately be a $1 left over from a previous regex match.
RE: [Boston.pm] I want a 'compile time' check on missing parens in regex
OK, here is the answer: http://www.perldoc.com/perl5.6.1/pod/perlre.html says:
    The numbered variables ($1, $2, $3, etc.) and the related punctuation set ($+, $&, $`, and $') are all dynamically scoped until the end of the enclosing block or until the next successful match, whichever comes first.
and 5.8.4 is the same except for adding $^N (whatever that is). So a compile time check is not possible in Perl 5. Note that these numbered variables are somewhat like global variables, and can cause action at a distance. Is there going to be a way in Perl 6 to control this better?
Steve
-Original Message-
From: Greg London [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 21, 2004 12:30 PM
To: Tolkin, Steve
Cc: [EMAIL PROTECTED]
Subject: RE: [Boston.pm] I want a 'compile time' check on missing parens in regex
Tolkin, Steve said:
> What is the scope of $1 and when does it get reset?
here's a start: http://www.greglondon.com/iperl/html/iperl.html#20_5_2_Capturing_parentheses_not_capturing
I suppose I should make a note to include some s/// examples... note to self: self, add some s/// examples.
-- Impatient Perl A GNU-FDL training manual for the hyperactive. Free HTML/PDF downloads at greglondon.com/iperl Paperback/coilbound available for $8.50+sh
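The scoping rule quoted from perlre can be seen in a small experiment. This is a sketch written for illustration only; the subroutine name capture_scope_demo and its test strings are invented here and do not come from the thread.

```perl
use strict;
use warnings;

# Hypothetical demo: record the value of $1 at four points.
sub capture_scope_demo {
    my @seen;
    my $text = "abc";
    if ($text =~ /(b)/) {                       # successful match sets $1 to "b"
        push @seen, $1;
        my $failed = ($text =~ /(q)/);          # failed match leaves $1 alone
        push @seen, $1;                         # still "b"
        {
            my $inner = ("hello" =~ /(l+)/);    # successful match in an inner block
            push @seen, $1;                     # "ll"
        }
        push @seen, $1;                         # "b" again once the inner block ends
    }
    return @seen;
}

print join(",", capture_scope_demo()), "\n";
```

The second value shows why a compile time check is hard: a failed match leaves an earlier $1 in place, so a later use of $1 can be legitimate even when the nearest pattern has no parentheses.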
[Boston.pm] FW: GBC/ACM Announcements
I believe that the technical portion of this, i.e. the talk on Parrot by Dan, is open to the public. (But I have not checked. Dan, do you know?) Steve -Original Message- From: Kenneth Baclawski [mailto:[EMAIL PROTECTED] Sent: Thursday, June 10, 2004 11:03 PM To: Tolkin, Steve Subject: GBC/ACM Announcements Announcements this month include: Annual GBC/ACM Meeting and Election of Officers June GBC/ACM Monthly Meeting - The Greater Boston Chapter of the ACM Annual Business Meeting Thursday, June 17, 2004 MIT Room 34-101 7:00 - 7:15 pm - President nomination: Peter Carmichael who is currently VP and PDS Brochure and Lecture Notes Editor; and PDS and Volunteer Committee member. - VP nomination: Jay Conne who is currently a member of the PDS and Volunteer committees and is a former President, Membership Chair and PDS Registrar. - Secretary nomination: Ed Bristol who is the incumbent Secretary and former President of the IEEE Control Society. - Treasurer nomination: Yona Carmichael who is currently PDS Brochure Editor and recently hosted a volunteer appreciation party at her and Peter's home. Yona is also Treasurer for the local chapter of the Society for Creative Anachronism. - The Greater Boston Chapter of the ACM will be having a Monthly Meeting on Thursday, June 17, 2004 MIT Room 34-101, Cambridge, MA 7:15 - 9:15 pm (note time) Parrot: Structure and Building of a Virtual Machine Dan Sugalski Abstract: This is a two-part talk. In the first part we'll sketch a broad outline of the architecture of Parrot, a virtual machine being designed to efficiently run the so-called dynamic languages. (Primarily Perl 5, Perl 6, Python, and Ruby) In the second part of the talk we'll cover some of the techniques and build tools we've developed as part of the process to abstract out the building and platform-specific optimizing of the VM source. 
(Somewhere between 75 and 80% of Parrot's source is preprocessed or autogenerated, some of it quite significantly) Dan is the lead designer of Parrot and past contributor to Perl. He's currently employed writing compilers for a metals wholesaling company, much to his surprise, and has written a number of articles and parts of books on Perl and Parrot. There will be a business meeting from 7:00 to 7:15 pm immediately preceding the talk. Directions to MIT, building 34, room 101: MIT is located at 77 Massachusetts Avenue, just on the north side of Memorial Drive in Cambridge, MA. The URL http://whereis.mit.edu contains a map of the area. ___ Boston-pm mailing list [EMAIL PROTECTED] http://mail.pm.org/mailman/listinfo/boston-pm
RE: [Boston.pm] list viruses
I almost never open an attachment, unless it comes from a known and trusted source, and I am expecting an attachment. This is an antivirus measure. So unfortunately I never get to read the posts by certain people, e.g. Sean Quinlan, because for some reason their posts become an attachment. So I would like you to consider blocking email with attachments. At a minimum this would encourage people to send plain old email. However a better approach, if viable, is converting the attachment to plain text and pasting it inline. Ideally this would preserve the fact that it once was an attached file, and also the file's name. This should work with all non-binary files; I do not think there is any need to post binary files to this list. I do not know enough about the programs that send mail or intermediary programs, or the processing when mail arrives, to understand if this is possible or easy to do.
Hopefully helpfully yours, Steve
-- Steve Tolkin Steve . Tolkin at FMR dot COM 617-563-0516 Fidelity Investments 82 Devonshire St. V4D Boston MA 02109 There is nothing so practical as a good theory. Comments are by me, not Fidelity Investments, its subsidiaries or affiliates.
-Original Message-
From: Ronald J Kimball [mailto:[EMAIL PROTECTED]
Sent: Wednesday, May 05, 2004 9:55 PM
To: Chris Devers
Cc: Boston Perl Mongers
Subject: Re: [Boston.pm] list viruses
On Wed, May 05, 2004 at 09:25:08PM -0400, Chris Devers wrote:
Okay, so two viruses have made it to the list today. In both cases, it looks like the mail came from Verizon customers:
    Received: from pm.org (pool-141-154-212-242.bos.east.verizon.net [141.154.212.242]) by mail.pm.org (8.11.6/8.11.6) with ESMTP id i45Joc914994 for [EMAIL PROTECTED]; Wed, 5 May 2004 14:50:39 -0500
    Received: from pm.org (pool-141-154-222-33.bos.east.verizon.net [141.154.222.33]) by mail.pm.org (8.11.6/8.11.6) with ESMTP id i460aa919816 for [EMAIL PROTECTED]; Wed, 5 May 2004 19:36:36 -0500
Boston.pm's mail is served by Mailman, right?
Does Mailman have a way to filter [presumably unsubscribed] incoming mail by network? These messages were both forged from addresses that are subscribed to the mailing list, which is why they made it through. Incoming mail from non-member addresses is already moderated. Going to a purely moderated list might be annoying for whoever has to do it [maybe Ronald, maybe someone else]. I have already turned on content filtering for the list. This will remove unwanted attachments, but still sends the remainder of the message through. (This is why the second message was missing its payload.) If that's not sufficient I can try rejecting all messages that contain attachments, but that will block some legitimate posts. Going to the pure Perl Siesta list manager software would be an interesting move, but I'm not sure if it's stable enough yet. That would be up to the pm.org sysadmins. Ronald ___ Boston-pm mailing list [EMAIL PROTECTED] http://mail.pm.org/mailman/listinfo/boston-pm ___ Boston-pm mailing list [EMAIL PROTECTED] http://mail.pm.org/mailman/listinfo/boston-pm
[Boston.pm] Thanks Andrew for all the Perl Monger meetings you hosted at Boston.com
Dear Andrew, I wish to express my personal thanks for the work you did in support of hosting the Boston Perl Monger meetings. Thanks, Steve Tolkin
[Boston.pm] using {3-8} instead of {3, 8} doesn't produce even a warning?
# run using e.g. echo hello | perl this-file
# Why doesn't perl produce a warning from {3-8} ? This seems
# to be a syntax error. It surely is not the way to match strings of length 3 - 8. It
# should be {3,8} .
while (<>) {
    if (/[a-z]{3-8}/) {
        print;
    }
}
[Boston.pm] why no warning about this infinite loop
# run using e.g. echo hello | perl this-file
# Why doesn't perl produce a warning from the following. It is an
# infinite loop. If I add a /g modifier to the m// it works fine.
while (<>) {
    while (m/([a-z])/) { # warning infinite loop!!!
        print $1, "\n"
    }
}
In general it is hard to detect infinite loops, but in this case it is easy, because the pattern is a constant. I think this is a very common special case, and is worth detecting. Why isn't this done? I am running perl 5.8.
Steve
RE: [Boston.pm] using {3-8} instead of {3, 8} doesn't produce eve n a warning?
Thanks for the explanation. So this is a documented feature. I was fooled by believing the general principle that special characters are special unless escaped with a backslash. I would have greatly preferred consistency in this. Are there other known (and perhaps even documented) violations of that principle? I scanned the 5.8 perltrap for "curly" and this was not listed. Whom should I notify to request its inclusion?
Steve
-Original Message-
From: Ronald J Kimball [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 27, 2004 5:06 PM
To: Tolkin, Steve
Cc: [EMAIL PROTECTED]
Subject: Re: [Boston.pm] using {3-8} instead of {3, 8} doesn't produce even a warning?
On Tue, Jan 27, 2004 at 04:55:28PM -0500, Tolkin, Steve wrote:
> # run using e.g. echo hello | perl this-file
> # Why doesn't perl produce a warning from {3-8} ? This seems
> # to be a syntax error. It surely is not the way to match strings of length 3 - 8. It
> # should be {3,8} .
> while (<>) {
>     if (/[a-z]{3-8}/) {
>         print;
>     }
> }
perldoc perlre:
    The following standard quantifiers are recognized:
        *      Match 0 or more times
        +      Match 1 or more times
        ?      Match 1 or 0 times
        {n}    Match exactly n times
        {n,}   Match at least n times
        {n,m}  Match at least n but not more than m times
    (If a curly bracket occurs in any other context, it is treated as a regular character.)
In other words, in Perl /[a-z]{3-8}/ is equivalent to /[a-z]\{3-8\}/.
Ronald
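The difference Ronald describes can be checked with a short script. This is a sketch for illustration: the variable names and test strings are invented, the literal form is written with escaped braces because newer perls may warn about or reject some unescaped braces, and the quantified pattern is anchored here (an added assumption) so that it tests the whole string.

```perl
use strict;
use warnings;

my $literal    = qr/[a-z]\{3-8\}/;    # what /[a-z]{3-8}/ was treated as
my $quantified = qr/^[a-z]{3,8}$/;    # the intended "3 to 8 letters"

for my $string ("hello", "x{3-8}", "ab") {
    printf "%-8s literal=%d quantified=%d\n", $string,
        ($string =~ $literal    ? 1 : 0),
        ($string =~ $quantified ? 1 : 0);
}
```

Only the string containing the characters {3-8} matches the first pattern, which is why the original program printed nothing for ordinary words.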
RE: [Boston.pm] why no warning about this infinite loop
OK, my comments below apply to this and Uri's similar comments. I should have said: this infinite loop is easy to detect because:
1. the pattern is constant
2. the data (here $_) is not modified in the loop
Both points are obvious to a person. In this simple and important special case it is also easy for most compilers (of languages other than Perl). In principle quite complex code can be analyzed to determine accurately that the data is not modified. I conclude that either
* the Perl compiler has chosen not to do this kind of analysis, or
* any such analysis is not connected to the error mechanism.
I am curious if Dan S. has any comments on this w.r.t. Parrot.
Steve
-Original Message-
From: Ronald J Kimball [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 27, 2004 5:33 PM
To: Tolkin, Steve
Cc: [EMAIL PROTECTED]
Subject: Re: [Boston.pm] why no warning about this infinite loop
On Tue, Jan 27, 2004 at 05:04:03PM -0500, Tolkin, Steve wrote:
> # run using e.g. echo hello | perl this-file
> # Why doesn't perl produce a warning from the following. It is an
> # infinite loop. If I add a /g modifier to the m// it works fine.
> while (<>) {
>     while (m/([a-z])/) { # warning infinite loop!!!
>         print $1, "\n"
>     }
> }
> In general it is hard to detect infinite loops, but in this case it is easy, because the pattern is a constant. I think this is a very common special case, and is worth detecting.
The pattern in the below code is also constant, but there is no infinite loop:
    while (<>) {
        while (m/([a-z])/) {
            print $1, "\n";
            $_ = substr($_, 1);
        }
    }
As you say, it is hard to detect infinite loops. :)
Ronald
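For comparison, here is a sketch of the /g form mentioned in the original question, which terminates without modifying the data. The subroutine name letters is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: collect every lowercase letter in a line.
sub letters {
    my ($line) = @_;
    my @found;
    # In scalar context, m//g resumes after the previous match (the
    # position is tracked per string, see pos), and the condition
    # becomes false once no further match exists.
    while ($line =~ m/([a-z])/g) {
        push @found, $1;
    }
    return @found;
}

print join(" ", letters("hello")), "\n";
```

Without /g every test of the condition starts again at the beginning of the string, which is what makes the original loop spin forever.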
RE: [Boston.pm] OT:Safari Bookshelf
Since you asked, I had a few specific criticisms also. I was part of a pilot at my work place. One major criticism I had is that it sent me my password *in the clear* as part of a routine reminder. I replied that this is extremely bad practice. In fact it should not even store my password, but instead use a Unix-like approach of hash + salt. Here is a sanitized version of my message sent to '[EMAIL PROTECTED]' last March.
P.S. We did decide to sign up for the Safari service.
-Original Message-
From: Tolkin, Steve
Sent: Thursday, March 27, 2003 1:34 PM
To: '[EMAIL PROTECTED]' ...
Subject: Never sent user password in email -- this is a serious breach of security
Dear Safari, Your email to me included my password. This is a serious breach of security. Please tell me that you will fix this. I never do business with any organization that sends out a password in email (unless explicitly requested by the user).
Thanks, Steve
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 27, 2003 5:28 AM
To: [EMAIL PROTECTED]
Subject: Time Flies
Steve, How time flies! This note is just a friendly reminder that you are half way through your free trial to Safari Tech Books Online. Log in today and let Safari pinpoint information for your urgent IT questions. Safari's powerful search engine is far more efficient than wading through piles of books and articles and more effective than message boards or tracking down colleagues for answers. As a reminder, your login URL is http://search.safaribooksonline.com/
User Name: steve dot. tolkin at@ fmr dot. com
Password: SHOULD NEVER SEND PASSWORD UNLESS REQUESTED!
Need help getting started? Join us for a quick LIVE tutorial. -- Every Tuesday -- 4:15 - 4:45 pm EST ... [rest of marketing blather snipped]
Hopefully helpfully yours, Steve
-- Steve Tolkin Steve . Tolkin at FMR dot COM 617-563-0516 Fidelity Investments 82 Devonshire St. V4D Boston MA 02109 There is nothing so practical as a good theory.
Comments are by me, not Fidelity Investments, its subsidiaries or affiliates. -Original Message- From: Andy Oram [mailto:[EMAIL PROTECTED] Sent: Monday, January 05, 2004 11:44 AM To: [EMAIL PROTECTED] Subject: Re: [Boston.pm] OT:Safari Bookshelf I guess I should stop lurking and say thanks for all the kind comments. Anything special that any of you would like me to pass on to people I know on the Safari team? I haven't noticed any specific criticism. Also, if you feel happy enough that you'd like to give a testimonial that we could use in marketing, let me know and I'll find a marketing person to slurp it up. -- Andy Oram O'Reilly Associates, Inc.email: [EMAIL PROTECTED] Editor 90 Sherman Street voice: 617-499-7479 Cambridge, MA 02140-3233 fax: 617-661-1116 USA http://www.praxagora.com/andyo/ Stories at Web site: The Bug in the Seven Modules Code the Obscure The Disconnected -- ___ Boston-pm mailing list [EMAIL PROTECTED] http://mail.pm.org/mailman/listinfo/boston-pm ___ Boston-pm mailing list [EMAIL PROTECTED] http://mail.pm.org/mailman/listinfo/boston-pm
[Boston.pm] Re: the XPath replace() function and regex patterns like s/^.../. ../g
One clarification. The suggested workaround was not to just start the regex with a ^ but to start it with ^.* I have also changed the body of the message below to reflect this.
-Original Message-
From: Tolkin, Steve
Sent: Monday, November 10, 2003 5:05 PM
To: [EMAIL PROTECTED]
Subject: [Unverified Sender] [Boston.pm] the XPath replace() function and regex patterns like s/^.../.../g
The proposed regex replace() function in XPath 2.0 (and also XQuery 1.0) always replaces all matching strings, i.e. as if it had the g modifier in Perl's s///g. For details see http://www.w3.org/TR/xpath-functions/#func-replace (It does define the semantics of overlapping strings the same as perl.) However it seems to me that always replacing all the matching strings might cause some loss in functionality, because there is no obvious way to get it to only do one replacement. The suggested workaround to achieve changing only the first matching string is to put ^.* at the start of the pattern.
So I first ask a technical question, about Perl's behavior.
Q1. Will a pattern such as s/^.../.../g, i.e. one that is anchored by a leading ^, ever change more than one matching string?
Now a question about the real consequences of the current XPath proposal. What is a good "use case" for wanting a replace-one in addition to a replace-all? The best case I can think of where this does cause a problem is a pattern to preserve any leading whitespace (perhaps to keep the indentation the same) but replace all other whitespace with a single blank. The following perl expression fails to do this,
    s/^(\s*)(\S+)(\s+)/\1\2 /g
and so I believe that it will be very hard to do with replace().
Q2. Can you think of a better "use case"?
Assuming that there are serious problems identified, there are several ways to solve this in XPath.
Q3. What is your preference?
a. Have two functions with different names, e.g. replace-first() and replace-all() (if so please choose your preferred names from the following set: For first: replace, replace-one, replace-first. For all: replace, replace-all.)
b. Change the default for replace() to mean replace first, and add a flag named "g" to mean replace all. (Note that there already are flags named "s" and "m" with their perl meanings. I have access to a newer version of the spec than the one that is posted.)
c. Keep the default for replace() as meaning replace all and add a new option (what letter?) meaning replace first.
d. Something else
Any advice will be appreciated.
Hopefully helpfully yours, Steve
-- Steve Tolkin Steve . Tolkin at FMR dot COM 617-563-0516 Fidelity Investments 82 Devonshire St. V4D Boston MA 02109 There is nothing so practical as a good theory. Comments are by me, not Fidelity Investments, its subsidiaries or affiliates.
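Both the Q1 behavior and the whitespace use case can be tried out in Perl. The sketch below is only an experiment, not an answer for XPath: the subroutine names are invented, the first routine shows a ^-anchored s///g (without the m flag) making a single substitution, and the second uses a Perl lookbehind, which is an assumption that may not carry over to the regex dialect used by replace().

```perl
use strict;
use warnings;

# Hypothetical helper: apply a ^-anchored s///g and report how many
# substitutions were made, along with the resulting text.
sub count_anchored {
    my ($text) = @_;
    my $count = ($text =~ s/^a/X/g);   # s///g returns the number of substitutions
    return ($count || 0, $text);
}

# Hypothetical helper: keep leading whitespace, squeeze every later
# run of whitespace down to one blank.
sub squeeze_inner_ws {
    my ($text) = @_;
    $text =~ s/(?<=\S)\s+/ /g;         # only runs that follow a non-space character
    return $text;
}

my ($count, $changed) = count_anchored("aaa");
print "$count substitution(s): $changed\n";
print squeeze_inner_ws("    foo   bar \t baz"), "\n";
```

With the m flag added, ^ would also match after each embedded newline, so the anchored pattern could then change more than one string.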
RE: [Boston.pm] Postal address De-duping
The article in question can be found at http://www.foo.be/docs/tpj/issues/vol4_1/tpj0401-0002.html (I had a hard time finding it via tpj.com, but Google worked.) Unfortunately I think that the USPS page http://www.usps.com/cgi-bin/zip4/zip4inq needed to run this script is no more. A search there for zip4inq produced nothing. Does anyone know of a similar page, either by the USPS or another provider of (web) services?
Hopefully helpfully yours, Steve
-- Steven Tolkin steve . tolkin at fmr dot com 617-563-0516 Fidelity Investments 82 Devonshire St. V4D Boston MA 02109 There is nothing so practical as a good theory. Comments are by me, not Fidelity Investments, its subsidiaries or affiliates.
-Original Message-
From: Jon Orwant [mailto:[EMAIL PROTECTED]
Sent: Monday, August 04, 2003 6:15 PM
To: Joel Gwynn
Cc: [EMAIL PROTECTED]
Subject: Re: [Boston.pm] Postal address De-duping
On Monday, August 4, 2003, at 05:12 PM, Joel Gwynn wrote:
> Hey, all. We do lots of (snail) mailings, and we're looking for a fast, customizable de-duping solution. We're currently taking a look at doubletake from http://peoplesmith.com/, which is not too expensive, but I was thinking there might be some perl stuff out there, given perl's text-processing powers.
There's a wee script I wrote for TPJ a while back that scrapes the U.S. Postal Service's address canonicalizer. The script is on tpj.com; look under Archives for the article called Five Quick Hacks. The canonicalizer (well, they call it a zip code locator or something like that) will transform variants on the same address into the One True Address that the USPS recognizes, so de-duping then becomes a matter of simple string matching. Won't help you for foreign addresses, obviously.
-Jon
RE: [Boston.pm] emacs discussion
As a long time emacs user I must agree with the positions we have all been agreeing with:
* it has a long learning curve
* it has a lot of power
So I have a lot invested in it, and want to ensure emacs continues to survive, nay thrive. Unfortunately I think its rate of adoption is continually going down, as more people use Windows and fewer use Unix. I have configured my emacs to use the Windows keys.
    ;; Make the ctrl-c ctrl-v ctrl-x keys work like they do in Windows
    ;; 2003-03-17 I downloaded from http://www.cua.dk/cua.html Version: 2.10
    (require 'cua)
    (CUA-mode t)
However there is a BUG in emacs (or the documentation). If you need to run a command that begins with C-x you must hold the Shift key down while pressing Ctrl. The other workarounds suggested in the CUA documentation did not work for me: press the prefix key twice very quickly (within 0.2 seconds), or press the prefix key and the following key within 0.2 seconds. Has anyone gotten these two techniques to work? Does anyone have other ideas to help ensure the continued widespread use of emacs?
Steve
[Boston.pm] Perl 6 has become too complex
In Apocalypse 6 http://www.perl.com/pub/a/2003/03/07/apocalypse6.html Larry Wall explains how subroutines are going to work in Perl 6. I think this is the straw that broke the camel's back. I think this is the worst case of second system syndrome I have ever seen. (See the Jargon file, e.g. at http://info.astrian.net/jargon/terms/s/second-system_effect.html ) I quote:
    When one is designing the successor to a relatively small, elegant, and successful system, there is a tendency to become grandiose in one's success and design an elephantine feature-laden monstrosity.
I think the language design shows too much influence of Evil Damian. I want good Damian to work with Larry et al. to reduce the complexity of the language. Or (shudder) a subset of the language to be defined. Please advise me as to how to proceed.
Hopefully helpfully yours, Steve
-- Steven Tolkin steve . tolkin at fmr dot com 617-563-0516 Fidelity Investments 82 Devonshire St. V8D Boston MA 02109 There is nothing so practical as a good theory. Comments are by me, not Fidelity Investments, its subsidiaries or affiliates.
RE: [Boston.pm] That's a Haiku. A freaky little perl Haiku.
Actually the best poetic form to feature the word autovivication would seem to be the Double Dactyl; see http://lonestar.texas.net/~robison/dactyls.html http://www.kith.org/logos/words/lower/d.html etc., e.g. the self-describing
    Higgledy-Piggledy
    Dactyls in dimeter,
    Verse form with choriambs
    (Masculine rhyme):
    One sentence (two stanzas)
    Hexasyllabically
    Challenges poets who
    Don't have the time.
Providing the other 7 lines is left as an exercise.
P.S. Yes, I know that the way autovivication is pronounced normally is not quite a double dactyl, but it's close enough for this deliberately silly poetic form.
Hopefully helpfully yours, Steve
-- Steven Tolkin [EMAIL PROTECTED] 617-563-0516 Fidelity Investments 82 Devonshire St. V4D Boston MA 02109 There is nothing so practical as a good theory. Comments are by me, not Fidelity Investments, its subsidiaries or affiliates.
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Thursday, February 06, 2003 1:44 PM
To: [EMAIL PROTECTED]
Subject: [Boston.pm] That's a Haiku. A freaky little perl Haiku.
[EMAIL PROTECTED] wrote:
>     Do What I Mean and
>     Autovivification
>     aren't what I wanted.
Hm, though technically accurate in Joel's situation, I think it would be better if I generalize it to be more universal, rather than worry about it being taken out of context. Therefore:
    Do What I Mean and
    Autovivication
    can be unwanted
Hey, I think I just got me a new signature file...
Greg
RE: [Boston.pm] Damian's Natural Language Parsing Meeting
Is this information available online, e.g. in a Perl module, or an exegesis, etc.? (I have read all the apocalypses and exegeses on Perl 6.) I am interested in attending this meeting, but would prefer to read this information first (or instead).
Hopefully helpfully yours, Steve
-- Steven Tolkin [EMAIL PROTECTED] 617-563-0516 Fidelity Investments 82 Devonshire St. V8D Boston MA 02109 There is nothing so practical as a good theory. Comments are by me, not Fidelity Investments, its subsidiaries or affiliates.
-Original Message-
From: James Freeman [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, January 21, 2003 10:54 PM
To: [EMAIL PROTECTED]
Subject: [Boston.pm] Damian's Natural Language Parsing Meeting
Hi Folks, I have organized a meeting for Damian to speak to the bioinformatics gurus in the local area. His Natural Language Parsing with a bioinformatics focus will be at Boston University. Details below: http://informagen.com/NEBiG/
Warmest Regards, Jim
-- Bioinformatics Consultant [EMAIL PROTECTED] voice:781-646-0742 mobile:617-429-6352
RE: [Boston.pm] damian talk
I vote for Life, the Universe, and everything.
RE: [Boston.pm] damian talk
If I recall correctly, Olive Oyl, in some old Popeye cartoon, says it to the Brutus character in the definitive American way: Et tu, you brute Hopefully helpfully yours, Steve -- Steven Tolkin [EMAIL PROTECTED] 617-563-0516 Fidelity Investments 82 Devonshire St. V8D Boston MA 02109 There is nothing so practical as a good theory. Comments are by me, not Fidelity Investments, its subsidiaries or affiliates. -Original Message- From: Drew Taylor [mailto:[EMAIL PROTECTED]] Sent: Monday, January 13, 2003 3:02 PM To: Walt Mankowski; [EMAIL PROTECTED] Subject: Re: [Boston.pm] damian talk At 02:30 PM 1/13/03 -0500, Walt Mankowski wrote: Geez, screwing up Latin AND Shakespeare in one short phrase. Don't they teach you kids across the pond ANYTHING these days? :) Never underestimate the power of public education in the US. :-) Drew -- Drew Taylor| Web development consulting http://www.drewtaylor.com/ | perl/mod_perl/DBI/mysql/postgres -- Netflix: DVD Rentals by mail with NO late fees or due dates! Free Trial - http://www.netflix.com/Default?mqso=36126240 -- ___ Boston-pm mailing list [EMAIL PROTECTED] http://mail.pm.org/mailman/listinfo/boston-pm ___ Boston-pm mailing list [EMAIL PROTECTED] http://mail.pm.org/mailman/listinfo/boston-pm
[Boston.pm] wanted: perl code to do JAXB name mapping (LONG)
Summary: I am looking for a program to do name mapping as specified in Appendix C of the JAXB (Java XML Binding) spec. This for example will map from foo_bar to fooBar etc. Although they talk about Java and XML names, this mapping applies to many other programming languages too. In particular databases typically use the underscore character as the separator, and so this program would be very useful for that translation. Note the careful treatment that locates the word break in front of an upper case letter followed by a lowercase letter, e.g. FOOBar becomes FOO_BAR in the mapping to a constant.
Details: $Id: jaxb_name_mapping.txt 1.3 2002/12/04 14:51:06 A071046 Exp $
[I quote from the following document, downloadable from Sun. I only quoted the first part of Appendix C - mapping XML name to Java Identifier. I also want a program to do the reverse mapping. It was in file jaxb-0_7-prd-spec.pdf. After copying the text and pasting it as plain ASCII I had to slightly edit this file, e.g. to align the tables using spaces, add newlines, etc. I lost many of the bullets in the original and did not manually add them all back.]
Quote from: The Java(TM) Architecture for XML Binding (JAXB) Public Draft, V0.7 September 12, 2002
C.1 Overview
This section provides default mappings from:
* XML Name to Java identifier
* Model group to Java identifier
* Namespace URI to Java package name
C.2 The Name to Identifier Mapping Algorithm
Java identifiers typically follow three simple, well-known conventions:
* Class and interface names always begin with an upper-case letter. The remaining characters are either digits, lower-case letters, or upper-case letters. Upper-case letters within a multi-word name serve to identify the start of each non-initial word, or sometimes to stand for acronyms.
* Method names and components of a package name always begin with a lower-case letter, and otherwise are exactly like class and interface names.
* Constant names are entirely in upper case, with each pair of words separated by the underscore character ('_', \u005F, LOW LINE).
XML names, however, are much richer than Java identifiers: They may include not only the standard Java identifier characters but also various punctuation and special characters that are not permitted in Java identifiers. Like most Java identifiers, most XML names are in practice composed of more than one natural-language word. Non-initial words within an XML name typically start with an upper-case letter followed by a lower-case letter, as in Java, or are prefixed by punctuation characters, which is not usual in Java and, for most punctuation characters, is in fact illegal.
In order to map an arbitrary XML name into a Java class, method, or constant identifier, the XML name is first broken into a word list. For the purpose of constructing word lists from XML names we use the following definitions:
A punctuation character is one of the following:
* A hyphen ('-', \u002D, HYPHEN-MINUS),
* A period ('.', \u002E, FULL STOP),
* A colon (':', \u003A, COLON),
* An underscore ('_', \u005F, LOW LINE),
* A dot ('·', \u00B7, MIDDLE DOT),
* \u0387, GREEK ANO TELEIA,
* \u06DD, ARABIC END OF AYAH, or
* \u06DE, ARABIC START OF RUB EL HIZB.
These are all legal characters in XML names.
A letter is a character for which the Character.isLetter method returns true, i.e., a letter according to the Unicode standard. Every letter is a legal Java identifier character, both initial and non-initial.
A digit is a character for which the Character.isDigit method returns true, i.e., a digit according to the Unicode Standard. Every digit is a legal non-initial Java identifier character.
A mark is a character that is in none of the previous categories but for which the Character.isJavaIdentifierPart method returns true. This category includes numeric letters, combining marks, non-spacing marks, and ignorable control characters.
Every XML name character falls into one of the above categories. We further divide letters into three subcategories:
* An upper-case letter is a letter for which the Character.isUpperCase method returns true,
* A lower-case letter is a letter for which the Character.isLowerCase method returns true, and
* All other letters are uncased.
An XML name is split into a word list by removing any leading and trailing punctuation characters and then searching for word breaks. A word break is defined by three regular expressions: a prefix, a separator, and a suffix. The prefix matches part of the word that precedes the break, the separator is not part of any word, and the suffix matches part of the word that follows the break. The word breaks are defined as:
Table 3-1 XML Word Breaks
Prefix     Separator  Suffix        Example
[^punct]   punct+     [^punct]      foo|--|bar
digit                 [^digit]      foo22|bar
[^digit]              digit         foo|22
lower                 [^lower]      foo|Bar
upper                 upper lower   FOO|Bar
letter                [^letter]