[Boston.pm] software puzzle - extracting longest alphabetical list of phrases from a list of words

2008-10-26 Thread Tolkin, Steve
The following is just a problem in computer science.  It is not directly
related to Perl, or to my work.  I am looking for insights into how to
think about this.  

The input: a list of words.
The output: a partitioning of the input list into a longest list of
phrases, such that the phrases are in alphabetical order.  (Each phrase
is one or more consecutive words, and a word is a maximum length
sequence of non-space characters.) 

The following example shows that maximizing the number of phrases may
not produce the answer a person would, but it makes the problem solvable
by an algorithm that does not have a set of allowed phrases.  If there
are two or more lists of the same length assume any one will do as the
answer.

Example input 1: atta boy catch as catch can

Example output 1: 
atta
boy
catch as
catch can

I presume this problem is already known to software engineering.  What
is its name?  (For example, other problems are solved by connected
components, or topological sort, etc.)  

Here are a few things I know about solving this problem: 

It has complexity at most O(2^n) because there are only 2^(n-1) ways to
partition n words (each of the n-1 gaps between words either is or is
not a phrase break).  A brute force algorithm would start with the case
of having n phrases, where each word is its own phrase.  If these are in
alphabetical order we are done.  Otherwise try all the cases where there
are n-1 phrases, then n-2 phrases, etc.  (This algorithm would
probably be OK for lists with a reasonable number of words.  I cannot
estimate the maximum number of words or phrases it could handle on a PC.)
Is there a deterministic algorithm in a lower complexity class?
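For what it is worth, dynamic programming appears to answer the complexity question with a polynomial bound. For each possible last phrase (words i..j), keep the largest number of in-order phrases that can end with it; that value depends only on the best chains ending just before word i, because only adjacent phrases need to be compared. Below is a minimal Perl sketch, assuming "alphabetical order" means non-decreasing order of the lowercased phrase text under Perl's codepoint-based `le` (not a locale-aware collation), with ties between equally long answers broken arbitrarily. The name `longest_phrase_partition` is made up for illustration. It does O(n^3) phrase comparisons rather than looking at up to 2^(n-1) partitions.

```perl
use strict;
use warnings;

# Split @words into the largest number of consecutive phrases whose
# lowercased text is in non-decreasing order under Perl's string "le".
# $best[$i][$j]: most phrases covering words 0..$j when the last phrase is $i..$j.
# $prev[$i][$j]: start index of the phrase before it (-1 if it is the first).
sub longest_phrase_partition {
    my @words = @_;
    my $n = scalar @words;
    return () unless $n;
    my @lc = map { lc } @words;
    my $text = sub { join ' ', @lc[ $_[0] .. $_[1] ] };
    my (@best, @prev);
    for my $j (0 .. $n - 1) {
        # The first phrase always starts at word 0.
        $best[0][$j] = 1;
        $prev[0][$j] = -1;
        for my $i (1 .. $j) {
            my $cur = $text->($i, $j);
            for my $k (0 .. $i - 1) {
                my $count = $best[$k][ $i - 1 ];
                next unless defined $count;                # no valid chain ends there
                next unless $text->($k, $i - 1) le $cur;   # phrases must stay in order
                if (!defined $best[$i][$j] || $count + 1 > $best[$i][$j]) {
                    $best[$i][$j] = $count + 1;
                    $prev[$i][$j] = $k;
                }
            }
        }
    }
    # Choose the best last phrase, then follow the back-pointers.
    my $start = 0;
    for my $i (1 .. $n - 1) {
        my $count = $best[$i][ $n - 1 ];
        $start = $i if defined $count && $count > $best[$start][ $n - 1 ];
    }
    my @phrases;
    my ($i, $j) = ($start, $n - 1);
    while ($i >= 0) {
        unshift @phrases, join ' ', @words[ $i .. $j ];
        ($i, $j) = ($prev[$i][$j], $i - 1);
    }
    return @phrases;
}

my @out = longest_phrase_partition(qw(atta boy catch as catch can));
print "$_\n" for @out;
```

On the example input it prints atta, boy, catch as, and catch can, one per line, matching example output 1. Because the best chain for a prefix never changes once computed, no backtracking is needed.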

I would be happy with a heuristic approach that did pretty well.  One
possible score for determining whether to start a phrase at a word has
two components:
1. Its position in the list.  A lower position is better, because we
want many phrases.  This is easily precomputed by a O(n) pass over the
list.
2. Its alphabetical order in the list.  Again a lower number is better.
This can be computed one time in advance by an n log(n) sort.
Then maybe something like alpha-beta pruning (a la chess) could be used
to evaluate the best position to introduce a phrase.

Once we have a phrase starting with some word $s, then all words $w to
the right of it such that $w lt $s cannot start a phrase.  The first
phrase always starts with the first word.  So we can immediately mark
words alphabetically lt that word as not able to start a phrase; e.g. in
the example above, "as" is lt "atta".  A heuristic approach might take
advantage of this.

Is there a greedy approach (one that never backtracks) that emits a
reasonable output?

P.S. In case anyone is interested in actually writing code to solve
this, the alphabetical order is case insensitive.   The origin of the
problem was doing a View Source on a web page that had a large drop down
list, and wanting to reconstruct the list of phrases.

P.P.S.  Congratulations to Ronald.  I predict that in 17 or 18 years he
will be helping (or nagging) Tobias with getting his college application
material done.   The time does fly by.


Steve Tolkin



___
Boston-pm mailing list
Boston-pm@mail.pm.org
http://mail.pm.org/mailman/listinfo/boston-pm


Re: [Boston.pm] merging lists that are ordered but not sorted

2008-01-30 Thread Tolkin, Steve
I am replying to myself to thank all the Perl mongers who replied with
help.
Indeed, my problem is topological sort, as stated by Alex Vandiver and
Gyepi SAM.
I did not see that because the input format is different from that
required by the Unix tsort program.

A search for: tsort perl power tools
found this
http://search.cpan.org/src/CWEST/ppt-0.14/html/commands/tsort/index.html
which leads directly to the perl code I used to solve the problem.
Note the strange name -- tcsort not tsort.  (Perhaps in homage to Tom
Christiansen, the prime mover of the very useful but moribund ppt
project.)

I earlier found tsort.exe (port to Windows) inside
coreutils-5.3.0-bin.zip at
http://sourceforge.net/project/showfiles.php?group_id=23617&package_id=142775
Unfortunately this tsort.exe depends on libintl3.dll, which was not
in the *.zip file and which I could not find anywhere.
Aside: Does anyone know where I can get a libintl3.dll ?

Both versions of tsort require the 2 values on each input row be
separated by one space.  Fortunately I was able to transform my data
into this format.

Major kudos to Ben Tilly!  He wrote from scratch a perl program that
solved the problem.   (Since he put in the effort to write this I took
some extra time to test it.  It produced the same output as tsort,
because the lists overlapped enough to overcome the fact that the output
order is in general not deterministic.)

I think the problem statement I gave was clear enough.  Any cycle in the
input is an error.  The tsort program in perl simply reports cycle
detected without any information as to which elements are on the cycle.
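To illustrate the topological-sort view, including a more informative cycle report, here is a minimal Perl sketch of Kahn's algorithm. It is neither the ppt tcsort code nor Ben Tilly's program. Assumptions: elements are plain strings, each adjacent pair in an input list is an ordering constraint, and ties are broken by first appearance so the output is repeatable. The name `merge_ordered_lists` is hypothetical, and `//=` needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Merge lists that are all sublists of one hidden list (Kahn's algorithm).
# Each adjacent pair x, y in an input list becomes an edge x -> y.  Ties are
# broken by the order in which elements were first seen.
sub merge_ordered_lists {
    my @lists = @_;
    my (%succ, %indeg, %first_seen);
    my $seq = 0;
    for my $list (@lists) {
        for my $i (0 .. $#$list) {
            my $x = $list->[$i];
            $first_seen{$x} //= $seq++;
            $indeg{$x} //= 0;
            next if $i == $#$list;
            my $y = $list->[ $i + 1 ];
            next if $succ{$x}{$y}++;    # count each edge only once
            $indeg{$y}++;
        }
    }
    my $by_seen = sub { sort { $first_seen{$a} <=> $first_seen{$b} } @_ };
    my @ready = $by_seen->(grep { $indeg{$_} == 0 } keys %indeg);
    my @out;
    while (@ready) {
        my $x = shift @ready;
        push @out, $x;
        for my $y (keys %{ $succ{$x} || {} }) {
            push @ready, $y if --$indeg{$y} == 0;
        }
        @ready = $by_seen->(@ready);
    }
    if (@out < scalar keys %indeg) {
        # Anything with unmet predecessors is on a cycle or after one.
        my @stuck = $by_seen->(grep { $indeg{$_} > 0 } keys %indeg);
        die "cycle detected among: @stuck\n";
    }
    return @out;
}

my @merged = merge_ordered_lists([qw(dog cat mouse)], [qw(dog shark mouse elephant)]);
print "@merged\n";
```

On the dog/cat example this prints `dog cat shark mouse elephant`; the other valid order (shark before cat) would be equally correct. When the input contains a cycle it dies naming the elements that still have unmet predecessors, i.e. those on a cycle or downstream of one.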

My use was not related to alignment of DNA.  It was part of a personal
mashup to combine data about cars that I scraped from e.g.
http://autos.yahoo.com/toyota_camry_se_v6-specs/?p=all
The actual values in the list are strings such as these:
Cylinders
Horsepower @ RPM
Fuel Economy Cty/Hwy

As another aside, if people are interested I can send 77 lines of data
for each of these 2008 model year cars: Camry, Accord, Infiniti_G35,
Impreza, Altima, Audi_A4, Volvo_S40, Saab_9_3
I would not mind off-list opinions on any of these cars.  In general I
want a car with a width of at most 70.7 inches (the Accord at 71.7 is probably
too wide to fit in my garage), and would like AWD.


Thanks,
Steve


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
Tolkin, Steve
Sent: Tuesday, January 29, 2008 12:12 PM
To: Boston Perl Mongers
Subject: [Boston.pm] merging lists that are ordered but not sorted

I am looking for a perl program that will solve the following problem.
Suppose I have 2 or more lists that are (conceptually) sublists of the
same underlying list.
I want to reconstruct the underlying list.  In other words the order of
the elements agrees in all the lists, but there is no sort condition.

Example:
List 1: dog, cat, mouse
List 2: dog, shark, mouse, elephant

There are 2 possible outputs, and I do not care which one I get.

The reason that I have not just coded this up is that it seems to
require an unbounded amount of look ahead.  Also, when there are more
than 2 lists, I think I need to read from all of them before making a
decision about which element can be safely output.

Thanks,
Steve
-- 
Steven Tolkin[EMAIL PROTECTED] 508-787-9006
Fidelity Investments   400 Puritan Way M3B Marlborough MA 01752
There is nothing so practical as a good theory.  Comments are by me, 
not Fidelity Investments, its subsidiaries or affiliates.
 


[Boston.pm] merging lists that are ordered but not sorted

2008-01-29 Thread Tolkin, Steve
I am looking for a perl program that will solve the following problem.
Suppose I have 2 or more lists that are (conceptually) sublists of the
same underlying list.
I want to reconstruct the underlying list.  In other words the order of
the elements agrees in all the lists, but there is no sort condition.

Example:
List 1: dog, cat, mouse
List 2: dog, shark, mouse, elephant

There are 2 possible outputs, and I do not care which one I get.

The reason that I have not just coded this up is that it seems to
require an unbounded amount of look ahead.  Also, when there are more
than 2 lists, I think I need to read from all of them before making a
decision about which element can be safely output.

Thanks,
Steve
-- 
Steven Tolkin[EMAIL PROTECTED] 508-787-9006
Fidelity Investments   400 Puritan Way M3B Marlborough MA 01752
There is nothing so practical as a good theory.  Comments are by me, 
not Fidelity Investments, its subsidiaries or affiliates.
 


[Boston.pm] which very large US bank uses Perl for their integration strategy

2007-10-12 Thread Tolkin, Steve
In the middle of the long list of  replies to a posting about why ESBs
are bad (and REST is good) at
http://steve.vinoski.net/blog/2007/10/04/the-esb-question/  I find this
reply:
30. John Davies says: October 7th, 2007 at 6:42 pm  
... your best option is shell scripts (awk, grep, cut, tail etc.) and
PERL, one very large US bank famously implemented their entire
integration strategy on this just a few years ago and it's already
out-lived a good half-dozen Java based efforts since.


Does anyone have the details about this?  He says famously, but I am
not aware of even which bank it is.

Thanks,
Steve
-- 
Steven Tolkin[EMAIL PROTECTED] 508-787-9006
Fidelity Investments   400 Puritan Way M3B Marlborough MA 01752
There is nothing so practical as a good theory.  Comments are by me, 
not Fidelity Investments, its subsidiaries or affiliates.



 


Re: [Boston.pm] merge and compare help

2007-08-27 Thread Tolkin, Steve
This can be easily extended to be a general purpose match/merge program.
Suppose we call the two inputs A and B.  Each ID is in one of three
possible cases, and so we want three subroutines, named e.g., just_in_a,
just_in_b, and in_both.   (In the original example just_in_a would do
the same thing as just_in_b, but that is not always desired.) 

I am looking for perl code that does this, in a configurable way, e.g.
let the user specify the ID column/s, sort the two inputs (if not
already sorted), read them both, call the subs, etc.  Please send a link
or the code itself.

thanks,
Steve
-- 
Steven TolkinSteve-d0t-Tolkin-at-fmr-d0t-com 508-787-9006
Fidelity Investments   400 Puritan Way M3B Marlborough MA 01752
There is nothing so practical as a good theory.  Comments are by me, 
not Fidelity Investments, its subsidiaries or affiliates.


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
John Macdonald
Sent: Monday, August 27, 2007 4:01 PM
To: Alex Brelsfoard
Cc: boston-pm@mail.pm.org
Subject: Re: [Boston.pm] merge and compare help

Your solution is the right one.  The final trick is to make
sure you keep going with one file after the other file reaches
the end.  I usually have the file read routine return a fake
record for EOF, that has a key guaranteed to be higher than
any real key.  (That requires knowing what the keys look like,
but it will often be something like \255\255\255\255.)  The
merge subroutine checks for that EOF key and exits.  If a merge
is done for a different key, then neither file can be at EOF.
If a record is written without needing a merge, then that file
at least is not at EOF.  This trick gets rid of a lot of code
that checks whether either or both files are at EOF when you
are deciding whether to read from a file, and comparing the current 
records.
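Combining the three callbacks (just_in_a, just_in_b, in_both) described earlier in this thread with the sentinel-EOF trick above gives something like the minimal Perl sketch below. Assumptions, all illustrative: the ID is the first comma-separated field, both inputs are sorted by that field using string comparison (`cmp`), IDs are unique within a file, and no real ID sorts at or after the hypothetical sentinel `"\xFF" x 8`. The names `make_reader` and `match_merge` are made up, and real CSV with quoted commas would want Text::CSV rather than `split`.

```perl
use strict;
use warnings;

# Sentinel key assumed to sort after every real ID (true for plain ASCII IDs).
my $EOF_KEY = "\xFF" x 8;

# Return a closure that yields [id, line] from a filehandle of CSV lines
# sorted by their first field, and a sentinel record once the file ends.
sub make_reader {
    my ($fh) = @_;
    return sub {
        my $line = <$fh>;
        return [ $EOF_KEY, undef ] unless defined $line;
        chomp $line;
        my ($id) = split /,/, $line, 2;
        return [ $id, $line ];
    };
}

# Walk both inputs in step, calling just_in_a, just_in_b or in_both.
# Only one record per input is held in memory at a time.
sub match_merge {
    my ($read_a, $read_b, %cb) = @_;
    my ($ra, $rb) = ($read_a->(), $read_b->());
    while (1) {
        my $cmp = $ra->[0] cmp $rb->[0];
        if ($cmp == 0) {
            last if $ra->[0] eq $EOF_KEY;    # equal sentinels: both files done
            $cb{in_both}->($ra->[1], $rb->[1]);
            ($ra, $rb) = ($read_a->(), $read_b->());
        }
        elsif ($cmp < 0) {                   # A's ID is smaller, so A is not at EOF
            $cb{just_in_a}->($ra->[1]);
            $ra = $read_a->();
        }
        else {                               # B's ID is smaller, so B is not at EOF
            $cb{just_in_b}->($rb->[1]);
            $rb = $read_b->();
        }
    }
    return;
}

my ($data_a, $data_b) = ("1,a1\n3,a3\n5,a5\n", "2,b2\n3,b3\n6,b6\n");
open my $fa, '<', \$data_a or die "open: $!";
open my $fb, '<', \$data_b or die "open: $!";
my @log;
match_merge(
    make_reader($fa), make_reader($fb),
    just_in_a => sub { push @log, "A:$_[0]" },
    just_in_b => sub { push @log, "B:$_[0]" },
    in_both   => sub { push @log, "AB:$_[0]|$_[1]" },
);
print "$_\n" for @log;
```

The loop never tests for end of file directly: a strictly smaller ID can never be the sentinel, and equal IDs are the sentinel only when both inputs are exhausted.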

On Mon, Aug 27, 2007 at 02:04:57PM -0400, Alex Brelsfoard wrote:
 Hi All,
 
 I'm back with a new algorithm/solution I need help with.
 I have two csv files, sorted by the first column (ID).
 Each file may have all the same, none of the same, or some of the same IDs.
 I would like to take these two files, and make one out of them.
 Two tricks:
  - When I come across the same ID in each file I need to merge those two
 lines (don't worry about the merge, I can handle that).
  - I want to be looking at the least number of lines from each file as
 possible at any one time (optimally I would like to only be looking at one
 line of each file at the same time).
 
 Basically we are dealing with large files here and I don't want to kill my
 RAM by storing all the data from both files into a hash or some other
 object.
 
 I have an algorithm I like, I'm just not certain how to implement it:
 1. Examine the ID of the first line of each file.
 2. If they are the same, then merge and print the merge to the final output
 file.
 3. If they are not the same, find the lesser one and have it print its
 contents to the final output file until its ID is the same or greater than
 the other file's.
 4. Repeat.
 
 Any advice on how to do this?
 
 Thanks.
 --Alex
  


Re: [Boston.pm] Extract text from html preserving newlines

2007-05-02 Thread Tolkin, Steve
Thanks Jerrad,  

I actually tried lynx first.  However, the html files are on a server
that needs authentication.  Even adding 
-auth my-user-id:my-pw 
to lynx was not enough.

Here is the lynx output (I added the # as these are comments in the perl
program):

# Looking up [my proxy]
# Making HTTP connection to [my proxy]
# Sending HTTP request.
# HTTP request sent; waiting for response.
# Alert!: Invalid header 'WWW-Authenticate: NTLM'
# Alert!: Can't retry with authorization!  Contact the server's WebMaster.
# Can't Access [the url I wanted]
# Alert!: Unable to access document.
# 
# lynx: Can't access startfile


I am not sure what I really need to do.  I looked at the headers using
a Mozilla Firefox add-on and decided that generating the proper values for
WWW-Authenticate was too complex for lynx, and for Mechanize too.   But
maybe I am missing something.


Steve


-Original Message-
From: Jerrad Pierce [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, May 02, 2007 1:45 PM
To: Tolkin, Steve
Cc: Boston Perl Mongers
Subject: Re: [Boston.pm] Extract text from html preserving newlines

lynx -dump
-- 
Free map of local environmental resources:
http://CambridgeMA.GreenMap.org
--
MOTD on Boomtime, the 49th of Discord, in the YOLD 3173:
It is useless for sheep to pass resolutions in favor of vegetarianism
while wolves remain of a different opinion.

 


Re: [Boston.pm] Extract text from html preserving newlines

2007-05-02 Thread Tolkin, Steve
That worked.  Thanks! 

Running lynx on my local copies of the *.html files works reasonably
well, although the output is not what IE produces, and is harder for me
to parse.  

A minor follow up question.  Currently I have to run lynx from its own
directory.  Otherwise I got 

\lynx_w32\lynx.bat foo.htm 
LINES value must be >= 2: got 1
initscr(): LINES=1 COLS=1: too small.

Is there a way to set up lynx to let me run it from elsewhere?

Steve Tolkin
VP, Architecture   FESCo Architecture  Strategy Group   Fidelity
Employer Services Company
400 Puritan Way   M3B   Marlborough MA 01752   508-787-9006
[EMAIL PROTECTED]
The information in this email and subsequent attachments may contain
confidential information that is intended solely for the attention and
use of the named addressee(s). This message or any part thereof must not
be disclosed, copied, distributed or retained by any person without
authorization from Fidelity Investments.


-Original Message-
From: Chris Devers [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, May 02, 2007 1:53 PM
To: Tolkin, Steve
Cc: Boston Perl Mongers
Subject: Re: [Boston.pm] Extract text from html preserving newlines

On Wed, 2 May 2007, Tolkin, Steve wrote:

 Q1. Is there a way to automate IE or Mozilla Firefox to save 100's of
 files as text?

Probably, but might it be easier to automate using `lynx -dump` (or 
better still, `links -dump`) ?

If those produce output the way you want them, automating it should be a
snap to do, even with just a simple shell script.

$ for f in *.html; do links -dump "$f" > "${f}.txt"; done

Etc.


-- 
Chris Devers
DO NOT LEAVE IT IS NOT REAL

 


Re: [Boston.pm] Program wanted to recover text that has spaces inserted or deleted

2007-04-07 Thread Tolkin, Steve
As you suggest, the easiest way is to just ignore all the blanks, and
then try to find words, probably by a greedy approach, and then backing
off.  However, the original email explained that extra spaces are much
more likely than missing spaces.  This information could be used to get
better results.
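One way to use the fact that extra spaces are much more likely than missing ones is to throw the spaces away, as Chris suggests, but remember where they were: charge a small cost for dropping an original space and a large cost for inventing a word break that had no space. The minimal Perl sketch below does this with dynamic programming over character positions. The costs 1 and 5, the function name `resegment`, and the toy dictionary are all assumptions for illustration; the sketch also ignores punctuation and proper nouns.

```perl
use strict;
use warnings;

# Hypothetical costs: dropping a stray space is cheap, inventing a missing
# word break is expensive, because extra spaces are much more common.
my %COST = (drop_space => 1, add_space => 5);

# Re-split $text into words from %$dict (lowercase keys), minimizing cost.
# Returns the repaired text, or undef if no segmentation exists.
sub resegment {
    my ($text, $dict) = @_;
    # Remove spaces, remembering which character positions followed one.
    my $chars = '';
    my %space_before;
    for my $c (split //, $text) {
        if ($c eq ' ') {
            $space_before{ length $chars } = 1 if length $chars;
            next;
        }
        $chars .= $c;
    }
    my $n = length $chars;
    my @cost = (0);    # $cost[$j]: cheapest way to segment the first $j chars
    my @from;          # $from[$j]: where the last word of that segmentation starts
    for my $j (1 .. $n) {
        for my $i (0 .. $j - 1) {
            next unless defined $cost[$i];
            next unless $dict->{ lc substr($chars, $i, $j - $i) };
            my $c = $cost[$i];
            # Original spaces inside this word are dropped.
            $c += $COST{drop_space} for grep { $space_before{$_} } $i + 1 .. $j - 1;
            # A word break where there was no space is an added space.
            $c += $COST{add_space} if $i > 0 && !$space_before{$i};
            if (!defined $cost[$j] || $c < $cost[$j]) {
                $cost[$j] = $c;
                $from[$j] = $i;
            }
        }
    }
    return undef unless defined $cost[$n];
    my @words;
    for (my $j = $n; $j > 0; $j = $from[$j]) {
        unshift @words, substr($chars, $from[$j], $j - $from[$j]);
    }
    return join ' ', @words;
}

my %dict = map { $_ => 1 }
    qw(artifact art i fact refers to an application cat on);
print resegment('Arti factrefers t o an appl i cat i on', \%dict), "\n";
```

With this tiny dictionary it returns "Artifact refers to an application": "Art i fact refers" is also a valid split, but it needs an extra invented break, which the cost weights make more expensive than dropping the stray space in "Arti fact".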

Thanks to Richard Barbalace for sending his program.
I can run it, and now I need to look at how to revise it.

Thanks,
Steve 

-Original Message-
From: Chris Devers [mailto:[EMAIL PROTECTED] 
Sent: Thursday, April 05, 2007 10:44 PM
To: Tolkin, Steve
Cc: boston perl mongers
Subject: Re: [Boston.pm] Program wanted to recover text that has spaces
inserted or deleted

On Apr 5, 2007, at 6:42 PM, Tolkin, Steve wrote:

 Also, this is somewhat more complicated because sometimes
 spaces can be removed, although occasionally with much lower  
 frequency.
 For example Arti factrefers ought to be Artifact refers

How is the program supposed to select from variants such as

   Artifact refers
   Art I fact refers

   documents and
   document sand

?

It almost seems like you can't trust the spaces at all, so you might  
as well just throw them all out and then look for valid word chains  
in the remaining text.

If nothing else, that would also solve the ancillary problem of a  
space before punctuation marks...



-- 
Chris Devers

 


[Boston.pm] Program wanted to recover text that has spaces inserted or deleted

2007-04-05 Thread Tolkin, Steve
I am looking for a program that can recover the original text from text
that has spaces inserted or deleted.
Ideally in perl of course.

The following text has many places where an extra space is inserted.
Given a dictionary it would be possible to reconstruct the original
text, with only a few errors remaining.   
I probably could write a program like that, but I suspect this has been
done before.  Also, this is somewhat more complicated because sometimes
spaces can be removed, although occasionally with much lower frequency.
For example Arti factrefers ought to be Artifact refers.

Arti factrefers t o an appl i cat i on-l evel uni t of i nformat i on t
hat i s subj ect t o anal ysi s by some appl i cat i on. Exampl es i ncl
ude a t ext document , a segment of speech or vi deo, a col l ect i on
of document s and a st ream of any of t he above. 

Other notes:
One source of errors might be proper nouns, but a sophisticated program
could improve its handling of these, if it kept in memory the fragments
seen.
Nice to have the space before a comma etc. removed.


Thanks,
Steve
 


Re: [Boston.pm] emma's pizza

2007-01-10 Thread Tolkin, Steve
Irish coffee contains all four required food groups:
Sugar, fat, caffeine, and alcohol 


P.S. Obligatory comment about Perl -- the Chilean pianist Alfredo Perl has 
recorded all of Beethoven's sonatas, and much else, and I recommend them.


Hopefully helpfully yours, 
Steve 
-- 
Steve Tolkin    Steve . Tolkin at FMR dot COM   508-787-9006
Fidelity Investments   82 Devonshire St. M3L Boston MA 02109 
There is nothing so practical as a good theory.  Comments are by me, 
not Fidelity Investments, its subsidiaries or affiliates. 


 


Re: [Boston.pm] Update to job posting policy?

2006-12-05 Thread Tolkin, Steve
I think the current policy is fine as is.  

Location is just one of many factors to be discussed before applying
for, or accepting, a job.  If this is a pressing concern the applicant
should ask about it in an early phone call.  Other people will care more
about salary, benefits, the nature of the work, etc.

Steve

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
Ronald J Kimball
Sent: Tuesday, December 05, 2006 11:20 AM
To: Boston Perl Mongers
Subject: [Boston.pm] Update to job posting policy?

I received an off-list comment from a Perl monger, in response to the
recent job posting for a Senior Perl Developer in Waltham, MA.  The monger
had spent some time talking with the recruiter, only to learn that the
location was too far from a commuter rail station to be worthwhile.  The
monger suggested that job postings from recruiters not be allowed unless
the employer's address is clearly stated.

Personally, I am not inclined to make this change to our policy.  I think
that not identifying the employer is a reasonable position for recruiters
to take, to protect their business, even though it can be frustrating for
potential applicants.  I am worried that recruiters might choose not to
send job postings to our list at all.

I thought of an alternate suggestion, which is that job postings without a
street address must indicate accessibility to mass transit.

Our job posting policy is a result of a consensus reached on the list a few
years ago, so I decided to open this up to the whole list for comments.
Would you all like to see either of these suggested changes made?

Other feedback on the policy is also welcome.


Here is our current job posting policy:

---

Job postings may not be posted directly to the list.  Instead, job postings
should be sent to [EMAIL PROTECTED]  I will review each posting,
and either post it to the list or return it to the sender for editing.

When I send a job posting to the list, the Subject header will include the
string [job].


Guidelines for job postings:

1. Perl must be a primary aspect of the job.

2. The job must be located in the greater Boston area.

3. The following information should be included in the job posting:
   Required skill-set
   Contract or permanent position?
      Pay range, for contract positions
      Incentives, for permanent positions
   Placement through a recruiter, or directly with the company?
   Location, and whether telecommuting is available
   Company's product or service

---

Ronald
 


Re: [Boston.pm] teaching kids Perl

2006-12-01 Thread Tolkin, Steve
Perl has at least one advantage over other languages -- it is easy to see 
variables, because they start with a dollar sign (or other sigil).  In my brief 
experience teaching programming to children this has proven to be helpful, 
because getting the difference between a variable and a string is important.

Hopefully helpfully yours, 
Steve 
-- 
Steve Tolkin    Steve . Tolkin at FMR dot COM   508-787-9006
Fidelity Investments   82 Devonshire St. M3L Boston MA 02109 
There is nothing so practical as a good theory.  Comments are by me, 
not Fidelity Investments, its subsidiaries or affiliates. 
 

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Kate Wood
Sent: Friday, December 01, 2006 10:30 AM
To: Boston Perl Mongers
Subject: [Boston.pm] teaching kids Perl

Hi all,
So... say you were going to teach a child (or several children) of
about ten, reasonable technical aptitude, to program using Perl. How
would you go about it? I'm doing some lessons for my daughter and her
friends for the spring, and need some further input. They're not quite
of an age where handing them the camel book and saying go for it is
realistic, but they're pretty self-motivated.

Kate
 


Re: [Boston.pm] Python

2006-10-27 Thread Tolkin, Steve
Dear Ben, Bob et al.,
Thanks for this thread.  (It has a very high signal to noise
ratio, compared with many others.)

Dear Everyone,
Since this started about Python, in a Perl discussion list, I am
wondering whether Perl facilitates the kind of experimentation that
led to stackless Python. http://www.stackless.com/ An experimental
implementation that supports continuations, generators, microthreads,
and coroutines.
See also
http://www.onlamp.com/pub/a/python/2000/10/04/stackless-intro.html
 
Perhaps not, because this will be built into Perl 6.

Perhaps not, because the Python community is different than the Perl
community in some fundamental way, e.g., there is only one version of
Perl.  

Perhaps not, because Continuations are a Bad Thing.

I believe some disciplined way of doing concurrency is clearly needed,
and I do not think any of our current abstractions are good enough.
(They may work in theory, but not in practice, e.g. they are too hard to
reason about, or to debug.)

I can think of no better path than for Perl to get this right, and run
well on the multi-core CPU systems of the future.
 

Steve

[rest of thread snipped]
 


Re: [Boston.pm] Loop index in foreach?

2006-09-22 Thread Tolkin, Steve
Are you serious?  $.., $..., $ etc?!  Aii!!! he screams and runs away.  
Please stop this thread.

Hopefully helpfully yours, 
Steve 
-- 
Steve Tolkin    Steve . Tolkin at FMR dot COM   508-787-9006
Fidelity Investments   82 Devonshire St. M3L Boston MA 02109 
There is nothing so practical as a good theory.  Comments are by me, 
not Fidelity Investments, its subsidiaries or affiliates. 



 

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Duane Bronson
Sent: Thursday, September 21, 2006 7:03 PM
To: Ronald J Kimball
Cc: boston-pm@mail.pm.org; Palit, Nilanjan
Subject: Re: [Boston.pm] Loop index in foreach?

$.. should be the iterator count in the parent loop, $... should be the 
iterator count in the grandparent loop, ...

my @fruits = ('apple','banana','cantaloupe');
foreach my $fruit (@fruits) {
  foreach my $minusone (0..1000) {
    foreach my $plusone (2..1000) {
      die "inner loop count wrong" unless $plusone == $.+1;
      die "outer loop count wrong" unless $minusone == $..-1;
      die "way outer loop index wrong" unless $fruit eq $fruits[$...];
    }
  }
}

Ronald J Kimball wrote:
 On Thu, Sep 21, 2006 at 09:34:43AM -0700, Palit, Nilanjan wrote:

   
 I think it'd be fairly easy for Perl to auto initialize & increment a
 loop index in all loops & provide that to the user in a special
 variable. $. is an excellent example. I think it'd be a great addition
 to Perl's excellent (& long) list of special vars, making for yet more
 elegant & concise code.
 

 What would you have Perl do in the case of nested loops?

 Ronald
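For the record, Perl 5 has no `$..`-style automatic loop counters. The usual answer to the nested-loop question is to loop over indices, so every level gets its own explicitly named counter; since Perl 5.12, `each` on an array also returns (index, value) pairs. A minimal sketch:

```perl
use strict;
use warnings;

my @fruits = ('apple', 'banana', 'cantaloupe');

# Loop over indices so each nesting level has its own, explicitly named counter.
my @seen;
for my $i (0 .. $#fruits) {
    for my $j (0 .. 1) {
        push @seen, join(',', $i, $j, $fruits[$i]);
    }
}
print "@seen\n";

# Perl 5.12+: each() on an array yields (index, value) pairs.
my @pairs;
while (my ($idx, $fruit) = each @fruits) {
    push @pairs, "$idx=$fruit";
}
print "@pairs\n";
```

Naming the counters makes the nested case unambiguous, which is exactly where a single special variable would have trouble.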
  

   

-- 
Sincerely   *Duane Bronson*
[EMAIL PROTECTED] mailto:[EMAIL PROTECTED]
http://www.nerdlogic.com/
453 Washington St. #4A, Boston, MA 02111
617.515.2909

 


Re: [Boston.pm] Short time in Boston

2006-09-15 Thread Tolkin, Steve
Two in Cambridge are well worth seeing (especially for people who live here! :)

MIT Museum -- great permanent collection on robots, MIT hacks, holograms,
mechanical sculptures by Arthur Ganson, and usually also a changing exhibit.
http://web.mit.edu/museum/

Harvard Museum of Natural History -- the world-famous (and deservedly so) glass flowers.


Hopefully helpfully yours, 
Steve 
-- 
Steve Tolkin    Steve . Tolkin at FMR dot COM   508-787-9006
Fidelity Investments   82 Devonshire St. M3L Boston MA 02109 
There is nothing so practical as a good theory.  Comments are by me, 
not Fidelity Investments, its subsidiaries or affiliates. 



-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of David H. Adler
Sent: Friday, September 15, 2006 1:01 AM
To: boston-pm@mail.pm.org
Subject: Re: [Boston.pm] Short time in Boston

On Thu, Sep 14, 2006 at 06:42:48PM -0400, Uri Guttman wrote:
  JA == John Abreau [EMAIL PROTECTED] writes:
 
   JA David H. Adler wrote:
So. Mom and I are taking a cruise next month up the east coast and into
Canada. We've got a day (22 Oct, if I've got this all right) in Boston.
What should we do in the... 10 hours we're there?
 
 i assume that is a day stop here? what hours?

Yep. I believe we dock at 8am and set sail (motor?) at 6pm.

[snip suggestions]
 
 another possible idea is an emergency pm social lunch.

This, of course, is a definite possibility.

dha

-- 
David H. Adler - [EMAIL PROTECTED] - http://www.panix.com/~dha/
It's about hoodwinking the viewer in the cheapest and easiest manner
possible- Markku Pätilä

 


[Boston.pm] Is there any security issue with *.pmc files?

2006-07-14 Thread Tolkin, Steve
I read Audrey Tang's blog and some things it linked to.  Great stuff.  I 
learned that *.pmc files have precedence over *.pm files.  Does this introduce 
a security issue, i.e. anything new beyond the existing risks?  I wonder if an 
evil *.pmc file might not even be noticed when searching for a problem, due to 
its unusual extension.  

Specifically, can the *.pmc file be in a different directory than the *.pm file 
that was intended to be used?
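As far as I understand it (worth checking against `perldoc -f require` for the Perl version in use), `require Foo` looks for `Foo.pmc` before `Foo.pm` within each `@INC` directory in turn. So a `.pmc` wins over a `.pm` in the same directory, while a file in an earlier `@INC` directory wins either way. Whatever the exact rules, a cheap audit is to list every `.pmc` file reachable from `@INC`. The sketch below does that with the core File::Find module; the function name `find_pmc_files` is hypothetical.

```perl
use strict;
use warnings;
use File::Find;

# Return every *.pmc file under the given directories (default: @INC),
# sorted, so an unexpected one stands out.  Code refs in @INC are skipped.
sub find_pmc_files {
    my @dirs = grep { !ref($_) && -d $_ } (@_ ? @_ : @INC);
    return () unless @dirs;
    my @found;
    find(
        {
            no_chdir => 1,    # keep $_ as the full path
            wanted   => sub { push @found, $File::Find::name if /\.pmc\z/ && -f $_ },
        },
        @dirs,
    );
    return sort @found;
}

print "$_\n" for find_pmc_files();
```

Run with no arguments it scans the current `@INC`; pass directories explicitly to check a specific library tree.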


Hopefully helpfully yours, 
Steve 
-- 
Steve Tolkin    Steve . Tolkin at FMR dot COM   508-787-9006
Fidelity Investments   82 Devonshire St. M3L Boston MA 02109 
There is nothing so practical as a good theory.  Comments are by me, 
not Fidelity Investments, its subsidiaries or affiliates. 


 


[Boston.pm] Perl and utf16 e.g. for Windows Registry file

2006-07-14 Thread Tolkin, Steve
Summary:  How to use Perl 5.8.0 to handle files encoded using utf-16 on
Windows?

Details:
I have read that perl 5.8 ought to handle utf-16 without me needing to
tell it anything.
But I am not getting the behavior I expect.
Specifically, I want to find what changed in a Registry after I install
a program.
So I export the whole Windows Registry to a *.txt file.  This file is
written using UTF-16 (technically UTF-16LE, because Intel is little-endian).
Then I install the program, and export the Registry again to a second
file.
These files are very large, over 100 MB.  So the port of diff.exe to
Windows quickly dies, saying 
diff: memory exhausted

I then tried diff.pl (which uses diff.pm) and watched the memory usage
slowly grow to over 100 MB; I never got any output.  So I decided to
reduce the number of lines in the file by removing all the binary data
(which in the text file is plain text, matching this pattern: ^\d{8}).


However the following command line perl program fails, in that it emits
every input line to the output.  I suspect this problem is caused by the
fact that the file is UTF16.

perl -ne "print if ! m/^\d{8}/" reg1.txt > reg1_reduced.txt

Note: \d is equivalent to [0-9]  -- using that failed also.

I then tried to include the NUL bytes and used this:
perl -ne "print if ! m/^[0-9\000]{8}/" reg1.txt > reg1_reduced.txt
But that somehow caused the newlines to disappear.

So I am asking for help.
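A minimal Perl sketch of the usual approach, assuming the export is UTF-16LE with a byte-order mark and CRLF line endings (as regedit writes it): decode the file explicitly with an `:encoding(UTF-16LE)` layer, put `:crlf` on top so line endings are translated after decoding, and write the reduced copy as UTF-8, which is also friendlier to diff tools. The function name `filter_registry_export` is hypothetical, and the eight-digit pattern is taken from the message above. This layer stack is the commonly recommended idiom; very old 5.8.x releases may behave differently.

```perl
use strict;
use warnings;

# Copy a UTF-16LE registry export to UTF-8, dropping lines that start with
# eight digits (the binary-data lines described above).
sub filter_registry_export {
    my ($in_path, $out_path) = @_;
    # :raw first, then decode UTF-16LE, then translate CRLF on the decoded text.
    open my $in,  '<:raw:encoding(UTF-16LE):crlf', $in_path  or die "open $in_path: $!";
    open my $out, '>:encoding(UTF-8)',              $out_path or die "open $out_path: $!";
    while (my $line = <$in>) {
        $line =~ s/^\x{FEFF}// if $. == 1;    # strip the byte-order mark
        print {$out} $line unless $line =~ /^[0-9]{8}/;
    }
    close $out or die "close $out_path: $!";
    close $in;
    return;
}

# Usage: perl filter_reg.pl reg1.txt reg1_reduced.txt
filter_registry_export(@ARGV) if @ARGV == 2;
```

Because matching happens on decoded characters rather than raw bytes, the interleaved NUL bytes never reach the regex, so neither `\d` nor `[0-9\000]` tricks are needed.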

Thanks,
Steve
 


Re: [Boston.pm] Put similarities in code and differences in data

2006-04-04 Thread Tolkin, Steve
I understand Uri's point, and can almost understand the silliness, but I
think there really is more often a benefit to putting similarities in
code and differences in data rather than vice versa.

The following quote makes a similar point, but it is not exactly the
same point.
Eric S. Raymond, The Art of Unix Programming p 47 online at
http://www.faqs.org/docs/artu/ch01s06.html and many other places

Rule of Representation: Fold knowledge into data, so program logic can
be stupid and robust.  Even the simplest procedural logic is hard for
humans to verify, but quite complex data structures are fairly easy to
model and reason about. ...  Data is more tractable than program logic.
It follows that where you see a choice between complexity in data
structures and complexity in code, choose the former. More: in evolving
a design, you should actively seek ways to shift complexity from code to
data.


Another related idea is this: To reuse code you have to change the
data.  This is my paraphrase of a quote in
http://groups.google.com/group/comp.object/browse_frm/thread/2ebcb9c6cf86bf9f/318ede5cf4a01220?tvc=1q=%22in+data%22+%22in+code%22+invariant+OR+invariants+OR+mellorhl=en#318ede5cf4a01220

The difference is that I am trying to find a quote that focuses on the
benefits of using data in a special way, as control data, to determine
the specific execution path taken by the code.
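To make "control data" concrete, here is a small hypothetical Perl sketch (the table `%format_spec` and the sub `format_row` are invented for illustration, not taken from any quoted source): the code path is written once, and the differences between output formats live in a lookup table, so adding a format means adding a row of data rather than another `if` branch.

```perl
use strict;
use warnings;

# The differences live in data: one row per output format.
my %format_spec = (
    csv => { sep => ',',  quote => 1 },
    tsv => { sep => "\t", quote => 0 },
    psv => { sep => '|',  quote => 0 },
);

# The similarities live in code: one routine serves every format.
sub format_row {
    my ($kind, @fields) = @_;
    my $spec = $format_spec{$kind} or die "Unknown format '$kind'\n";
    if ($spec->{quote}) {
        # Double any embedded quotes, then wrap each field in quotes.
        @fields = map { (my $f = $_) =~ s/"/""/g; qq{"$f"} } @fields;
    }
    return join $spec->{sep}, @fields;
}
```

For example, `format_row('psv', 'abc', 'txt', 'File')` returns `abc|txt|File`; supporting a new delimiter touches only the table.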


Thanks,
Steve

-Original Message-

Tolkin, Steve wrote:
 I am looking for the best and/or original wording of this programming
 maxim: Put similarities in code and differences in data

 Google found this in a perl discussion
 capture similarities in code, differences in data
 http://blog.gmane.org/gmane.comp.lang.perl.fun/month=20031001
 So I am posting to this list.

 Here is a hit on a similar quote putting invariants in code and
 differences in data.

 http://groups.google.com/group/comp.object/browse_thread/thread/1dc6f6dddb34dc18/cdfb5eae936861f2?lnk=stq=%22differences+in+data%22+%22in+code%22rnum=3hl=en#cdfb5eae936861f2
 This mentions Mellor in passing -- Is he the original person behind
 this?

 Hopefully helpfully yours,
 Steve
   
 


Re: [Boston.pm] Put similarities in code and differences in data

2006-04-04 Thread Tolkin, Steve
Thank you Charlie.  That is the idea I am trying to get across.  Do you
have any suggestions about how to get developers to see the benefits of
writing programs this way?  Any specific books, techniques, etc.?  Any
pitfalls to be aware of?

Thanks,
Steve
-- 
Steve TolkinSteve . Tolkin at FMR dot COM508-787-9006
Fidelity Investments   82 Devonshire St. M3L Boston MA 02109
There is nothing so practical as a good theory.  Comments are by me, 
not Fidelity Investments, its subsidiaries or affiliates.


Steve

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
Charlie Reitzel
Sent: Tuesday, April 04, 2006 9:18 AM
To: boston-pm@mail.pm.org
Subject: Re: [Boston.pm] Put similarities in code and differences in
data


Not really.  I believe it is intended to mean data driven programming as
Jeremy mentioned earlier.  To me, data driven programming means use lotsa
lookup tables, the contents of which are user tweakable.  As simple as it
sounds, it can be an effective technique to let you quickly adapt a system
as requirements evolve - without code changes.

Having found this hammer early in my programming career, I find a great
many nails.  Early days in any new design are spent setting up a lookup
table table, along with utility routines for reporting, validation, UI
picking values (one or several), etc.

It may be a use case, but I don't think this is quite the same thing as the
subject of this thread which, as Uri says, is a general approach to analysis.

At 09:00 AM 4/4/2006 -0400, [EMAIL PROTECTED] wrote:
hi

( 06.04.04 08:46 -0400 ) Tolkin, Steve:
  The difference is that I am trying to find a quote that focuses on the
  benefits of using data in a special way, as control data, to determine
  the specific execution path taken by the code.

um, isn't this the scientific method?

--
\js oblique strategy: how would you have done it?



[Boston.pm] Changing compiler from VC98 to Visual C++ Toolkit 2003

2006-03-04 Thread Tolkin, Steve
I used to use the Visual C compiler from 1998, aka VC98.  I have
compiled some Perl XS modules with it.  When I got a new PC it did not
have that old compiler on it.  I copied my Perl directories over, and
they seem to work.  I just downloaded the free (as in beer) Visual C++
Toolkit 2003 from Microsoft.

Q1. Can I use it to compile new XS modules without problems?

Q2.  Should I recompile all the existing XS modules?

perl -v 

This is perl, v5.8.7 built for MSWin32-x86-multi-thread
(with 7 registered patches, see perl -V for more detail)

Copyright 1987-2005, Larry Wall

Binary build 813 [148120] provided by ActiveState
http://www.ActiveState.com
ActiveState is a division of Sophos.
Built Jun  6 2005 13:36:37  

Thanks,
Steve

 


Re: [Boston.pm] script to normalize output of Windows dir command

2005-09-26 Thread Tolkin, Steve
Ben Tilly asked:
Are you reinventing the rsync wheel?

No.  I actually use the freeware version of the program syncback, from
http://www.2brightsparks.com/downloads.html, to do backup, and I think it
uses rsync (or similar) internally.  But I do not just want to do a full
restore.  I want to see what will be happening first.   

I think I can run syncback in a quiet mode that shows what would happen,
but not actually do it.

I still want to be able to see the differences between (portions of) two
file systems, based on various criteria, including date, size,
directory, etc.


Steve

-Original Message-
From: Ben Tilly [mailto:[EMAIL PROTECTED] 
Sent: Friday, September 23, 2005 5:52 PM
To: Tolkin, Steve
Cc: Jeremy Muhlich; boston-pm@mail.pm.org
Subject: Re: [Boston.pm] script to normalize output of Windows dir
command


On 9/23/05, Tolkin, Steve [EMAIL PROTECTED] wrote:
 I do have a port of Unix find on my current Windows machine.
 But I do not have that on the machine I back up to (my wife's), so I
 would need to install that, and its dependencies, which makes me
 reluctant to take that approach.

Are you reinventing the rsync wheel?

(Yeah, I know.  Getting the flags right can be a pain.)

Cheers,
Ben
 


Re: [Boston.pm] Quotes and such [was] RE: script to normalize output of Windows dir command

2005-09-26 Thread Tolkin, Steve
Actually, the original poster (me) was trying to solve a different problem.
I clearly specified that what was wanted is a perl program to convert the 
output of the Windows dir command into a structured text format suitable for 
use with sort and/or loading into a database. 

This would let me see what will be impacted by a partial restore.
It also has the benefit of not needing anything installed on my wife's machine 
(which is the target of the backup) -- not rsync, not find, not even perl.

Having a canonical format for file information also allows comparison with 
the list produced by many other programs, e.g. ls, find, Sequoia View, Wilbur, 
any other backup program, etc.  My suggested format was:

Path|file|extension|Dir_or_File|bytes|date|time

e.g.

C:\_from_laptop\AAA BBB_files|empty.jpg|txt|Dir|0|2003-04-14|23:00

So the natural sort works as desired, and it is also easy to do a
timestamp-based sort.
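A minimal sketch of that timestamp sort, assuming the seven-field layout above: the date and time fields are ISO-style and zero-padded, so plain string comparison orders them chronologically. The sub name `by_timestamp` is hypothetical.

```perl
use strict;
use warnings;

# Sort Path|file|extension|Dir_or_File|bytes|date|time lines by date and
# time, using a Schwartzian transform so each line is split only once.
# Ties are broken by the whole line so the output is deterministic.
sub by_timestamp {
    return map  { $_->[1] }
           sort { $a->[0] cmp $b->[0] or $a->[1] cmp $b->[1] }
           map  { my @f = split /\|/, $_, -1; ["$f[5] $f[6]", $_] } @_;
}
```

A plain `sort @lines` still gives the natural directory-and-name order; `by_timestamp(@lines)` gives oldest-first.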

I continue to think about that original problem.  I realize that I should 
probably force bytes to 0 if type is Dir.
The program probably should have an option to change between slash and 
backslash, and possibly suppress the drive letter.

I might actually write this program one day; if I do I'll post it here for 
feedback.


P.S. I tried to find a version of rsync for Windows that does not require 
cygwin.  Is there one?


Hopefully helpfully yours,
Steve
-- 
Steve TolkinSteve . Tolkin at FMR dot COM   617-563-0516 
Fidelity Investments   82 Devonshire St. V13CBoston MA 02109
There is nothing so practical as a good theory.  Comments are by me, 
not Fidelity Investments, its subsidiaries or affiliates.

-Original Message-
From: Chris Devers [mailto:[EMAIL PROTECTED] 
Sent: Monday, September 26, 2005 2:16 PM
To: John Macdonald
Cc: boston-pm@mail.pm.org; Ricker, William; Tolkin, Steve
Subject: Re: [Boston.pm] Quotes and such [was] RE: script to normalize output 
of Windows dir command


On Mon, 26 Sep 2005, John Macdonald wrote:

 On Mon, Sep 26, 2005 at 12:48:08PM -0400, Ricker, William wrote:
  Chris Devers was however obviously looking for this rather specific 
  elaboration of Santayana's, as it captures the inevitableness.
  
  [ Any sufficiently complicated c or fortran program contains an ad hoc
  informally-
  [ specified, bug-ridden, slow implementation of half of Common Lisp.
  [  -Greenspun's 10th law of programming 
  [ http://philip.greenspun.com/bboard/q-and-a-fetch-msg?msg_id=000tgU
  
  Note - there are no laws (1..9).
 
 Actually, I think he was looking for Henry Spencer's old quote:
 Those who do not understand Unix are doomed to reinvent it
 - badly.
 
Either of those, actually :-)

As I say, I'm sure there's some witty nugget of a reformulation of those 
lines based around this thread and rsync -- the Unix variant is nice and 
succinct, while the Lisp one gets more specific -- but I can't be 
bothered to tease it out. 

In any case, the point stands: the original poster was looking for a way 
to solve a problem in Perl that rsync already has tackled. Perl is a 
nice tool and suitable for many purposes, but there are limits beyond 
which even the roundest of reinvented wheels can get no rounder, and 
rsync is clearly the roundest wheel for this job :-)



-- 
Chris Devers


[Boston.pm] script to normalize output of Windows dir command

2005-09-23 Thread Tolkin, Steve
Summary:
I would like a perl script that converts the output of the Windows dir
command so that each line has the same format, including the directory
it is in, and its extension.  The date and time should use a format that
can be sorted as a string, e.g. yyyy-mm-dd and a 24 hour clock.
I think pipe delimited would work best, as the pipe character | cannot
appear in a file name, and that would let me sort the output, and/or
load it into a database.

Details:
I could probably write this in an hour but laziness is a virtue, and if
someone has got one already that will probably be better anyway.
I want to translate lines like this:

 Directory of C:\_from_laptop\AAA BBB_files

04/14/2003  10:21 AM   123 abc
04/14/2003  11:00 PM 0 empty.jpg.txt

To lines something like this.  Note that I moved the file name and
extension earlier, so that the natural sort is by directory and file
name, and a sort on the last two fields is by time.  (I have a port of
Unix sort in my c:\bin\ directory that I can use.)

C:\_from_laptop\AAA BBB_files|abc||File|123|2003-04-14|10:21
C:\_from_laptop\AAA BBB_files|empty.jpg|txt|Dir|0|2003-04-14|23:00


None of it is tricky.  You just need to remember what Directory line you
saw last, convert the date and time fields, insert either File or Dir
depending on its type, and write out each line that comes from a file or
dir (except skip all the . and .. dirs).  Note that a file named
foo.bar.txt has a name of foo.bar and extension of txt.  Some files can
have no extension, and some directories do have an extension.
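The steps above can be sketched in Perl roughly as follows. This is a minimal sketch under stated assumptions: it expects the US `mm/dd/yyyy` date and 12-hour `AM`/`PM` time layout produced by the `dir /-C /ON /S /TW /4` command from the P.S., treats any `<DIR>` entry as a directory (with bytes forced to 0), and silently skips other entry types such as `<JUNCTION>`. The sub name `normalize_dir_listing` is hypothetical.

```perl
#!/usr/bin/perl
# Sketch: turn "dir /-C /ON /S /TW /4" output into
# Path|name|extension|File_or_Dir|bytes|yyyy-mm-dd|hh:mm lines.
use strict;
use warnings;

sub normalize_dir_listing {
    my ($lines) = @_;    # reference to an array of raw lines
    my ($dir, @out);
    for my $raw (@$lines) {
        (my $line = $raw) =~ s/\r?\n\z//;
        # Remember the most recent " Directory of ..." header.
        if ($line =~ /^\s*Directory of (.+?)\s*$/) {
            $dir = $1;
            next;
        }
        next unless defined $dir;
        # mm/dd/yyyy  hh:mm AM|PM  <DIR>-or-size  name
        my ($mon, $day, $year, $hh, $min, $ampm, $size, $name) = $line =~
            m{^(\d\d)/(\d\d)/(\d{4})\s+(\d\d):(\d\d)\s*([AP]M)\s+(<DIR>|\d+)\s+(.+)$}
            or next;    # skip blank, summary and boilerplate lines
        next if $name eq '.' || $name eq '..';
        # 12-hour clock to 24-hour clock.
        $hh = 0 if $hh == 12;
        $hh += 12 if $ampm eq 'PM';
        my $is_dir = $size eq '<DIR>';
        # foo.bar.txt -> (foo.bar, txt); abc -> (abc, '').
        my ($base, $ext) = $name =~ /^(.+)\.([^.]*)$/ ? ($1, $2) : ($name, '');
        push @out, join '|', $dir, $base, $ext,
            ($is_dir ? 'Dir' : 'File'), ($is_dir ? 0 : $size),
            sprintf('%04d-%02d-%02d', $year, $mon, $day),
            sprintf('%02d:%02d', $hh, $min);
    }
    return @out;
}

if (!caller() && @ARGV) {
    print "$_\n" for normalize_dir_listing([<>]);
}
```

Run as, for example, `perl normalize_dir.pl dir.txt > dir_norm.txt` on each machine's listing and then sort and diff the results.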

Here is an excerpt of the output.  (Because it is an excerpt the totals
for Files and Bytes are not right.)
Note that there are a few lines of boilerplate at the beginning which
can be ignored, and a few lines at the end which can be ignored (or used
as a sanity check on the totals.)  Note that a file might not have an
extension, that a file or directory can be empty, can have white space
and strange characters in its name.

 Volume in drive C has no label.
 Volume Serial Number is A898-B50D

 Directory of C:\_from_laptop

01/23/2005  08:37 AM    <DIR>          .
01/23/2005  08:37 AM    <DIR>          ..
04/14/2003  01:46 PM    <DIR>          _from_c
02/06/2001  01:34 PM             15618 0101.txt
02/06/2001  01:34 PM             15618 abc
04/14/2003  10:22 AM             32451 AAA BBB.htm
01/17/2005  09:53 AM    <DIR>          AAA BBB_files
04/04/2000  06:14 PM             27648 acm_pubform.doc
01/17/2005  09:53 AM    <DIR>          acrobat
01/17/2005  09:54 AM    <DIR>          address
08/17/2004  10:04 AM                 0 zzz
 650 File(s)   92010877 bytes

 Directory of C:\_from_laptop\AAA BBB_files

01/17/2005  09:53 AM    <DIR>          .
01/17/2005  09:53 AM    <DIR>          ..
04/14/2003  10:21 AM              1045 abc
04/14/2003  10:21 AM                 0 empty.jpg.txt
04/14/2003  10:22 AM             32451 AAA BBB CCC.htm
01/17/2005  09:53 AM    <DIR>          AAA BBB_CCC_files
04/14/2003  10:21 AM                43 spacer.gif
  11 File(s)  37476 bytes

 Directory of C:\_from_laptop\AAA BBB CCC_files

01/17/2005  09:53 AM    <DIR>          .
01/17/2005  09:53 AM    <DIR>          ..
   0 File(s)  0 bytes

 Total Files Listed:
   245909 File(s)28969650933 bytes
   154376 Dir(s) 31272304640 bytes free


Background:
My laptop died a few days ago.  The process to recover files and
directories from it seems to have lots of missing files.  I have a
directory on another machine that I have been backing up to.  I want to
find out which files are missing.  I have run dir on the backed-up
machine, and will run dir on the new machine, and then diff the outputs.
The diff will work best if each line in the file has the same format
and includes the full directory path.

P.S. Here is the command I ran in a DOS box (aka command prompt window
etc.) from my Windows XP machine.

dir > dir.txt c:\_from_laptop /-C /ON /S /TW /4

The /-C means suppress the thousand separator in the size, /ON means
order by name, /S means recurse into subdirectories, /TW means show the
last time it was written, and /4 means show 4 digit years.



Thanks,
Steve
 


Re: [Boston.pm] script to normalize output of Windows dir command

2005-09-23 Thread Tolkin, Steve
I do have a port of Unix find on my current Windows machine.
But I do not have that on the machine I back up to (my wife's), so I
would need to install that, and its dependencies, which makes me
reluctant to take that approach.

I, like many people, have had problems with find, but I thought I would
try your suggestion.  There are quirks with the time reporting, and
probably other issues I have forgotten.  I do not know exactly how to
set the argument to -printf and it is not explained in the help (shown
below).  If you send an example I would try that.

Here are a few lines from the output of
\bin\find -print -ls

 945730 drwxr-xr-x   6 a071046  Administ0 Sep 21 15:05 ./ant
./ant/bin
 951240 drwxr-xr-x   2 a071046  Administ0 Sep 21 15:05
./ant/bin
./ant/bin/ant
 951283 -rwxr-xr-x   1 a071046  Administ 5140 Apr 16  2003
./ant/bin/ant
./ant/bin/ant.bat

Note each file is on two lines.  Probably that is the default for -ls.
Also date and time are combined into three fields, but the third is
either time or year.  This makes it harder to process.  I would actually
prefer time in seconds since the start of the Unix epoch.
Also there is no easy way to distinguish Files from Directories except
by further parsing of the permissions string, e.g. drwxr-xr-x.

Here is the help.  I cannot figure out how to suppress certain useless
fields e.g. inode and owner, nor put output on one line, etc.  

C:\foo\bin\find -help
Usage: /bin/find [path...] [expression]
default path is the current directory; default expression is -print
expression may consist of:
operators (decreasing precedence; -and is implicit where no others are
given):
  ( EXPR ) ! EXPR -not EXPR EXPR1 -a EXPR2 EXPR1 -and EXPR2
  EXPR1 -o EXPR2 EXPR1 -or EXPR2 EXPR1 , EXPR2
options (always true): -daystart -depth -follow --help
  -maxdepth LEVELS -mindepth LEVELS -mount -noleaf --version -xdev
tests (N can be +N or -N or N): -amin N -anewer FILE -atime N -cmin N
  -cnewer FILE -ctime N -empty -false -fstype TYPE -gid N -group
NAME
  -ilname PATTERN -iname PATTERN -inum N -ipath PATTERN -iregex
PATTERN
  -links N -lname PATTERN -mmin N -mtime N -name PATTERN -newer FILE
  -nouser -nogroup -path PATTERN -perm [+-]MODE -regex PATTERN
  -size N[bckw] -true -type [bcdpfls] -uid N -used N -user NAME
  -xtype [bcdpfls]
actions: -exec COMMAND ; -fprint FILE -fprint0 FILE -fprintf FILE FORMAT
  -ok COMMAND ; -print -print0 -printf FORMAT -prune -ls

Thanks for the suggestion, but it is probably faster to write the perl
than to use find.
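For comparison, the same one-line-per-entry format could be produced without parsing any listing, by walking the tree with the core File::Find module. This is a hedged sketch: the sub name `list_tree` is hypothetical, times come from `stat` (last modification) in local time, and paths keep whatever separators File::Find reports, which on Windows can mix `\` and `/`.

```perl
use strict;
use warnings;
use File::Find qw(find);
use POSIX qw(strftime);

# Walk $root and return sorted Path|name|ext|File_or_Dir|bytes|date|time lines.
sub list_tree {
    my ($root) = @_;
    my @out;
    find({
        no_chdir => 1,    # keep $File::Find::name usable as a full path
        wanted   => sub {
            my $path = $File::Find::name;
            return if $path eq $root;
            my @st = stat $path or return;
            my $is_dir = -d _;    # reuse the stat buffer from the call above
            my ($dir, $name) = $path =~ m{^(.*)[/\\]([^/\\]+)$} or return;
            my ($base, $ext) = $name =~ /^(.+)\.([^.]*)$/ ? ($1, $2) : ($name, '');
            my @t = localtime $st[9];    # modification time
            push @out, join '|', $dir, $base, $ext,
                ($is_dir ? 'Dir' : 'File'), ($is_dir ? 0 : $st[7]),
                strftime('%Y-%m-%d', @t), strftime('%H:%M', @t);
        },
    }, $root);
    return sort @out;
}

if (!caller() && @ARGV) {
    print "$_\n" for list_tree($ARGV[0]);
}
```

The trade-off is that perl must be installed on the machine being listed, which is exactly what the dir-parsing approach avoids.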
Steve


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Jeremy Muhlich
Sent: Friday, September 23, 2005 12:19 PM
To: boston-pm@mail.pm.org
Subject: Re: [Boston.pm] script to normalize output of Windows dir
command


How about the unix find command, with the -printf option?  You can get
it through cygwin.  Taking find's output (even without -printf) from two
directories and diffing it has gotten me through most of these sorts of
problems.

Also, diff -r might be helpful.  (possibly with the --brief option as
well)


 -- Jeremy


On Fri, 2005-09-23 at 11:55 -0400, Tolkin, Steve wrote:
 Summary:
 I would like a perl script that converts the output of the Windows dir
 command so that each line has the same format, including the directory

 C:\_from_laptop\AAA BBB_files|abc||File|123|2003-04-14|10:21
 C:\_from_laptop\AAA BBB_files|empty.jpg|txt|Dir|0|2003-04-14|23:00


 


Re: [Boston.pm] Combining the nodes reachable in n steps from a web page into one printable file

2005-09-15 Thread Tolkin, Steve
Summary:
1. What might cause IO::Socket::INET->new to fail?
2. Is there a bundle for WWW-Mechanize?  

Details:
I went to http://search.cpan.org/dist/WWW-Mechanize/ and read the doc
and it looks promising.  
I downloaded the tar.gz file, extracted all its files, and started the
usual install process.
Unfortunately I hit a variety of problems.  Here is the output:

C:\perl_install\WWW-Mechanize-1.14>perl makefile.pl

It seems that you are not directly connected to the Internet.  Some
of the WWW::Mechanize tests interact with websites such as Google,
in addition to its own internal tests.

Do you want to skip these tests? [y] y
Do you want to install the mech-dump utility? [y] y

It looks like you don't have SSL capability (like IO::Socket::SSL)
installed.
You will not be able to process https:// URLs correctly.


WWW::Mechanize likes to have a lot of test modules for some of its
tests.
The following are modules that would be nice to have, but not required.

Test::Pod
Test::Memory::Cycle
Test::Warn


Checking if your kit is complete...
Looks good
Warning: prerequisite LWP::UserAgent 2.024 not found. We have 1.004.
Warning: prerequisite Test::LongString 0 not found.
Warning: prerequisite URI 1.25 not found. We have 1.19.
Writing Makefile for WWW::Mechanize
 

I *am* directly connected to the Internet, so the first warning is
probably caused by a proxy problem. 
Looking inside the Makefile.PL I think the specific test that failed is:

if ( !$skiplive ) {
    require IO::Socket;
    my $s = IO::Socket::INET->new(
        PeerAddr => "www.google.com:80",
        Timeout  => 10,
    );

I think my proxy is set up correctly.
C:\perl_install\WWW-Mechanize-1.14>env | grep -i proxy
FTP_PROXY=http://proxbos1.fmr.com:8000
HTTP_PROXY=http://proxbos1.fmr.com:8000

How can I learn more about why IO::Socket::INET->new failed?
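One plausible explanation, offered as a guess: IO::Socket::INET opens a direct TCP connection and does not consult `HTTP_PROXY`, so behind a proxy that blocks direct outbound connections the Makefile.PL check may fail even though LWP (which can honor the proxy through `env_proxy`) works. When the constructor fails it returns undef and leaves an error message in `$@`, so printing that is a quick way to see why. A minimal probe, where the sub name `probe` is hypothetical:

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Try a direct TCP connection and report why it failed, if it did.
sub probe {
    my ($peer, $timeout) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $peer,
        Proto    => 'tcp',
        Timeout  => $timeout,
    );
    return "failed: $@" unless $sock;    # IO::Socket::INET puts the reason in $@
    close $sock;
    return "ok: connected to $peer";
}

if (!caller() && @ARGV) {
    print probe($ARGV[0], 10), "\n";     # e.g. perl probe.pl www.google.com:80
}
```

If a direct connection fails but the proxy host (for example `proxbos1.fmr.com:8000`) succeeds, that points at the proxy rather than the module.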

The other errors are dependencies on other modules, or newer versions
of modules.
Is there a bundle for WWW-Mechanize?


Thanks,
Steve

-Original Message-
From: Ricker, William 
Sent: Wednesday, September 14, 2005 5:05 PM
To: Tolkin, Steve; L-boston-pm
Subject: RE: [Boston.pm] Combining the nodes reachable in n steps from a
web page into one printable file


Is this to implement the missing PRINTABLE PAGE button for just yourself
or as part of the website?

This sounds a lot like one of the examples in MJD's new Higher Order
Perl book.

Outside of HOP, WWW::Mechanize is the new wrapper around LWP::Simple for
this sort of thing.

Makes my old LWP-wielding cache-and-smash implementation look lumpy ...

Bill


 


[Boston.pm] Combining the nodes reachable in n steps from a web page into one printable file

2005-09-14 Thread Tolkin, Steve
This seems like a problem that would be easily solved with a small perl
script.

Many web pages have a large list of links.  
I would like to follow all the links, to some small depth (typically
just 1) and put their output into one file, in some format suitable for
printing.  
I am flexible about the order of the links, and the details of the
format, etc.
This has probably been written already. 
Having it in perl would let me modify it, which might be useful.
(If there is a reliable freeware or shareware program, I would also be
interested in that.)
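The core idea is a breadth-first walk: fetch the start page, collect its links, fetch those, and stop after n levels, remembering which URLs were already seen. A minimal sketch with loudly stated assumptions: link extraction is a crude regex (a real version would use HTML::LinkExtor or WWW::Mechanize), relative links are not resolved (URI's `new_abs` would do that), the default fetcher uses HTTP::Tiny, which only became a core module in Perl 5.14 (on 5.8, a callback wrapping LWP::Simple's `get` could be passed instead), and the names `collect_pages` and `extract_links` are hypothetical.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Crude href extractor: captures quoted href values as-is; relative links
# are returned unresolved.
sub extract_links {
    my ($html) = @_;
    return $html =~ /<a\s[^>]*?href\s*=\s*["']([^"']+)["']/gi;
}

# Breadth-first fetch of every page within $depth links of $start.
# $fetch takes a URL and returns its content, or undef on failure.
sub collect_pages {
    my ($start, $depth, $fetch) = @_;
    $fetch ||= sub {
        my $res = HTTP::Tiny->new->get($_[0]);
        return $res->{success} ? $res->{content} : undef;
    };
    my %seen     = ($start => 1);
    my @frontier = ($start);
    my @pages;
    for my $level (0 .. $depth) {
        my @next;
        for my $url (@frontier) {
            my $html = $fetch->($url);
            next unless defined $html;
            push @pages, [$url, $html];
            next if $level == $depth;    # deepest level: do not follow links
            push @next, grep { !$seen{$_}++ } extract_links($html);
        }
        @frontier = @next;
    }
    return @pages;
}

if (!caller() && @ARGV) {
    my $depth = defined $ARGV[1] ? $ARGV[1] : 1;
    for my $page (collect_pages($ARGV[0], $depth)) {
        print "==== $page->[0] ====\n$page->[1]\n";
    }
}
```

Printing every page into one file, separated by a header line, gives the single printable file; converting the HTML to a nicer print format is left open.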


Thanks,
Steve

P.S.  perl -v says:
This is perl, v5.8.0 built for MSWin32-x86-multi-thread
(with 1 registered patch, see perl -V for more detail)

Copyright 1987-2002, Larry Wall

Binary build 805 provided by ActiveState Corp.
http://www.ActiveState.com
Built 18:08:02 Feb  4 2003

 


Re: [Boston.pm] Geo::Coder::US RE: GoogleGeoCoder

2005-06-17 Thread Tolkin, Steve
This is a true story.  About 15 years ago I moved to Newton, and my zip
code was 02148.
All was well.  Then one day about 10 years ago, the USPS decided that
the preferred name of my town (Post Office) was Waban.  It changed
Newton to the alternate name.  Unfortunately on that very day they
also decided to change the zip code!  (This was part of a wholesale
renumbering of many towns.)

Every data analyst knows it is not a good idea to change the identifier
of an entity.
It is extremely bad to change all of its identifiers at once.

In theory companies are supposed to subscribe to the USPS lists, which
do mark the changes.
In theory companies are supposed to allow either the preferred or
alternate name.
In practice some only allow the primary.  In practice some do not bother
to subscribe,
or do not have a reliable system to process the updates.

At many web sites (big ones including I recall CNN, eBay, Amazon, NY
Times, etc.) I got a wide variety of problems.  Some said Newton did not
exist, or that Waban did not exist, or that my zip code did not match my
city, etc.
On one site that I really wanted to use I actually tried all 4
combinations without success.
I can only conjecture that their system had some subtle flaw -- perhaps
it had not been coded to handle a town whose name and zip code changed
simultaneously, and so it just deleted it from the database.

I have now gone more than a year since this problem has occurred, and so
I think the various web sites may have all caught up.

P.S.  The USPS did this for several, but not all, of the villages of
Newton.
This is of no benefit to the people who live there.  It is a service
that the U.S. Post Office provides at the request of marketers, who want
to be able to easily distinguish prestige addresses by the city name.


Hopefully helpfully yours,
Steve
-- 
Steve TolkinSteve . Tolkin at FMR dot COM   617-563-0516 
Fidelity Investments   82 Devonshire St. V4D Boston MA 02109
There is nothing so practical as a good theory.  Comments are by me, 
not Fidelity Investments, its subsidiaries or affiliates.






-Original Message-
From: Chris Devers [mailto:[EMAIL PROTECTED] 
Sent: Friday, June 17, 2005 11:34 AM
To: Joel Gwynn
Cc: Boston.PM; [EMAIL PROTECTED]
Subject: Re: [Boston.pm] Geo::Coder::US RE: GoogleGeoCoder


On Thu, 16 Jun 2005, Joel Gwynn wrote:

 When you get right down to it, this Boston neighborhood thing is 
 just confusing.  I work in Dorchester but management likes to put 
 Boston on the stationary, which is confusing because there's an 
 identical address in Boston proper, just with a different zip code. 
 Are there any other cities that have similar naming schizophrenia?

Sure, I imagine it happens all over the place. 

As has been noted in other comments in this thread, big towns assimilate
smaller towns all the time, so current neighborhood names are often the
names of formerly independent political entities.

But then, it's not even always assimilation. People all over the world 
know that Harvard Square is in Cambridge, Massachusetts, but it isn't, 
as far as I know, a formal geographic boundary in any useful sense -- 
it's just a district in that part of Cambridge. But then maybe I'm 
revealing some ignorance here, as I've lived in the Boston area since I 
was a kid and yet I still don't actually know what square is really 
meant by the term Harvard Square -- I've always assumed that it's 
centered on the T station, but that's not actually on Harvard's campus, 
hence the ambiguity. 

At $past_job, some of my coworkers were working on a real estate site.
For this, they had to be able to handle all kinds of random input from
people that, whether or not it was on any formal map, did in fact denote
a perfectly well understood geographic area.

Harvard Square. Union Square. Mark Sandman Square. Financial District. 
Theatre District. Leather District. Back Bay. Fort Point. South End. 
World's End. Greenbush. Queen Anne's Corner. Four Corners. Assinippi. 
Minot. Humarock. Silver Lake. Cedarville. Just to name a few.

All of these are definite places in or around Boston or southeastern
Massachusetts, but none of them is an actual town or city. But if you
put any of them on an envelope, the mail will very probably get to its
intended destination, and if you put any of them into a search string on
a real estate site, it has to return results for that area.

My impression is that dealing with all these varying names for the same 
places was the main impetus for setting up the ZIP code system in the 
first place. As long as you have the right ZIP code on an envelope, you 
can call your neighborhood Fatty Arbuckle for all the post office cares.


Heh. Come to think of it, I might start calling my street that... :-)
 


-- 
Chris Devers
 

RE: [Boston.pm] THE NAZIS HAD A CERTIFICATION FOR PERL

2005-03-01 Thread Tolkin, Steve
Right.  The horse is dead.  Please stop beating it.

Dear Ronald, as our fearless leader 
will you please ask everyone to stop all these threads
on certification and advocacy.


Now I know why there are literally millions of matches in Google.
This topic draws in people like flies to 


Hopefully helpfully yours,
Steve
-- 
Steve TolkinSteve . Tolkin at FMR dot COM   617-563-0516 
Fidelity Investments   82 Devonshire St. V4D Boston MA 02109
There is nothing so practical as a good theory.  Comments are by me, 
not Fidelity Investments, its subsidiaries or affiliates.



-Original Message-
From: Chris Devers [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, March 01, 2005 2:50 PM
To: Boston Perl Mongers
Subject: [Boston.pm] THE NAZIS HAD A CERTIFICATION FOR PERL


But then, you can't invoke Godwin deliberately, can you?

Wasn't mentioning [implicitly, national] socialism close enough?

No?

Damn.


-- 
Chris Devers, fascinated just how many thousands of words this thread 
has produced, and yet managed to clarify exactly nothing while doing so
 


RE: [Boston.pm] short-listing languages for applications software development

2005-02-25 Thread Tolkin, Steve
I think this is the best point that has been advanced in favor of using
perl:
Amazon, Google, Yahoo, Morgan Stanley all use Perl in production ...

Does anyone have additional details, e.g. the names of the projects,
number of servers, number of users, estimated cost, estimated savings by
using perl, etc.

This is basic information that should be available to Perl advocates,
i.e. easily findable at http://www.perl.org/advocacy/, which
unfortunately does not have anything of the sort.


Hopefully helpfully yours,
Steve
-- 
Steve TolkinSteve . Tolkin at FMR dot COM   617-563-0516 
Fidelity Investments   82 Devonshire St. V4D Boston MA 02109
There is nothing so practical as a good theory.  Comments are by me, 
not Fidelity Investments, its subsidiaries or affiliates.


-Original Message-
From: Ranga Nathan [mailto:[EMAIL PROTECTED] 
Sent: Thursday, February 24, 2005 9:06 PM
To: boston-pm@pm.org
Subject: Re: [Boston.pm] short-listing languages for applications
software development


I met that person and discussed the richness of perl data
structures. He was adamant that perl did not have strong typing. I told
him that perl is intelligent and would guess the data type.
What the heck? In business applications I have hardly come across anything
more than a = b + c ! 95% of what we handle are strings. Which is the most
preferred language for strings?

Also, he said that perl code looked confusing! Well everything requires
some getting used to. But I know a lot of COBOL programs that are utterly
confusing. Requiring 'system.out.println' could be confusing for someone
not used to objects at all.

It went on for some time but neither of us convinced the other.  But I did
tell him that Amazon, Google, Yahoo, Morgan Stanley all use Perl in
production and in fact we are using perl in mission-critical production.
We had problems but it had nothing to do with perl or the architecture!


__
Ranga Nathan / CSG
Systems Programmer - Specialist; Technical Services; 
BAX Global Inc. Irvine-California
Tel: 714-442-7591   Fax: 714-442-2840

 


RE: [Boston.pm] (also) Perl

2005-02-25 Thread Tolkin, Steve
Well just about everything that can be said on this thread has been
said, except for this.

Google for: perl (certification OR certificate)  produces 2170
matches.
This matches two phrases.  If you remove the quotes, i.e. 
Google for: perl (certification OR certificate)
produces 1.2 million hits.

Among them is this, from the Perl Journal
http://www.tpj.com/documents/s=1131/sam05040001/letters.htm?temp=NJykmWtEip
which says in part:

I was wondering if you knew of anyone that offers a Perl Certification
Program?
...
At the second O'Reilly Perl conference, Mark-Jason Dominus, Nathan
Torkington, and I sold Perl Certificates. You named a title (Perl
Monger, Perl Studmuffin, and Perl Sultan were all chosen), and an
Official Perl Certification was immediately printed for you to take home
and frame. To receive a certificate, you needed to show no
qualifications other than the ability to open up your wallet and fork
over $2. (This is like other certification programs, but cheaper.)
[You can read the rest if you want.]

Hopefully helpfully yours,
Steve
-- 
Steve TolkinSteve . Tolkin at FMR dot COM   617-563-0516 
Fidelity Investments   82 Devonshire St. V4D Boston MA 02109
There is nothing so practical as a good theory.  Comments are by me, 
not Fidelity Investments, its subsidiaries or affiliates.




RE: [Boston.pm] a car talk puzzle

2004-09-08 Thread Tolkin, Steve
See my answer after the original message.
It uses Perl, but the minimum amount.
It only took a few seconds to do it the natural way 
(natural if you are used to grep, comm and other Unix utilities).
I used these utilities in part because Chris suggested their use,
and in part because I think this is the quickest way to solve the
problem in programmer time.

Steve

-Original Message-
From: Chris Devers [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, September 08, 2004 2:29 PM
To: Boston Perl Mongers
Subject: [Boston.pm] a car talk puzzle


This seems like something that would be fun to solve with Perl:

 RAY: I have, written on a piece of paper in front of me, a word that
 is plural and also masculine. Now, I know we don't have masculine and
 feminine words in English the way we do in Italian or French. But,
 we do have words that connote masculinity. For example, the word
 boys is a plural word that connotes masculinity.

 The word I have written here is like boys. It's masculine, and
 ends in s. Not only that, but you change this word from plural to
 singular and from masculine to feminine, all by adding an s to it!

 I spent last night reading the entire Oxford English Dictionary,
 and I only found one example for which this works.

Ok, so I've got a word list, how many words can there be that end in S?

 $ grep -ic 's$' /usr/share/dict/words
 25998

Oy, way too many. But how many end in a double S?

 $ grep -ic 'ss$' /usr/share/dict/words
 9552

Better, but not much better.

If the word in question is in /usr/share/dict/words, then it should be 
one of the (hopefully) rare words that is a -ss word that, when the last
-s is dropped, is also in the larger -s list.

With luck, there will be only one; realistically, this should shorten 
the list enough that the answer can be found manually.

Can anyone think of a clever way to do this ?



-- 
Chris Devers


Here is my deliberately non-clever solution.
Note that I am running these Unix utilities in my DOS box;
I got them from http://unxutils.sourceforge.net/


C:\wordlists>grep ss$ words.txt > o1
C:\wordlists>grep [^s]s words.txt > o2
C:\wordlists>perl -ne "chomp; print $_ . qq(s\n)" o2 | sort > o3
C:\wordlists>comm -12 o1 o3 > o4
C:\wordlists>wc o4
  5   5  30 o4

C:\wordlists>cat o4
ass
buss
canvass
discuss
hiss

Oh well.  It looks like my version of /usr/dict/words 
(which I named words.txt) did not have the answer.
So I ran the same sequence of steps with a bigger word list,
the yawl.lst (yet another word list) which is very large.
It can be downloaded from
http://personal.riverusers.com/~thegrendel/software.html
and other places.

C:\wordlists>grep ss$ yawl.lst > o1
...
C:\wordlists>wc o4
127 127 1007 o4

Eyeballing the list I come up with the following answer.
Warning!! spoiler below, do not hit page down unless you want to see it






































millionairess

Clearly millionairess is feminine and singular and
I think that millionaires does have a masculine connotation.


Hopefully helpfully yours,
Steve
-- 
Steve Tolkin   Steve . Tolkin at FMR dot COM   617-563-0516 
Fidelity Investments   82 Devonshire St. V4D Boston MA 02109
There is nothing so practical as a good theory.  Comments are by me, 
not Fidelity Investments, its subsidiaries or affiliates.





RE: [Boston.pm] I want a compile time check on missing parens in regex

2004-07-21 Thread Tolkin, Steve
Summary:
What is the scope of $1 and when does it get reset?

Details:
Thanks for the reply, Ron.
It indicates that I understand this even less than I thought.

What are the rules for remembering a previous value of $1 
(and the other numeric variables set by pattern matching)?

In the program where I discovered the problem I have a bunch of
regexes, and so there could have been a value for $1 in effect.
But I got a warning message anyway.
Why wasn't that earlier value of $1 used?
Or was it used, and I only got the warning where there wasn't a value for
$1?

Does the zero length string (aka null string) act as a previous value of
$1?


Thanks,
Steve

-Original Message-
From: Ron Newman [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, July 21, 2004 11:52 AM
To: Tolkin, Steve
Cc: [EMAIL PROTECTED]
Subject: Re: [Boston.pm] I want a compile time check on missing parens
in regex


If I intend to write something like
s/([ab])c/$1c/;
but accidentally omit the parentheses and write
s/[ab]c/$1c/; 
I get a run time error message -- assuming
the pattern matches the input data.
But if the test data does not expose
this bug I might not find out about it until later.

Is there any way to get a compile time check?

That's not possible in general, because there could legitimately be a $1
left over from a previous regex match.



RE: [Boston.pm] I want a 'compile time' check on missing parens in regex

2004-07-21 Thread Tolkin, Steve
OK, here is the answer:
http://www.perldoc.com/perl5.6.1/pod/perlre.html says:
The numbered variables ($1, $2, $3, etc.) and the related punctuation
set ($+, $&, $`, and $') are all dynamically scoped until the end of the
enclosing block or until the next successful match, whichever comes
first. 

and 5.8.4 is the same except adding $^N (whatever that is).

So it is not possible in Perl 5.

Note that these numbered variables are somewhat like
global variables, and so they do cause action at a distance.
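
A small demonstration of that rule (a sketch that relies only on the
behavior perlre describes): a failed match leaves $1 alone, and a
successful match inside a block stops affecting $1 once the block exits.
This is also why the usual defensive habit is to read $1 only inside
something like if (/(...)/) { ... }, where the match is known to have
succeeded.

```perl
use strict;
use warnings;

my @seen;

"abc" =~ /(b)/;          # successful match: $1 is now "b"
"xyz" =~ /(q)/;          # failed match: $1 is left unchanged
push @seen, $1;          # "b"

{
    "hello" =~ /(ll)/;   # successful match inside an inner block
    push @seen, $1;      # "ll"
}

push @seen, $1;          # "b" again: the inner match went out of scope

print "@seen\n";         # prints: b ll b
```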

Is there going to be a way in perl 6 to control this better?


Steve

-Original Message-
From: Greg London [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, July 21, 2004 12:30 PM
To: Tolkin, Steve
Cc: [EMAIL PROTECTED]
Subject: RE: [Boston.pm] I want a 'compile time' check on missing parens
in regex



Tolkin, Steve said:
 What is the scope of $1 and when does it get reset?

here's a start:
http://www.greglondon.com/iperl/html/iperl.html#20_5_2_Capturing_parentheses_not_capturing

I suppose I should make a note to include some s/// examples...

note to self: self, add some s/// examples.

-- 
Impatient Perl
A GNU-FDL training manual for the hyperactive.
Free HTML/PDF downloads at greglondon.com/iperl
Paperback/coilbound available for $8.50+sh



[Boston.pm] FW: GBC/ACM Announcements

2004-06-11 Thread Tolkin, Steve
I believe that the technical portion of this, 
i.e. the talk on Parrot by Dan, is open to the public.
(But I have not checked.  Dan, do you know?)

Steve

-Original Message-
From: Kenneth Baclawski [mailto:[EMAIL PROTECTED] 
Sent: Thursday, June 10, 2004 11:03 PM
To: Tolkin, Steve
Subject: GBC/ACM Announcements


Announcements this month include:
Annual GBC/ACM Meeting and Election of Officers
June GBC/ACM Monthly Meeting
-
  
The Greater Boston Chapter of the ACM
   Annual Business Meeting
   Thursday, June 17, 2004
   MIT Room 34-101
   7:00 - 7:15 pm

- President nomination: Peter Carmichael who is currently VP and PDS
Brochure and Lecture Notes Editor; and PDS and Volunteer Committee
member.

- VP nomination: Jay Conne who is currently a member of the PDS and
Volunteer committees and is a former President, Membership Chair and PDS
Registrar.

- Secretary nomination: Ed Bristol who is the incumbent Secretary and
former President of the IEEE Control Society.

- Treasurer nomination: Yona Carmichael who is currently PDS Brochure
Editor and recently hosted a volunteer appreciation party at her and
Peter's home.  Yona is also Treasurer for the local chapter of the
Society for Creative Anachronism.

-

The Greater Boston Chapter of the ACM
will be having a Monthly Meeting on Thursday, June 17, 2004
   MIT Room 34-101, Cambridge, MA
 7:15 - 9:15 pm  (note time)

 Parrot: Structure and Building of a Virtual Machine
Dan Sugalski

Abstract: This is a two-part talk. In the first part we'll sketch a
broad outline of the architecture of Parrot, a virtual machine being
designed to efficiently run the so-called dynamic languages. (Primarily
Perl 5, Perl 6, Python, and Ruby)

In the second part of the talk we'll cover some of the techniques and
build tools we've developed as part of the process to abstract out the
building and platform-specific optimizing of the VM source. (Somewhere
between 75 and 80% of Parrot's source is preprocessed or autogenerated,
some of it quite significantly)

Dan is the lead designer of Parrot and past contributor to Perl. He's
currently employed writing compilers for a metals wholesaling company,
much to his surprise, and has written a number of articles and parts of
books on Perl and Parrot.

There will be a business meeting from 7:00 to 7:15 pm immediately 
preceding the talk.

Directions to MIT, building 34, room 101: MIT is located at 77
Massachusetts Avenue, just on the north side of Memorial Drive in
Cambridge, MA. The URL http://whereis.mit.edu contains a map of the
area.



RE: [Boston.pm] list viruses

2004-05-06 Thread Tolkin, Steve
I almost never open an attachment,
unless it comes from a known and trusted source,
and I am expecting an attachment.
This is an antivirus measure.

So unfortunately I never get to read the posts by certain 
people, e.g. Sean Quinlan, because for some reason their
posts become an attachment.

So I would like you to consider blocking email with attachments.
At a minimum this would encourage people to send plain old email.

However a better approach, if viable, is converting the attachment 
to plain text and pasting it inline.  Ideally this would
preserve the fact that it once was an attached file, and
also the file's name.
This should work with all non-binary files;
I do not think there is any need to post binary files to this list.
I do not know enough about the programs that send mail or
intermediary programs, or the processing when mail arrives, 
to understand if this is possible or easy to do.


Hopefully helpfully yours,
Steve
-- 
Steve Tolkin   Steve . Tolkin at FMR dot COM   617-563-0516 
Fidelity Investments   82 Devonshire St. V4D Boston MA 02109
There is nothing so practical as a good theory.  Comments are by me, 
not Fidelity Investments, its subsidiaries or affiliates.


 -Original Message-
 From: Ronald J Kimball [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, May 05, 2004 9:55 PM
 To: Chris Devers
 Cc: Boston Perl Mongers
 Subject: Re: [Boston.pm] list viruses
 
 
 On Wed, May 05, 2004 at 09:25:08PM -0400, Chris Devers wrote:
  Okay, so two viruses have made it to the list today. In both cases, it
  looks like the mail came from Verizon customers:
  
  Received: from pm.org (pool-141-154-212-242.bos.east.verizon.net
  [141.154.212.242])
  by mail.pm.org (8.11.6/8.11.6) with ESMTP id i45Joc914994
  for [EMAIL PROTECTED]; Wed, 5 May 2004 14:50:39 -0500
  
  Received: from pm.org (pool-141-154-222-33.bos.east.verizon.net
  [141.154.222.33])
  by mail.pm.org (8.11.6/8.11.6) with ESMTP id i460aa919816
  for [EMAIL PROTECTED]; Wed, 5 May 2004 19:36:36 -0500
  
  Boston.pm's mail is served by Mailman, right? Does Mailman have a way to
  filter [presumably unsubscribed] incoming mail by network?
 
 These messages were both forged from addresses that are subscribed to the
 mailing list, which is why they made it through.  Incoming mail from
 non-member addresses is already moderated.
 
 
  Going to a purely moderated list might be annoying for whoever has to do
  it [maybe Ronald, maybe someone else].
 
 I have already turned on content filtering for the list.  This will remove
 unwanted attachments, but still sends the remainder of the message
 through.  (This is why the second message was missing its payload.)  If
 that's not sufficient I can try rejecting all messages that contain
 attachments, but that will block some legitimate posts.
 
 
  Going to the pure Perl Siesta list manager software would be an
  interesting move, but I'm not sure if it's stable enough yet.
 
 That would be up to the pm.org sysadmins.
 
 
 Ronald
 


[Boston.pm] Thanks Andrew for all the Perl Monger meetings you hosted at Boston.com

2004-04-13 Thread Tolkin, Steve

Dear Andrew,

I wish to express my personal thanks for the work you
did in support of hosting the Boston Perl Monger meetings.


Thanks,

Steve Tolkin





[Boston.pm] using {3-8} instead of {3, 8} doesn't produce even a warning?

2004-01-27 Thread Tolkin, Steve

# run using e.g. echo hello | perl this-file


# Why doesn't perl produce a warning from {3-8} ? This seems
# to be a syntax error. It surely is not the way to match strings of length 3 - 8. It
# should be {3,8} .


while (<>) {
    if (/[a-z]{3-8}/) {
        print;
    }
}
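
Since Perl silently accepts {3-8} as literal text (perlre: a curly bracket
that is not one of the listed quantifiers is treated as a regular
character), one stopgap is a crude check over the pattern source.  The
helper suspicious_braces below is hypothetical and purely heuristic; it
only flags unescaped brace groups of the form {digits-digits}.

```perl
use strict;
use warnings;

# Hypothetical lint helper: scan the *source text* of a regex for brace
# groups that look like a mistyped quantifier, e.g. {3-8} instead of {3,8}.
# Perl compiles such a group as literal characters without complaint.
sub suspicious_braces {
    my ($pattern_src) = @_;
    my @hits;
    # (?<!\\) skips braces that are already escaped, as in \{3-8\}
    while ($pattern_src =~ /(?<!\\)(\{\s*\d+\s*-\s*\d+\s*\})/g) {
        push @hits, $1;
    }
    return @hits;
}

for my $src ('[a-z]{3-8}', '[a-z]{3,8}', '[a-z]\{3-8\}') {
    my @bad = suspicious_braces($src);
    printf "%-14s %s\n", $src, @bad ? "suspicious: @bad" : 'ok';
}
```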





[Boston.pm] why no warning about this infinite loop

2004-01-27 Thread Tolkin, Steve

# run using e.g. echo hello | perl this-file


# Why doesn't perl produce a warning from the following. It is an
# infinite loop. If I add a /g modifier to the m// it works fine.


while (<>) {
    while (m/([a-z])/) {   # warning: infinite loop!!!
        print $1, "\n";
    }
}
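
A sketch of the /g version mentioned in the comment, using a made-up sub
name lowercase_letters.  In scalar context m//g resumes where the previous
match ended (pos), so the loop terminates once there are no letters left;
without /g every iteration starts again at the beginning of the string.

```perl
use strict;
use warnings;

# Hypothetical helper: collect every lower-case letter in a string.
sub lowercase_letters {
    my ($line) = @_;
    my @letters;
    # /g in scalar context: each successful match advances pos($line),
    # and the first failed match ends the loop (and resets pos).
    while ($line =~ m/([a-z])/g) {
        push @letters, $1;
    }
    return @letters;
}

print join(',', lowercase_letters("Hello, world")), "\n";   # prints: e,l,l,o,w,o,r,l,d
```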
 


/// 


In general it is hard to detect infinite loops, but in this case it is easy,
because the pattern is a constant. I think this is a very common
special case, and is worth detecting.


Why isn't this done?


I am running perl 5.8


Steve





RE: [Boston.pm] using {3-8} instead of {3, 8} doesn't produce even a warning?

2004-01-27 Thread Tolkin, Steve
Thanks for the explanation.

So this is a documented feature.

I was fooled by believing the general principle that special
characters are special unless escaped with a backslash.
I would have greatly preferred consistency in this.

Are there other known (and perhaps even documented) violations
of that principle?  I scanned the 5.8 perltrap for curly
and this was not listed.  Who should I notify to request its inclusion?

Steve



 -Original Message-
 From: Ronald J Kimball [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, January 27, 2004 5:06 PM
 To: Tolkin, Steve
 Cc: [EMAIL PROTECTED]
 Subject: Re: [Boston.pm] using {3-8} instead of {3, 8} 
 doesn't produce even a warning?
 
 
 On Tue, Jan 27, 2004 at 04:55:28PM -0500, Tolkin, Steve wrote:
  # run using e.g. echo hello | perl this-file
  
  # Why doesn't perl produce a warning from {3-8} ?  This seems
  # to be a syntax error.  It surely is not the way to match strings of
  # length 3 - 8.  It should be {3,8} .
  
  while (<>) {
      if (/[a-z]{3-8}/) { 
          print;
      }
  }
 
 perldoc perlre:
 
The following standard quantifiers are recognized:
 
    *      Match 0 or more times
    +      Match 1 or more times
    ?      Match 1 or 0 times
    {n}    Match exactly n times
    {n,}   Match at least n times
    {n,m}  Match at least n but not more than m times
 
    (If a curly bracket occurs in any other context, it is treated as a
    regular character.)
 
 In other words, in Perl /[a-z]{3-8}/ is equivalent to /[a-z]\{3-8\}/.
 
 Ronald
 


RE: [Boston.pm] why no warning about this infinite loop

2004-01-27 Thread Tolkin, Steve
OK, my comments below apply to this and to Uri's similar comments.

I should have said: this infinite loop is easy to detect because:
1. the pattern is constant
2. the data (here $_) is not modified in the loop

Both points are obvious to a person.
In this simple and important special case 
it is also easy for most compilers (of languages other
than Perl).
In principle quite complex code can be analyzed
to determine accurately that the data is not modified.

I conclude that the Perl compiler has either 
* chosen to not do this kind of analysis, or 
* any such analysis is not connected to the error mechanism.

I am curious if Dan S. has any comments on this w.r.t. Parrot.

Steve

 -Original Message-
 From: Ronald J Kimball [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, January 27, 2004 5:33 PM
 To: Tolkin, Steve
 Cc: [EMAIL PROTECTED]
 Subject: Re: [Boston.pm] why no warning about this infinite loop
 
 
 On Tue, Jan 27, 2004 at 05:04:03PM -0500, Tolkin, Steve wrote:
  # run using e.g. echo hello | perl this-file
  
  # Why doesn't perl produce a warning from the following.  It is an
  # infinite loop.  If I add a /g modifier to the m// it works fine.
  
  while (<>) {
      while (m/([a-z])/) { # warning infinite loop!!! 
          print $1, "\n"
      }
  }
  
  
  ///  
  
  In general it is hard to detect infinite loops, but in this case it is easy,
  because the pattern is a constant.  I think this is a very common
  special case, and is worth detecting.
 
 The pattern in the below code is also constant, but there is no infinite
 loop:
 
 while (<>) {
   while (m/([a-z])/) {
     print $1, "\n";
     $_ = substr($_, 1);
   }
 }
 
 As you say, it is hard to detect infinite loops.  :)
 
 
 Ronald
 


RE: [Boston.pm] OT:Safari Bookshelf

2004-01-05 Thread Tolkin, Steve
Since you asked, I had a few specific criticisms also.

I was part of a pilot at my work place.

One major criticism I had is that it sent me my password 
*in the clear* as part of a routine reminder.
I replied that this is extremely bad practice.

In fact it should not even store my password; it should use
a Unix-like approach of storing only a salted hash.
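
For illustration only, here is roughly what storing a salted hash instead
of the password could look like, using the core Digest::SHA module.  The
sub names make_record and check_password and the salt$hash record layout
are assumptions, not how Safari or Unix crypt() actually store anything.
Note that a plain salted SHA-256 is fast to brute-force; a deliberately
slow scheme such as bcrypt, scrypt or Argon2 is the better choice for a
real system.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);   # core module since Perl 5.10

# Hypothetical helpers.  The site keeps only "salt$hash", so it cannot
# mail the password back to anyone: it never has it.  The salt is passed
# in; in practice it should come from a cryptographically secure source.
sub make_record {
    my ($password, $salt) = @_;
    die "salt must not contain '\$'\n" if $salt =~ /\$/;
    return $salt . '$' . sha256_hex($salt . $password);
}

# Recompute the hash with the stored salt and compare.
sub check_password {
    my ($password, $record) = @_;
    my ($salt, $hash) = split /\$/, $record, 2;
    return sha256_hex($salt . $password) eq $hash;
}
```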

Here is a sanitized version of my message sent
to '[EMAIL PROTECTED]'
last March.

P.S.  We did decide to sign up for the Safari service.


 -Original Message-
 From: Tolkin, Steve 
 Sent: Thursday, March 27, 2003 1:34 PM
 To: '[EMAIL PROTECTED]'
...
 Subject: Never sent user password in email -- this is a 
 serious breach of security
 
 
 Dear Safari,
   Your email to me included my password.
 This is a serious breach of security.
 
 Please tell me that you will fix this.
 
 I never do business with any organization that 
 sends out a password in email
 (unless explicitly requested by the user). 
 
 Thanks,
 
 Steve
 
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED]
 Sent: Thursday, March 27, 2003 5:28 AM
 To: [EMAIL PROTECTED]
 Subject: Time Flies
 
 
 Steve,
 
 How time flies! This note is just a friendly reminder that 
 you are half way through your free trial to Safari Tech Books Online.
 
 Log in today and let Safari pinpoint information for your 
 urgent IT questions. Safari's powerful search engine is far 
 more efficient than wading through piles of books and 
 articles and more effective than message boards or tracking 
 down colleagues for answers.
 
 As a reminder, your login URL is http://search.safaribooksonline.com/
 User Name: steve dot. tolkin at@ fmr dot. com
 Password: SHOULD NEVER SEND PASSWORD UNLESS REQUESTED!
 
 Need help getting started?  Join us for a quick LIVE tutorial.
 -- Every Tuesday
 -- 4:15 - 4:45 pm EST
... [rest of marketing blather snipped]

Hopefully helpfully yours,
Steve
-- 
Steve Tolkin   Steve . Tolkin at FMR dot COM   617-563-0516 
Fidelity Investments   82 Devonshire St. V4D Boston MA 02109
There is nothing so practical as a good theory.  Comments are by me, 
not Fidelity Investments, its subsidiaries or affiliates.



 -Original Message-
 From: Andy Oram [mailto:[EMAIL PROTECTED] 
 Sent: Monday, January 05, 2004 11:44 AM
 To: [EMAIL PROTECTED]
 Subject: Re: [Boston.pm] OT:Safari Bookshelf
 
 
 I guess I should stop lurking and say thanks for all the kind comments.
 Anything special that any of you would like me to pass on to people I
 know on the Safari team? I haven't noticed any specific criticism. Also,
 if you feel happy enough that you'd like to give a testimonial that we
 could use in marketing, let me know and I'll find a marketing person to
 slurp it up.
 
 --
 Andy Oram  O'Reilly & Associates, Inc.    email: [EMAIL PROTECTED]
 Editor 90 Sherman Street   voice: 617-499-7479
Cambridge, MA 02140-3233  fax: 617-661-1116
USA http://www.praxagora.com/andyo/
 Stories at Web site:
 The Bug in the Seven Modules Code the Obscure The Disconnected
 --
 
 


[Boston.pm] Re: the XPath replace() function and regex patterns like s/^.../.../g

2003-11-11 Thread Tolkin, Steve



One clarification.  The suggested workaround was not to just
start the regex with a ^ but to start it with ^.*
I have also changed the body of the message below to reflect this.


  
  -Original Message-
  From: Tolkin, Steve
  Sent: Monday, November 10, 2003 5:05 PM
  To: [EMAIL PROTECTED]
  Subject: [Unverified Sender] [Boston.pm] the XPath replace() function and regex patterns like s/^.../.../g

  The proposed regex replace() function in XPath 2.0 (and also XQuery 1.0)
  always replaces all matching strings, i.e. as if it had the g modifier
  in Perl's s///g.  For details see
  http://www.w3.org/TR/xpath-functions/#func-replace
  (It does define the semantics of overlapping strings the same as Perl.)

  However it seems to me that always replacing all the matching strings
  might cause some loss in functionality, because there is no obvious way
  to get it to do only one replacement.  The suggested workaround to
  achieve changing only the first matching string is to put ^.* at the
  start of the pattern.

  So I first ask a technical question, about Perl's behavior.

  Q1. Will a pattern such as s/^.../.../g, i.e. one that is anchored by a
  leading ^, ever change more than one matching string?

  Now a question about the real consequences of the current XPath
  proposal.  What is a good "use case" for wanting a replace-one in
  addition to a replace-all?  The best case I can think of where this
  does cause a problem is a pattern to preserve any leading whitespace
  (perhaps to keep the indentation the same) but replace all other
  whitespace with a single blank.  The following Perl expression fails to
  do this:
      s/^(\s*)(\S+)(\s+)/\1\2 /g
  and so I believe that it will be very hard to do with replace().

  Q2. Can you think of a better "use case"?

  Assuming that serious problems are identified, there are several ways
  to solve this in XPath.

  Q3. What is your preference?
  a. Have two functions with different names, e.g. replace-first() and
     replace-all().  (If so, please choose your preferred names from the
     following set.  For first: replace, replace-one, replace-first.
     For all: replace, replace-all.)
  b. Change the default for replace() to mean replace first, and add a
     flag named "g" to mean replace all.  (Note that there already are
     flags named "s" and "m" with their Perl meanings.  I have access to a
     newer version of the spec than the one that is posted.)
  c. Keep the default for replace() as meaning replace all and add a new
     option (what letter?) meaning replace first.
  d. Something else.

  Any advice will be appreciated.

  Hopefully helpfully yours,
  Steve
  --
  Steve Tolkin   Steve . Tolkin at FMR dot COM   617-563-0516
  Fidelity Investments   82 Devonshire St. V4D Boston MA 02109
  There is nothing so practical as a good theory.  Comments are by me,
  not Fidelity Investments, its subsidiaries or affiliates.
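
On Q1, a quick Perl check (a sketch with a hypothetical count_subs helper)
bears out the Perl side: without /m, ^ matches only at the start of the
string, so s/^.../.../g makes at most one substitution; with /m, ^ also
matches after every newline and the same substitution can fire once per
line.

```perl
use strict;
use warnings;

# Hypothetical helper.  s///g returns the number of substitutions made
# (an empty string if none), so it can be used directly as a count.
sub count_subs {
    my ($text, $multiline) = @_;      # $text is a copy, safe to modify
    my $n = $multiline ? ($text =~ s/^x/y/gm)   # ^ after every newline too
                       : ($text =~ s/^x/y/g);   # ^ only at string start
    return ($n || 0, $text);
}

my ($n1, $t1) = count_subs("xx\nxx", 0);   # 1 substitution: "yx\nxx"
my ($n2, $t2) = count_subs("xx\nxx", 1);   # 2 substitutions: "yx\nyx"
```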
  


RE: [Boston.pm] Postal address De-duping

2003-08-06 Thread Tolkin, Steve
The article in question can be found at
http://www.foo.be/docs/tpj/issues/vol4_1/tpj0401-0002.html
(I had a hard time finding it via tpj.com, but Google worked.)

Unfortunately I think that the USPS site 
http://www.usps.com/cgi-bin/zip4/zip4inq
needed to run this script is no more.  
A search there for zip4inq produced nothing.

Does anyone know of a similar page, either by the USPS or
another provider of (web) services?

Hopefully helpfully yours,
Steve
-- 
Steven Tolkin   steve . tolkin at fmr dot com   617-563-0516 
Fidelity Investments   82 Devonshire St. V4D Boston MA 02109
There is nothing so practical as a good theory.  Comments are by me, 
not Fidelity Investments, its subsidiaries or affiliates.



 -Original Message-
 From: Jon Orwant [mailto:[EMAIL PROTECTED] 
 Sent: Monday, August 04, 2003 6:15 PM
 To: Joel Gwynn
 Cc: [EMAIL PROTECTED]
 Subject: Re: [Boston.pm] Postal address De-duping
 
 
 
 On Monday, August 4, 2003, at 05:12  PM, Joel Gwynn wrote:
 
  Hey, all.  We do lots of (snail) mailings, and we're looking for a
  fast, customizable de-duping solution.  We're currently taking a look at
  doubletake from http://peoplesmith.com/, which is not too expensive, but
  I was thinking there might be some perl stuff out there, given perl's
  text-processing powers.
 
 There's a wee script I wrote for TPJ a while back that scrapes the U.S.
 Postal Service's address canonicalizer.  The script is on tpj.com; look
 under Archives for the article called Five Quick Hacks.  The
 canonicalizer (well, they call it a zip code locator or something
 like that) will transform variants on the same address into the One
 True Address that the USPS recognizes, so de-duping then becomes a
 matter of simple string matching.
 
 Won't help you for foreign addresses, obviously.
 
 -Jon
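
The canonicalize-then-compare idea described above can be sketched in a
few lines of Perl.  This is a toy, assuming a hypothetical %SUFFIX table of
street-suffix abbreviations and hypothetical canonical and dedupe subs; it
does none of the real address validation the USPS service did, it only
shows how de-duping reduces to a hash lookup once every address is in one
canonical form.

```perl
use strict;
use warnings;

# Hypothetical, tiny table of street-suffix abbreviations; a real
# canonicalizer knows far more variants than this.
my %SUFFIX = (
    STREET => 'ST', AVENUE => 'AVE', ROAD => 'RD',
    DRIVE  => 'DR', BOULEVARD => 'BLVD',
);

# Reduce an address to a canonical string: upper case, no periods or
# commas, single spaces, trimmed, and abbreviated street suffixes.
sub canonical {
    my ($addr) = @_;
    my $s = uc $addr;
    $s =~ s/[.,]//g;
    $s =~ s/\s+/ /g;
    $s =~ s/^ | $//g;
    return join ' ', map { exists $SUFFIX{$_} ? $SUFFIX{$_} : $_ } split / /, $s;
}

# Keep the first address seen for each canonical form.
sub dedupe {
    my %seen;
    return grep { !$seen{ canonical($_) }++ } @_;
}

# prints "82 Devonshire Street" and "77 Massachusetts Avenue"
print "$_\n" for dedupe('82 Devonshire Street', '82 devonshire st.', '77 Massachusetts Avenue');
```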
 
 


RE: [Boston.pm] emacs discussion

2003-07-10 Thread Tolkin, Steve
As a long time emacs user I must agree
with the positions we have all been agreeing with:
* it has a long learning curve
* it has a lot of power

So I have a lot invested in it, and want to ensure 
emacs continues to survive, nay thrive.

Unfortunately I think its rate of adoption
is continually going down, as more people use
Windows and fewer use Unix.


I have configured my emacs to use the Windows keys.

;; Make the ctrl-c ctrl-v ctrl-x keys work like they do in Windows
;; 2003-03-17 I downloaded from http://www.cua.dk/cua.html Version: 2.10
(require 'cua)
(CUA-mode t)

However there is a BUG in emacs (or the documentation).
If you need to run a command that begins with
C-x you must hold the Shift key down while pressing Ctrl.

The other workarounds suggested in the CUA documentation did not work for
me: (1) press the prefix key twice very quickly (within 0.2 seconds), or
(2) press the prefix key and the following key within 0.2 seconds.


Has anyone gotten these two techniques to work?

Does anyone have other ideas to help ensure
the continued widespread use of emacs?

Steve


[Boston.pm] Perl 6 has become too complex

2003-03-14 Thread Tolkin, Steve
In Apocalypse 6 http://www.perl.com/pub/a/2003/03/07/apocalypse6.html
Larry Wall explains how subroutines are going to work in 
Perl 6.  I think this is the straw that broke the camel's back.
I think this is the worst case of second system syndrome I
have ever seen (See Jargon file e.g. at 
http://info.astrian.net/jargon/terms/s/second-system_effect.html ) and
I quote: When one is designing the successor to a relatively small,
elegant, and successful system, there is a tendency to become
grandiose in one's success and design an elephantine feature-laden
monstrosity.

I think the language design shows too much influence of Evil Damian.

I want good Damian to work with Larry et al. to reduce the
complexity of the language.  Or (shudder) for a subset of the language to
be defined.

Please advise me as to how to proceed.


Hopefully helpfully yours,
Steve
-- 
Steven Tolkin  steve . tolkin at fmr dot com 617-563-0516 
Fidelity Investments   82 Devonshire St. V8D Boston MA 02109
There is nothing so practical as a good theory.  Comments are by me, 
not Fidelity Investments, its subsidiaries or affiliates.



RE: [Boston.pm] That's a Haiku. A freaky little perl Haiku.

2003-02-06 Thread Tolkin, Steve
Actually the best poetic form to feature the 
word autovivication
would seem to be the Double Dactyl see
http://lonestar.texas.net/~robison/dactyls.html
http://www.kith.org/logos/words/lower/d.html etc.
e.g. the self-describing

Higgledy-Piggledy
Dactyls in dimeter,
Verse form with choriambs
(Masculine rhyme):

One sentence (two stanzas)
Hexasyllabically
Challenges poets who
Don't have the time. 


Providing the other 7 lines is left as an exercise.

P.S. Yes I know that the way autovivication
is pronounced normally is not quite a double dactyl,
but it's close enough for this deliberately silly poetic form.

 
Hopefully helpfully yours,
Steve
-- 
Steven Tolkin  [EMAIL PROTECTED]  617-563-0516 
Fidelity Investments   82 Devonshire St. V4D Boston MA 02109
There is nothing so practical as a good theory.  Comments are by me, 
not Fidelity Investments, its subsidiaries or affiliates.




 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
 Sent: Thursday, February 06, 2003 1:44 PM
 To: [EMAIL PROTECTED]
 Subject: [Boston.pm] That's a Haiku. A freaky little perl Haiku.
 
 
 [EMAIL PROTECTED] wrote:
 
  Do What I Mean and
  Autovivification
  aren't what I wanted.
 
 Hm, though technically accurate in Joel's 
 situation, I think it would be better
 if I generalize it to be more universal,
 rather than worry about it being taken 
 out of context. Therefore:
 
 Do What I Mean and
 Autovivication
 can be unwanted
 
 Hey, I think I just got me a new signature file...
 
 Greg
 



RE: [Boston.pm] Damian's Natural Language Parsing Meeting

2003-01-22 Thread Tolkin, Steve
Is this information available online,
e.g. in a Perl module, or an exegesis, etc.?
(I have read all the apocalypses and exegeses on Perl 6.)

I am interested in attending this meeting,
but would prefer to read this information first (or instead).
 
 
Hopefully helpfully yours,
Steve
-- 
Steven Tolkin  [EMAIL PROTECTED]  617-563-0516 
Fidelity Investments   82 Devonshire St. V8D Boston MA 02109
There is nothing so practical as a good theory.  Comments are by me, 
not Fidelity Investments, its subsidiaries or affiliates.


 -Original Message-
 From: James Freeman [mailto:[EMAIL PROTECTED]]
 Sent: Tuesday, January 21, 2003 10:54 PM
 To: [EMAIL PROTECTED]
 Subject: [Boston.pm] Damian's Natural Language Parsing Meeting
 
 
 Hi Folks,
 
 I have organized a meeting for Damian to speak to the bioinformatics 
 gurus in the local area.
 
 His Natural Language Parsing with a bioinformatics focus will be at 
 Boston University.
 
 Details below:
 
 http://informagen.com/NEBiG/
 
 Warmest Regards,
 
 Jim
 
 -- 
 Bioinformatics Consultant
 [EMAIL PROTECTED]
 voice:781-646-0742
 mobile:617-429-6352
 
 
 
 



RE: [Boston.pm] damian talk

2003-01-13 Thread Tolkin, Steve
I vote for Life, the Universe, and everything.



RE: [Boston.pm] damian talk

2003-01-13 Thread Tolkin, Steve
If I recall correctly, Olive Oyl, in some old Popeye cartoon,
says it to the Brutus character in the definitive American way:
Et tu, you brute
 
Hopefully helpfully yours,
Steve


 -----Original Message-----
 From: Drew Taylor [mailto:[EMAIL PROTECTED]]
 Sent: Monday, January 13, 2003 3:02 PM
 To: Walt Mankowski; [EMAIL PROTECTED]
 Subject: Re: [Boston.pm] damian talk
 
 
 At 02:30 PM 1/13/03 -0500, Walt Mankowski wrote:
 
 Geez, screwing up Latin AND Shakespeare in one short phrase.  Don't
 they teach you kids across the pond ANYTHING these days?  :)
 
 Never underestimate the power of public education in the US. :-)
 
 Drew
 
 --
 Drew Taylor| Web development  consulting
 http://www.drewtaylor.com/ | perl/mod_perl/DBI/mysql/postgres
 --
 
 



[Boston.pm] wanted: perl code to do JAXB name mapping (LONG)

2002-12-04 Thread Tolkin, Steve
Summary: I am looking for a program to do name mapping
as specified in Appendix C of the JAXB (Java Architecture for XML
Binding) spec. For example, it maps foo_bar to fooBar.
Although the spec talks about Java and XML names, this
mapping applies to many other programming languages too.
In particular, databases typically use the underscore
character as the word separator, so this program
would be very useful for that translation.

Note the careful treatment that locates the word break in front
of an upper-case letter followed by a lower-case letter,
e.g. FOOBar becomes FOO_BAR in the mapping to a constant.
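A minimal Perl sketch of both directions, assuming plain ASCII names. The helper names (db_to_java, java_words, java_to_db, java_to_constant) are hypothetical, and only the two break rules mentioned in this note are implemented, not the complete Appendix C algorithm quoted below.

```perl
use strict;
use warnings;

# Hypothetical helpers, ASCII letters only: a simplified sketch of the
# mappings described above, not the full Appendix C algorithm.

# foo_bar -> fooBar: split on underscores, capitalize the non-initial words.
sub db_to_java {
    my ($name) = @_;
    my ($first, @rest) = split /_+/, lc $name;
    return $first . join('', map { ucfirst($_) } @rest);
}

# Split a mixed-case name into words. A break goes between a lower-case
# and an upper-case letter (fooBar -> foo|Bar), and in front of an
# upper-case letter that is followed by a lower-case one (FOOBar -> FOO|Bar).
sub java_words {
    my ($name) = @_;
    return split /(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])/, $name;
}

# fooBar -> foo_bar (database style, the reverse mapping)
sub java_to_db { return join '_', map { lc($_) } java_words($_[0]) }

# FOOBar -> FOO_BAR (Java constant style)
sub java_to_constant { return join '_', map { uc($_) } java_words($_[0]) }

print db_to_java('foo_bar'), "\n";         # fooBar
print java_to_db('fooBar'), "\n";          # foo_bar
print java_to_constant('FOOBar'), "\n";    # FOO_BAR
```

Because java_words splits only on zero-width lookarounds, no characters are lost; digits, punctuation and non-ASCII letters are simply not handled by this simplified version.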


Details:
$Id: jaxb_name_mapping.txt 1.3 2002/12/04 14:51:06 A071046 Exp $

[I quote from the following document, downloadable from Sun.  I only
quoted the first part of Appendix C - mapping XML name to Java
Identifier.  I also want a program to do the reverse mapping.  It was
in file jaxb-0_7-prd-spec.pdf.  After copying the text and pasting it
as plain ASCII I had to slightly edit this file, e.g. to align the
tables using spaces, add newlines, etc.  I lost many of the bullets in
the original and did not manually add them all back.]

quote from = 
The Java(TM) Architecture for XML Binding (JAXB) Public Draft, V0.7
September 12, 2002 


C.1 Overview

This section provides default mappings from:

* XML Name to Java identifier
* Model group to Java identifier
* Namespace URI to Java package name

C.2 The Name to Identifier Mapping Algorithm

Java identifiers typically follow three simple, well-known
conventions:

Class and interface names always begin with an upper-case letter. The
remaining characters are either digits, lower-case letters, or
upper-case letters. Upper-case letters within a multi-word name serve
to identify the start of each non-initial word, or sometimes to stand
for acronyms.

Method names and components of a package name always begin with a
lower-case letter, and otherwise are exactly like class and interface
names.

Constant names are entirely in upper case, with each pair of words
separated by the underscore character ('_', \u005F, LOW LINE).
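The three conventions can be applied to a word list that has already been split out of a name. This Perl sketch is not part of the JAXB text: the helper names are hypothetical, and the construction (capitalize each word for a class name, lower-case the first letter of that for a method name, upper-case every word and join with underscores for a constant) is an assumption consistent with the conventions above. The exact rules given later in Appendix C may differ.

```perl
use strict;
use warnings;

# Hypothetical helpers: build identifiers from an already-split word list.
# Class name: capitalize the first letter of each word, then concatenate.
sub class_name    { return join '', map { ucfirst($_) } @_ }

# Method name: like a class name, but starting with a lower-case letter.
sub method_name   { return lcfirst(class_name(@_)) }

# Constant name: every word in upper case, separated by underscores.
sub constant_name { return join '_', map { uc($_) } @_ }

my @words = ('purchase', 'Order', 'type');
print class_name(@words), "\n";      # PurchaseOrderType
print method_name(@words), "\n";     # purchaseOrderType
print constant_name(@words), "\n";   # PURCHASE_ORDER_TYPE
```

Acronym words show the limits of this guess: method_name('FOO', 'Bar') gives fOOBar, so the full Appendix C rules should be checked before relying on it.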

XML names, however, are much richer than Java identifiers: They may
include not only the standard Java identifier characters but also
various punctuation and special characters that are not permitted in
Java identifiers. Like most Java identifiers, most XML names are in
practice composed of more than one natural-language word. Non-initial
words within an XML name typically start with an upper-case letter
followed by a lower-case letter, as in Java, or are prefixed by
punctuation characters, which is not usual in Java and, for most
punctuation characters, is in fact illegal.

In order to map an arbitrary XML name into a Java class, method, or
constant identifier, the XML name is first broken into a word
list. For the purpose of constructing word lists from XML names we use
the following definitions:

A punctuation character is one of the following:

* A hyphen ('-', \u002D, HYPHEN-MINUS),
* A period ('.', \u002E, FULL STOP),
* A colon (':', \u003A, COLON),
* An underscore ('_', \u005F, LOW LINE),
* A dot ('·', \u00B7, MIDDLE DOT),
* \u0387, GREEK ANO TELEIA,
* \u06DD, ARABIC END OF AYAH, or
* \u06DE, ARABIC START OF RUB EL HIZB.

These are all legal characters in XML names.

A letter is a character for which the Character.isLetter method
returns true, i.e., a letter according to the Unicode standard. Every
letter is a legal Java identifier character, both initial and
non-initial.

A digit is a character for which the Character.isDigit method returns
true, i.e., a digit according to the Unicode Standard. Every digit is
a legal non-initial Java identifier character.

A mark is a character that is in none of the previous categories but
for which the Character.isJavaIdentifierPart method returns true. This
category includes numeric letters, combining marks, non-spacing marks,
and ignorable control characters.

Every XML name character falls into one of the above categories. We
further divide letters into three subcategories:

* An upper-case letter is a letter for which the Character.isUpperCase
  method returns true,
* A lower-case letter is a letter for which the Character.isLowerCase
  method returns true, and
* All other letters are uncased.
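Outside the JAXB text, these categories can be approximated in Perl with Unicode character properties. In this sketch char_category is a hypothetical helper, the correspondence between Perl properties and Java's Character methods is an approximation (the two may differ for a few characters or Unicode versions), and nothing checks that a character is actually legal in an XML name.

```perl
use strict;
use warnings;

# The JAXB punctuation characters (hyphen, period, colon, underscore,
# middle dot, Greek ano teleia, and the two Arabic marks).
my $PUNCT = qr/[\-.:_\x{00B7}\x{0387}\x{06DD}\x{06DE}]/;

# Classify a single character. Perl's Unicode properties stand in for
# the Java methods named in the spec (an approximation, not an exact match):
#   \p{Lu} ~ Character.isUpperCase   \p{Ll} ~ Character.isLowerCase
#   \p{L}  ~ Character.isLetter      \p{Nd} ~ Character.isDigit
# Whatever is left is reported as a mark, relying on the spec's statement
# that every XML name character falls into one of these categories.
sub char_category {
    my ($c) = @_;
    return 'punct'   if $c =~ /\A$PUNCT\z/;
    return 'upper'   if $c =~ /\A\p{Lu}\z/;
    return 'lower'   if $c =~ /\A\p{Ll}\z/;
    return 'uncased' if $c =~ /\A\p{L}\z/;
    return 'digit'   if $c =~ /\A\p{Nd}\z/;
    return 'mark';
}

# a, B, 7, hyphen, CJK ideograph U+4E2D, combining acute accent U+0301
print join(' ', map { char_category($_) } split //, "aB7-\x{4E2D}\x{0301}"), "\n";
# prints: lower upper digit punct uncased mark
```

Punctuation is tested first, so a character such as the middle dot is never misreported as a mark.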

An XML name is split into a word list by removing any leading and
trailing punctuation characters and then searching for word breaks. A
word break is defined by three regular expressions: A prefix, a
separator, and a suffix. The prefix matches part of the word that
precedes the break, the separator is not part of any word, and the
suffix matches part of the word that follows the break. The word
breaks are defined as:


Table 3-1 XML Word Breaks

Prefix     Separator  Suffix       Example

[^punct]   punct+     [^punct]     foo|--|bar
digit                 [^digit]     foo22|bar
[^digit]              digit        foo|22
lower                 [^lower]     foo|Bar
upper                 upper lower  FOO|Bar
letter                [^letter]