php-general Digest 14 Mar 2013 15:15:22 -0000 Issue 8162

Topics (messages 320522 through 320535):

Re: Accessing Files Outside the Web Root
        320522 by: Dale H. Cook
        320528 by: David Robley
        320533 by: tamouse mailing lists
        320535 by: Dale H. Cook

Re: Mystery foreach error
        320523 by: Matijn Woudt
        320524 by: Angela Barone
        320525 by: Matijn Woudt
        320526 by: Angela Barone
        320527 by: David Harkness
        320529 by: Sebastian Krebs
        320530 by: David Harkness
        320531 by: Sebastian Krebs
        320532 by: Angela Barone

Re: Generating CRUD code for normalized db
        320534 by: tamouse mailing lists

Administrivia:

To subscribe to the digest, e-mail:
        php-general-digest-subscr...@lists.php.net

To unsubscribe from the digest, e-mail:
        php-general-digest-unsubscr...@lists.php.net

To post to the list, e-mail:
        php-gene...@lists.php.net


----------------------------------------------------------------------
--- Begin Message ---
At 05:04 PM 3/13/2013, Dan McCullough wrote:
>Web bots can ignore the robots.txt file, most scrapers would.

and at 05:06 PM 3/13/2013, Marc Guay wrote:

>These don't sound like robots that would respect a txt file to me.

Dan and Marc are correct. Although I used the terms "spiders" and "pirates," I 
believe that the correct term, as employed by Dan, is "scrapers," and that 
term might be applied to either the robot or the site that displays its 
results. One blogger has called scrapers "the arterial plaque of the Internet." 
I need to implement a solution that allows humans to access my files but 
prevents scrapers from accessing them. I will undoubtedly have to implement 
some type of challenge-and-response in the system (such as a captcha), but as 
long as those files are stored below the web root, a scraper that has a valid 
URL can probably grab them. That is part of what the "public" in public_html 
implies.

One of the reasons why this irks me is that the scrapers are all commercial 
sites, but they haven't offered me a piece of the action for the use of my 
files. My domain is an entirely non-commercial domain, and I provide free 
hosting for other non-commercial genealogical works, primarily pages that are 
part of the USGenWeb Project, which is perhaps the largest of all 
non-commercial genealogical projects.

Dale H. Cook, Member, NEHGS and MA Society of Mayflower Descendants;
Plymouth Co. MA Coordinator for the USGenWeb Project
Administrator of http://plymouthcolony.net 


--- End Message ---
--- Begin Message ---
"Dale H. Cook" wrote:

> At 05:04 PM 3/13/2013, Dan McCullough wrote:
>>Web bots can ignore the robots.txt file, most scrapers would.
> 
> and at 05:06 PM 3/13/2013, Marc Guay wrote:
> 
>>These don't sound like robots that would respect a txt file to me.
> 
> Dan and Marc are correct. Although I used the terms "spiders" and
> "pirates" I believe that the correct term, as employed by Dan, is
> "scrapers," and that term might be applied to either the robot or the
> site which displays its results. One blogger has called scrapers "the
> arterial plaque of the Internet." I need to implement a solution that
> allows humans to access my files but prevents scrapers from accessing
> them. I will undoubtedly have to implement some type of
> challenge-and-response in the system (such as a captcha), but as long as
> those files are stored below the web root a scraper that has a valid URL
> can probably grab them. That is part of what the "public" in public_html
> implies.
> 
> One of the reasons why this irks me is that the scrapers are all
> commercial sites, but they haven't offered me a piece of the action for
> the use of my files. My domain is an entirely non-commercial domain, and I
> provide free hosting for other non-commercial genealogical works,
> primarily pages that are part of the USGenWeb Project, which is perhaps
> the largest of all non-commercial genealogical projects.
> 

readfile() is probably where you want to start, in conjunction with a 
captcha or similar
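Spelling that suggestion out as a rough sketch (the IDs, paths, and session flag below are invented for illustration, not anyone's actual setup): a script below the web root maps short IDs to files stored above it, and hands a file over with readfile() only once the captcha gate has been passed.

```php
<?php
// Hypothetical sketch only: IDs, paths, and the session flag name are
// assumptions for illustration, not a real configuration.

// Short IDs stand in for real paths, so neither the files nor the
// filesystem layout ever appear in a URL a scraper could harvest.
$files = array(
    'vr1850' => '/home/example/private/vitalrecords1850.pdf',
    'census' => '/home/example/private/census_index.pdf',
);

function resolve_download($id, array $files)
{
    // Unknown IDs resolve to null; no path arithmetic, so no traversal.
    return isset($files[$id]) ? $files[$id] : null;
}

// In the web-facing script, the gate plus delivery would look like:
//
//   session_start();
//   if (empty($_SESSION['captcha_passed'])) {  // set by the captcha page
//       header('HTTP/1.0 403 Forbidden');
//       exit;
//   }
//   $path = resolve_download(isset($_GET['id']) ? $_GET['id'] : '', $files);
//   if ($path === null) {
//       header('HTTP/1.0 404 Not Found');
//       exit;
//   }
//   header('Content-Type: application/pdf');
//   header('Content-Length: ' . filesize($path));
//   readfile($path);
```

Because the browser only ever sees the script's URL plus an ID, nothing under the private directory is directly addressable.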

-- 
Cheers
David Robley

Catholic (n.) A cat with a drinking problem.


--- End Message ---
--- Begin Message ---
On Mar 13, 2013 7:06 PM, "David Robley" <robl...@aapt.net.au> wrote:
>
> "Dale H. Cook" wrote:
>
> > At 05:04 PM 3/13/2013, Dan McCullough wrote:
> >>Web bots can ignore the robots.txt file, most scrapers would.
> >
> > and at 05:06 PM 3/13/2013, Marc Guay wrote:
> >
> >>These don't sound like robots that would respect a txt file to me.
> >
> > Dan and Marc are correct. Although I used the terms "spiders" and
> > "pirates" I believe that the correct term, as employed by Dan, is
> > > "scrapers," and that term might be applied to either the robot or the
> > site which displays its results. One blogger has called scrapers "the
> > arterial plaque of the Internet." I need to implement a solution that
> > allows humans to access my files but prevents scrapers from accessing
> > them. I will undoubtedly have to implement some type of
> > challenge-and-response in the system (such as a captcha), but as long as
> > those files are stored below the web root a scraper that has a valid URL
> > can probably grab them. That is part of what the "public" in public_html
> > implies.
> >
> > One of the reasons why this irks me is that the scrapers are all
> > commercial sites, but they haven't offered me a piece of the action for
> > the use of my files. My domain is an entirely non-commercial domain, and I
> > provide free hosting for other non-commercial genealogical works,
> > primarily pages that are part of the USGenWeb Project, which is perhaps
> > the largest of all non-commercial genealogical projects.
> >
>
> readfile() is probably where you want to start, in conjunction with a
> captcha or similar
>
> --
> Cheers
> David Robley
>
> Catholic (n.) A cat with a drinking problem.
>
>
> --
> PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
>

If the files are delivered via the web, by PHP or some other means, even if
located outside the webroot, they'd still be scrapeable.

--- End Message ---
--- Begin Message ---
At 04:06 AM 3/14/2013, tamouse mailing lists wrote:

>If the files are delivered via the web, by php or some other means, even if
>located outside webroot, they'd still be scrapeable.

Bots, however, being "mechanical" (i.e., hard-wired or programmed) behave in 
different ways than humans, and that difference can be exploited in a script.

Part of the rationale for putting the files outside the web root is that they 
have no URLs, eliminating one vulnerability (you can't scrape the URL of a file 
if it has no URL). Late last night I figured out why I was having trouble 
accessing those external files from my script, and now I'm working out the 
parsing details that enable one script to access multiple external files. My 
approach probably won't defeat all bad bots, but it will likely defeat most of 
them. You can't make code bulletproof, but you can wrap it in Kevlar.
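One way a script can exploit that behavioural difference, sketched here under invented names (this is not Dale's actual code): mint a short-lived per-session token when a human loads the index page, and refuse any download request arriving without it. A scraper that simply replays a harvested URL never holds a current token.

```php
<?php
// Hypothetical sketch: the token scheme and all names are assumptions,
// not the actual script discussed in this thread.

function mint_token()
{
    // Era-appropriate randomness; a stronger source could be swapped in.
    return md5(uniqid(mt_rand(), true));
}

function token_ok($expected, $supplied)
{
    // Reject missing/empty tokens as well as mismatches.
    return is_string($expected) && $expected !== '' && $expected === $supplied;
}

// Index page: store a token in the session and embed it in every link, e.g.
//   $_SESSION['dl_token'] = mint_token();
//   <a href="fetch.php?id=vr1850&t=...token...">Vital Records</a>
//
// fetch.php: refuse the request before ever touching the external file:
//   if (!token_ok($_SESSION['dl_token'], $_GET['t'])) {
//       header('HTTP/1.0 403 Forbidden');
//       exit;
//   }
```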

Dale H. Cook, Member, NEHGS and MA Society of Mayflower Descendants;
Plymouth Co. MA Coordinator for the USGenWeb Project
Administrator of http://plymouthcolony.net 


--- End Message ---
--- Begin Message ---
On Wed, Mar 13, 2013 at 5:07 PM, Jim Giner <jim.gi...@albanyhandball.com> wrote:

> On 3/12/2013 9:04 PM, Angela Barone wrote:
>
>> On Mar 12, 2013, at 5:16 PM, David Robley wrote:
>>
>>> Presumably there is a fixed list of States - those are US states? -
>>> so why not provide a drop down list of the possible choices?
>>
>>         There is, but the problem must have been that if someone didn't
>> select a State, $state was blank.  I've since given the "Select a State..."
>> choice a value of 'XX' and I'm now looking for that in the if statement I
>> mentioned before.
>>
>> Angela
>
> Why not just check if the $state exists as a key of the array $states
> before doing this?


Exactly, that's much better. It could be that some hacker enters something
other than XX or one of the states.

--- End Message ---
--- Begin Message ---
On Mar 13, 2013, at 9:07 AM, Jim Giner wrote:
> Why not just check if the $state exists as a key of the array $states before 
> doing this?

Jim,

        Are you thinking about the in_array function?

Angela

--- End Message ---
--- Begin Message ---
On Thu, Mar 14, 2013 at 12:18 AM, Angela Barone
<ang...@italian-getaways.com> wrote:

> On Mar 13, 2013, at 9:07 AM, Jim Giner wrote:
> > Why not just check if the $state exists as a key of the array $states
> before doing this?
>
> Jim,
>
>         Are you thinking about the in_array function?
>
> Angela


That wouldn't work; in_array() checks the values, and your states are in the
keys. Use:
if (isset($states[$state]))
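Spelled out with hypothetical form handling (the array contents and field name are assumptions), the check might look like:

```php
<?php
// Hypothetical sketch of the isset() check; array contents are invented.
$states = array(
    'MA' => 'Massachusetts',
    'CA' => 'California',
    // ... one entry per state
);

$state = isset($_POST['state']) ? $_POST['state'] : '';

if (isset($states[$state])) {
    $stateName = $states[$state];  // a valid two-letter code was posted
} else {
    $stateName = null;             // blank, 'XX', or tampered input
}
```

With this check the 'XX' sentinel value becomes unnecessary: anything that is not a real key, including an empty string, falls into the else branch.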

- Matijn

--- End Message ---
--- Begin Message ---
On Mar 13, 2013, at 4:24 PM, Matijn Woudt wrote:
> That wouldn't work, in_array checks the values, and your states are in the 
> keys. Use:
> if(isset($states[$state])) 

Hi Matijn,

        Before I received your email, I ran across if(array_key_exists) and it 
seems to work.  How does that differ from if(isset($states[$state]))?

Angela

--- End Message ---
--- Begin Message ---
On Wed, Mar 13, 2013 at 4:44 PM, Angela Barone
<ang...@italian-getaways.com> wrote:

> I ran across if(array_key_exists) and it seems to work.  How does that
> differ from if(isset($states[$state]))?


Hi Angela,

isset() will return false for an array key 'foo' mapped to a null value
whereas array_key_exists() will return true. The latter asks "Is this key
in the array?" whereas isset() adds "and is its value not null?" While
isset() is ever-so-slightly faster, this should not be a concern. Use
whichever makes sense for the context here. Since you don't stick null
values into the array, I prefer the isset() form because the syntax reads
better to me.
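The difference is easy to demonstrate with a key deliberately mapped to null (a contrived array, purely for illustration):

```php
<?php
// Contrived example: 'XX' is present in the array but maps to null.
$states = array('MA' => 'Massachusetts', 'XX' => null);

var_dump(isset($states['XX']));            // false: key exists, value is null
var_dump(array_key_exists('XX', $states)); // true: the key is in the array

var_dump(isset($states['MA']));            // true: key exists, value not null
var_dump(array_key_exists('MA', $states)); // true
```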

Peace,
David

--- End Message ---
--- Begin Message ---
2013/3/14 David Harkness <davi...@highgearmedia.com>

> On Wed, Mar 13, 2013 at 4:44 PM, Angela Barone
> <ang...@italian-getaways.com>wrote:
>
> > I ran across if(array_key_exists) and it seems to work.  How does that
> > differ from if(isset($states[$state]))?
>
>
> Hi Angela,
>
> isset() will return false for an array key 'foo' mapped to a null value
> whereas array_key_exists() will return true. The latter asks "Is this key
> in the array?" whereas isset() adds "and is its value not null?" While
> isset() is ever-so-slightly faster, this should not be a concern. Use
> whichever makes sense for the context here. Since you don't stick null
> values into the array, I prefer the isset() form because the syntax reads
> better to me.
>

Just a minor addition: because 'null' is the representation of "nothing",
array_key_exists() and isset() can be treated as semantically equivalent.

Another approach (in my eyes the cleaner one ;)) is to simply _ensure_ that
the keys I want to use exist. Of course this only works in cases where the
key is not dynamic, or the dynamic keys are known, which is not the case
here, it seems.

$defaults = array('stateNames' => array(), 'states' => array());
$values = array_merge($defaults, $values);
$values['states'] = array_merge(
    array_fill_keys($values['stateNames'], null),
    $values['states']
);
if (!$values['states'][$myState]) {
    // ...
}


>
> Peace,
> David
>



-- 
github.com/KingCrunch

--- End Message ---
--- Begin Message ---
On Wed, Mar 13, 2013 at 5:10 PM, Sebastian Krebs <krebs....@gmail.com> wrote:

> Because 'null' is the representation of "nothing", array_key_exists() and
> isset() can be treated as semantically equivalent.


As I said, these functions return different results for null values. It
won't matter for Angela since she isn't storing null in the array, though.

Peace,
David

--- End Message ---
--- Begin Message ---
2013/3/14 David Harkness <davi...@highgearmedia.com>

>
> On Wed, Mar 13, 2013 at 5:10 PM, Sebastian Krebs <krebs....@gmail.com> wrote:
>
>> Because 'null' is the representation of "nothing", array_key_exists() and
>> isset() can be treated as semantically equivalent.
>
>
> As I said, these functions return different results for null values. It
> won't matter for Angela since she isn't storing null in the array, though.
>

That's exactly what I tried to say :)


>
> Peace,
> David
>
>


-- 
github.com/KingCrunch

--- End Message ---
--- Begin Message ---
On Mar 13, 2013, at 5:02 PM, David Harkness wrote:
> isset() will return false for an array key 'foo' mapped to a null value 
> whereas array_key_exists() will return true. The latter asks "Is this key in 
> the array?" whereas isset() adds "and is its value not null?" While isset() 
> is ever-so-slightly faster, this should not be a concern. Use whichever
> makes sense for the context here.

Hi David,

        Thank you for the explanation.  It's nice to know the difference 
between them.  Since they are equivalent for my use, I went with 
array_key_exists, simply because it makes more sense to me in English. ;)

        Thanks again to everyone.  I got it to work _and_ there are no more 
errors!!!

Angela

--- End Message ---
--- Begin Message ---
On Mar 13, 2013 1:52 PM, "Ashley Sheridan" <a...@ashleysheridan.co.uk> wrote:
>
> On Wed, 2013-03-13 at 19:24 +0100, Marco Behnke wrote:
>
> > Am 13.03.13 12:57, schrieb Gary:
> > > ma...@behnke.biz wrote:
> > >
> > >> Do us all a favor and stay away from open source if you do not honor
> > >> the work us wannabes put into it.
> > > As I said before "I wasn't aware you would feel that the cap fitted."
> > > If you do feel that, then perhaps instead of complaining at me for
> > > pointing it out, you would be better off employing that time
> > > increasing the quality of what you produce.
> > >
> > So you said you tried Yii. But have you wasted some of your precious
> > time trying out the extension that "extends" Yii in a way that creating
> > models and views with Gii gets proper SELECT boxes and stuff for
> > relations? If I understood you correctly, this is what you were looking
> > for?
> >
>
>
> At this point I don't think he's looking for an actual solution, but
> merely wants to moan about open source. OSS has flaws, of course, but
> even someone so narrow-minded would have a hard time arguing in earnest
> that it suffered from too little choice and a lack of solutions to a
> problem.
>
> Thanks,
> Ash
> http://www.ashleysheridan.co.uk
>
>

And isn't the point of OSS exactly that if it doesn't work for your needs,
you enhance it? Whingeing about it seems a sure-fire way to not get any
help.

--- End Message ---
