[PHP] Connect to Google

2012-02-16 Thread John Taylor-Johnston
I'm a teacher. I want to use PHP to interface with Google and see if a 
student has plagiarized.


I don't see many open-source projects on the subject, so I want to 
create my own script.


How can I use PHP to interface with Google and see if this text exists 
on the internet?


If this is possible, I need some ideas on how to parse the text and 
input it into Google.


Then I might like to get a percentage idea of how this text compares to 
a site that Google has indexed.



$SampleText = "Lorem ipsum dolor sit amet, test link adipiscing elit.
Nullam dignissim convallis est. Quisque aliquam. Donec faucibus. Nunc
iaculis suscipit dui. Nam sit amet sem. Aliquam libero nisi, imperdiet
at, tincidunt nec, gravida vehicula, nisl. Praesent mattis, massa quis
luctus fermentum, turpis mi volutpat justo, eu volutpat enim diam eget
metus. Maecenas ornare tortor. Donec sed tellus eget sapien fringilla
nonummy. Mauris a ante. Suspendisse quam sem, consequat at, commodo
vitae, feugiat in, nunc. Morbi imperdiet augue quis tellus.";


John


--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Connect to Google

2012-02-16 Thread Marc Guay
> I'm a teacher. I want to use PHP to interface with Google and see if a
> student has plagiarized.

Hi.  Why not just enter the suspected text into a search engine and
see if any close matches come up?  If you use the advanced search
tools you can choose verbatim and see if the exact phrase matches.
If that's not good enough, can you explain how you would like it to
function?  Would the whole paper be scanned phrase-by-phrase for
matches and then spit out a report?

Marc




Re: [PHP] Connect to Google

2012-02-16 Thread Ashley Sheridan
On Wed, 2012-02-15 at 21:56 -0500, John Taylor-Johnston wrote:

> I'm a teacher. I want to use PHP to interface with Google and see if a
> student has plagiarized.
>
> I don't see many open-source projects on the subject, so I want to
> create my own script.
>
> How can I use PHP to interface with Google and see if this text exists
> on the internet?
>
> If this is possible, I need some ideas on how to parse the text and
> input it into Google.
>
> Then I might like to get a percentage idea of how this text compares to
> a site that Google has indexed.
>
>
> $SampleText = Lorem ipsum dolor sit amet, test link adipiscing elit.
> Nullam dignissim convallis est. Quisque aliquam. Donec faucibus. Nunc
> iaculis suscipit dui. Nam sit amet sem. Aliquam libero nisi, imperdiet
> at, tincidunt nec, gravida vehicula, nisl. Praesent mattis, massa quis
> luctus fermentum, turpis mi volutpat justo, eu volutpat enim diam eget
> metus. Maecenas ornare tortor. Donec sed tellus eget sapien fringilla
> nonummy. Mauris a ante. Suspendisse quam sem, consequat at, commodo
> vitae, feugiat in, nunc. Morbi imperdiet augue quis tellus.
>
> John
>
 


Wow, that's a pretty big project you're taking on there. A quick search
shows that there are some projects out there to detect plagiarism, but I
think university-calibre tools cost a hefty sum of money.

To get a rough idea, you could break a text into sentences, and then
query each one of those to see if it occurs just like that. You can use
cURL to grab search results pages for this sort of thing, no need for a
special interface. There are a few things to bear in mind though:


  * Google's terms and conditions may prohibit using their search
    engine like this, or may impose a limit on how much you can do
    this.
  * Some sentences will be intentionally copied, as quotes. Maybe
    add some sort of check against the source to see whether the
    sentence appears in a quoted context.
  * What if only part of a sentence is copied?
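
As a rough illustration of the sentence-by-sentence idea above, here is a
minimal PHP sketch. The function names splitSentences() and
exactPhraseQuery() are made up for this example, and the splitting rule is
deliberately naive (it will break on abbreviations such as "e.g."), so
treat it as a starting point rather than a finished parser.

```php
<?php
// Hypothetical helper: split text wherever ., ! or ? is followed by whitespace.
function splitSentences(string $text): array
{
    $parts = preg_split('/(?<=[.!?])\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    return array_map('trim', $parts);
}

// Hypothetical helper: wrap a sentence in double quotes so a search engine
// treats it as an exact phrase. Embedded double quotes are dropped so they
// cannot end the phrase early.
function exactPhraseQuery(string $sentence): string
{
    return '"' . str_replace('"', '', $sentence) . '"';
}

$sample = 'Nullam dignissim convallis est. Quisque aliquam. Donec faucibus.';

// Print one URL-encoded exact-phrase query per sentence.
foreach (splitSentences($sample) as $sentence) {
    echo urlencode(exactPhraseQuery($sentence)), "\n";
}
```

Each printed line is a query string value that could be appended to a
search URL. Sentences that sit inside quotation marks in the student's
paper would still need the separate "is this a quote?" check mentioned in
the list.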


Maybe after you've searched for exact matches from the sentences in the
source, you could remove them from the source, then re-check every
remaining sentence against Google's fuzzy search. It may produce many
false positives though.

There are plenty of other factors too. Students may copy from books
which don't exist in a search engine's archives. Some subjects may also
unintentionally lead to the same wording, particularly technical
subjects, which tend to be further removed from creative and flowery
description.

-- 
Thanks,
Ash
http://www.ashleysheridan.co.uk




Re: [PHP] Connect to Google

2012-02-16 Thread Marc Guay
> If you use the advanced search
> tools you can choose verbatim and see if the exact phrase matches.

Just correcting myself here: the way to do this is simply to wrap the
words in quotes, like this: "hey now". The verbatim tool is something
else.

Marc




Re: [PHP] Connect to Google

2012-02-16 Thread John Taylor-Johnston

Can I use PHP to interface with Google? Any possible examples of this?

Let's start with the first step. :)

I'm sure proprietary sites like http://www.compilatio.net/, for example,
connect to search engines. They cannot be crawling the net themselves too.
That would be crazy.

(I'm a top quoter. It's more intuitive.)

Thanks Ash.

John



Ashley Sheridan wrote:

> On Wed, 2012-02-15 at 21:56 -0500, John Taylor-Johnston wrote:
>
>> How can I use PHP to interface with Google and see if this text exists
>> on the internet?
>
> Wow, that's a pretty big project you're chewing there. A quick search
> shows that there are some project out there to detect plagiarism, but
> I think for university calibre there's a hefty sum of money required.







Re: [PHP] Connect to Google

2012-02-16 Thread John Taylor-Johnston

I'm a top quoter.
I would parse the text first. Phrase by phrase, or phrase segments.
Then spit out a report.
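
A minimal sketch of that "segments, then a report" idea might look like
the following. Everything here is an assumption for illustration: the
names phraseSegments() and matchReport(), the fixed segment size, and the
$isFound callback, which stands in for whatever lookup (search API, local
corpus) ends up being used. The demo uses a hard-coded list instead of a
real search.

```php
<?php
// Hypothetical helper: cut the text into consecutive chunks of $size words.
// $size must be at least 1.
function phraseSegments(string $text, int $size = 6): array
{
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    return array_map(
        function (array $chunk): string {
            return implode(' ', $chunk);
        },
        array_chunk($words, $size)
    );
}

// Hypothetical helper: run every segment through $isFound (segment -> bool)
// and summarise how many were found, as a count and a percentage.
function matchReport(array $segments, callable $isFound): array
{
    $matched = array_values(array_filter($segments, $isFound));
    $percent = count($segments) > 0
        ? round(100 * count($matched) / count($segments), 1)
        : 0.0;

    return [
        'total'   => count($segments),
        'matched' => $matched,
        'percent' => $percent,
    ];
}

// Stand-in lookup: pretend these phrases were found online.
$known = ['Quisque aliquam. Donec faucibus.'];
$isFound = function (string $segment) use ($known): bool {
    return in_array($segment, $known, true);
};

$text = 'Nullam dignissim convallis est. Quisque aliquam. Donec faucibus.';
$report = matchReport(phraseSegments($text, 4), $isFound);

printf(
    "%d of %d segments matched (%.1f%%)\n",
    count($report['matched']),
    $report['total'],
    $report['percent']
);
```

The percentage is only the share of segments with a hit, which is a crude
measure: it says nothing about how close each hit is to the original.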

Marc Guay wrote:

> If that's not good enough, can you explain how you would like it to
> function?  Would the whole paper be scanned phrase-by-phrase for
> matches and then spit out a report?





Re: [PHP] Connect to Google

2012-02-16 Thread Ashley Sheridan
On Thu, 2012-02-16 at 14:47 -0500, John Taylor-Johnston wrote:

> Can I use PHP to interface with Google? Any possible examples of this?
>
> Let's start with the first step. :)
>
> I'm sure proprietary sites like http://www.compilatio.net/ for example
> connects to search engines. They cannot be crawling the net too. That would
> be crazy.
>
> (I'm a top quoter. It's more intuitive.)
>
> Thanks Ash.
>
> John
>
> Ashley Sheridan wrote:
>> On Wed, 2012-02-15 at 21:56 -0500, John Taylor-Johnston wrote:
>>> How can I use PHP to interface with Google and see if this text exists
>>> on the internet?
>>
>> Wow, that's a pretty big project you're chewing there. A quick search
>> shows that there are some project out there to detect plagiarism, but
>> I think for university calibre there's a hefty sum of money required.
 


It might seem more intuitive to you, but it really, really screws up the
archives.

Like I said before, cURL is the way to interface with Google.
Basically, cURL can be used to request resources, in this case a web
page, from the web. You can call a URL and parse the page of results to
determine whatever you need to. As you've not really hashed out any firm
ideas of what exactly you want, it's a little difficult to say exactly
what you need to do.
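
For what it's worth, a bare-bones cURL request in PHP could be sketched
like this. The function names fetchPage() and searchUrl(), the user-agent
string, and the https://search.example/ address are placeholders invented
for the example, not a real search endpoint. It assumes the PHP cURL
extension is installed, and whichever site is queried, its terms of use
still apply.

```php
<?php
// Hypothetical helper: fetch a URL with cURL and return the response body,
// or null if the request failed or did not come back with HTTP 200.
function fetchPage(string $url, int $timeoutSeconds = 10): ?string
{
    $handle = curl_init($url);
    curl_setopt_array($handle, [
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow redirects
        CURLOPT_TIMEOUT        => $timeoutSeconds,
        CURLOPT_USERAGENT      => 'plagiarism-check-sketch/0.1', // placeholder
    ]);

    $body   = curl_exec($handle);
    $status = curl_getinfo($handle, CURLINFO_HTTP_CODE);

    return ($body !== false && $status === 200) ? $body : null;
}

// Hypothetical helper: build a search URL with the phrase wrapped in quotes
// (exact-phrase search). The query parameter name "q" is an assumption.
function searchUrl(string $baseUrl, string $phrase): string
{
    return $baseUrl . '?' . http_build_query(['q' => '"' . $phrase . '"']);
}

// Only the URL is built here; no request is sent by this line.
echo searchUrl('https://search.example/search', 'Quisque aliquam'), "\n";
```

Calling fetchPage() on the URL returns the raw HTML of the results page as
a string (or null on failure), which would then have to be parsed to find
out whether anything matched.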

-- 
Thanks,
Ash
http://www.ashleysheridan.co.uk




Re: [PHP] Connect to Google

2012-02-16 Thread Ashley Sheridan
On Thu, 2012-02-16 at 14:50 -0500, John Taylor-Johnston wrote:

> I'm a top quoter.
> I would parse the text first. Phrase by phrase, or phrase segments.
> Then spit out a report.
>
> Marc Guay wrote:
>> If that's not good enough, can you explain how you would like it to
>> function?  Would the whole paper be scanned phrase-by-phrase for
>> matches and then spit out a report?
 


You might be a top quoter but, please, to get the best from this list
and not annoy people, post at the bottom. The list gets archived online
in many places, and it's annoying to read things in this order:

reply 4
reply 2
question
reply 1
reply 3

Almost every email client I know of allows bottom posting. This is just
one of the rules of this list. Please don't be offended, but do try to
keep to the rules: it keeps everyone happy, and happy people are helpful
people!

-- 
Thanks,
Ash
http://www.ashleysheridan.co.uk




Re: [PHP] Connect to Google

2012-02-16 Thread Matijn Woudt
2012/2/16 John Taylor-Johnston jt.johns...@usherbrooke.ca:
 Can I use PHP to interface with Google? Any possible examples of this?

There's Google Custom Search API:
http://code.google.com/intl/nl-NL/apis/customsearch/v1/overview.html

It responds in JSON, and PHP has included JSON functions since PHP 5.2 [1].
It's free up to 100 queries a day, after that you have to pay $5 per
1000 queries.
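
A small sketch of how that could look in PHP follows. The endpoint and the
key, cx and q parameters reflect my understanding of the Custom Search
JSON API, and the "items" / "link" fields are what I expect in its
response, so check them against the current documentation. The function
names and the sample response are made up for the example; no request is
actually sent.

```php
<?php
// Hypothetical helper: build a Custom Search request URL for an exact phrase.
// $apiKey and $engineId are values you would obtain from Google.
function customSearchUrl(string $apiKey, string $engineId, string $phrase): string
{
    return 'https://www.googleapis.com/customsearch/v1?' . http_build_query([
        'key' => $apiKey,
        'cx'  => $engineId,
        'q'   => '"' . $phrase . '"',
    ]);
}

// Hypothetical helper: pull the result links out of a JSON response body.
// Returns an empty array if the JSON is invalid or contains no results.
function extractLinks(string $json): array
{
    $data = json_decode($json, true);
    if (!is_array($data) || !isset($data['items']) || !is_array($data['items'])) {
        return [];
    }
    return array_column($data['items'], 'link');
}

// Hand-written stand-in for a response, trimmed to the fields used above.
$sampleResponse = '{"items":[{"title":"Example","link":"http://example.com/a","snippet":"..."}]}';

print_r(extractLinks($sampleResponse));
```

An empty array from extractLinks() would mean the phrase was not found (or
the response could not be parsed). With 100 free queries a day, a long
paper checked sentence by sentence would use up the quota quickly.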

- Matijn

[1] www.php.net/json




Re: [PHP] Connect to Google

2012-02-16 Thread Marc Guay
This is the first time I've been surprised that a Drupal module
existed for something...

http://drupal.org/project/authenticate




Re: [PHP] Connect to Google

2012-02-16 Thread Marc Guay
Sort of off topic but here's a list of existing services (some of
which are free) in case you don't want to reinvent the wheel.

http://www.justfitstudio.com/articles/plagiarism-detection.html
