Re: [CODE4LIB] Job Posting: Digital Library Repository Developer, Boston Public Library (Boston, MA)

2011-09-29 Thread don warner saklad
Our Boston Public Library needs to develop a more open encouraging
environment that we have folks with the best talents and skills! Begin
with fomenting exchanges where current BPLers can be free to express
their points of views. At this time BPLers are squelched by an
oppressive environment of relatively closed minded collegial managers.
Our Boston Public Library needs to be a Learning Library where
professionals, where all the folks at the counters and desks can
advance their expertise on site for credit with on site academic
programs administered by University of Massachusetts at Boston or
Northeastern Univeristy. We have Medical Centers that are Learning
Hospitals, we need Learning Libraries for information studies careers
advancement. Also on site for introducing people to library and
information careers. Most BPLers are squelched, suppressed expressing
their concerns for fear of reprisals by the fickle pissants!
University of Rhode Island tried once but got blocked by overpriced
Simmons a college that is short on extensive financial resources for
study. University at Albany would be a great fit for a demonstration
Learning Library project at our Boston Public Library should we get
political powers to support that, see also
http://www.albany.edu/informationstudies/


Re: [CODE4LIB] ny times best seller api

2011-09-29 Thread Tom Keays
On Wed, Sep 28, 2011 at 5:38 PM, Doug Yancey  wrote:
>I just got this set up on our site, using php to parse the json response.
> You should be able to see the result on our staging port at:
>
>http://catalog.ccrls.org:2082
>
>...under the 'Bestsellers' menu in the top nav bar.

Nice. It reminds me of the Reading Radar portal, written using the
Maintainable PHP Framework.

http://readingradar.com/
https://github.com/maintainable/framework

> would be happy to share that code with you, if you think it would
>help.

The author of Reading Radar described the structure of his mashup, but
didn't follow up with any code. It would be great to see yours.

http://jhherren.wordpress.com/2009/02/03/mashing-up-the-new-york-times-best-sellers-readingradarcom/


On Wed, Sep 28, 2011 at 5:36 PM, Godmar Back  wrote:
>NYTimes could allow JSONP, but then developers would need to embed their
API
>key in their web pages, which means the API key would simply be a token
used
>for statistics, rather than for authentication. It's their choice that they
>don't allow that.

I saw several threads in the NY Times developer forum, one as far back as
2009, where a JSONP option had been promised by NY Times staff. It may not
be coming.

However, given that the API is read-only, I don't see the real consequence
of using keys in the open.


Tom


[CODE4LIB] VuStuff Conference, Villanova University, October 12th, 2011

2011-09-29 Thread David Uspal
Apologies for cross-posting:

Greetings from the VuStuff Team,

VuStuff is an annual conference on the intersection between scholarship and 
technology, held at Villanova University's Falvey Memorial Library.  This 
year's conference, VuStuff II, will be held on Wednesday, Oct. 12, 2011 at 
Falvey Library, Villanova University.  As with last year, the event is free to 
attend and lunch will be provided.  You can see the schedule at the following 
link:

http://vustuff.org/vustuff/schedule

If you haven't already signed up for this year's event, I encourage you to do 
so ASAP as space is once again limited.  To register for this year's event, 
follow this link:

http://vustuff.org/vustuff/registration/

Thank you for your time and we hope to see you October 12th.


David K. Uspal
Technology Development Specialist
Falvey Memorial Library
Phone: 610-519-8954
Email: david.us...@villanova.edu


Re: [CODE4LIB] mysql subquery response time

2011-09-29 Thread Genny Engel
Well, not being a MySQL whiz myself, I thought maybe the query was being 
needlessly slowed down by having DISTINCT specified in both parts of the query. 
 In testing a similar query on one of my databases, the double DISTINCT query 
did run quite slowly. I tried modifying it to a form like this: 

SELECT distinct institution from renewals
WHERE institution not in
(SELECT institution FROM `renewals` WHERE snap_date < '2011-07-01')

It ran much faster.  But, so did rerunning the original query. 

There is a page in the MySQL documentation about some limitations and some 
optimizer issues with subqueries:
http://dev.mysql.com/doc/refman/5.1/en/subquery-restrictions.html

Genny Engel
Sonoma County Library
gen...@sonoma.lib.ca.us
707 545-0831 x581
www.sonomalibrary.org


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Ken 
Irwin
Sent: Wednesday, September 28, 2011 9:41 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] mysql subquery response time

Hi all,

I've not done much with MySQL subqueries, and I'm trying right now with what I 
find to be surprising results. I wonder if someone can help me understand.

I have a pile of data that with columns for "institution" and "date". 
Institution gets repeated a lot, with many different dates. I want to select 
all the institutions that *only* have dates after July 1 and don't appear in 
the table before that. My solution was to do a first query for all the 
institutions that DO have dates before July 1
SELECT distinct institution FROM `renewals` WHERE snap_date < '2011-07-01'

And then to do a SELECT query on all the institutions:
SELECT distinct institution from renewals

And then try to do a NOT IN subquery subtracting the smaller query from the 
larger one:

SELECT distinct institution from renewals
WHERE institution not in
(SELECT distinct institution FROM `renewals` WHERE snap_date < '2011-07-01')

...only it doesn't seem to work. Or rather, the query has been running for 
several minutes and never comes back with an answer. Each of these two queries 
takes just a few milliseconds to run on its own.

Can someone tell me (a) am I just formatting the query wrong, (b) do subqueries 
like this just take forever, and/or (c) is there a better way to do this? (I 
don't really understand about JOIN queries, but from what I can tell they are 
only for mixing the results of two different tables so I think they might not 
apply here.)

Any advice would be most welcome.

Thanks
Ken


Re: [CODE4LIB] mysql subquery response time

2011-09-29 Thread David Maus
At Wed, 28 Sep 2011 12:41:26 -0400,
Ken Irwin wrote:
>
> Hi all,
>
> I've not done much with MySQL subqueries, and I'm trying right now with what 
> I find to be surprising results. I wonder if someone can help me understand.
>
> I have a pile of data that with columns for "institution" and "date". 
> Institution gets repeated a lot, with many different dates. I want to select 
> all the institutions that *only* have dates after July 1 and don't appear in 
> the table before that. My solution was to do a first query for all the 
> institutions that DO have dates before July 1
> SELECT distinct institution FROM `renewals` WHERE snap_date < '2011-07-01'
>
> And then to do a SELECT query on all the institutions:
> SELECT distinct institution from renewals
>
> And then try to do a NOT IN subquery subtracting the smaller query from the 
> larger one:
>
> SELECT distinct institution from renewals
> WHERE institution not in
> (SELECT distinct institution FROM `renewals` WHERE snap_date < '2011-07-01')
>
> ...only it doesn't seem to work. Or rather, the query has been running for 
> several minutes and never comes back with an answer. Each of these two 
> queries takes just a few milliseconds to run on its own.
>
> Can someone tell me (a) am I just formatting the query wrong, (b) do 
> subqueries like this just take forever, and/or (c) is there a better way to 
> do this? (I don't really understand about JOIN queries, but from what I can 
> tell they are only for mixing the results of two different tables so I think 
> they might not apply here.)
>
> Any advice would be most welcome.


Not sure about the length of the combined query. One reason might be
that if m denotes the total number of rows in `renewals' and n denotes
the number of rows in the subquery, then AFAIK the MySQL needs to
perform m x n comparisions.

If I am not mistaken there might be room for simplifying the entire
query.

For the sake of simplicity first drop the DISTINCT constraint and
consider `snap_date' to not contain NULL values. Now we define:

- M is the set of all rows in `renewals'
- A is the subset of all rows in `renewals' created by the subquery's
  WHERE claus

We know some things about A and M:

A is a subset of M.

The WHERE clause divides M into two subsets: One set of rows that
fullfill the WHERE clause (set A) and one set of rows that fullfill
the complement of set A's WHERE clause (set B).

The intersection of A and B is the empty set: Either a `snap_date' is
before 2011-07-01 or it is equal to or after 2011-07-01. This
translates to: Each row in `renewals' is either in set A or in set
B. A row that is not in set A is in set B and vice versa.

If this holds true, you can simplify the entire query from:

SELECT institution from renewals WHERE institution *not in A*

to:

SELECT institution from renewals WHERE institution *in B*

Second step. Which `institution' fields of the entire table (outer
query) are in set B (inner query)? Well: The `institution' fields of
all rows in set B.

Or to put it in another way: If you take each `institution' from the
entire table and check if it is in the subset of rows created by the
inner query you will find that every `institution' whose `snap_date'
fullfills the inner query's WHERE claus is in the set created by the
inner query.

I.e.: You can drop the outer query:

SELECT institution FROM renewals WHERE `snap_date' >= '2011-07-01'

Finally put back the DISTINCT constraint.

SELECT DISTINCT institution FROM renewals WHERE `snap_date' >= '2011-07-01'

Does this make sense?

Btw. Stanford started an online course on "Introduction to Databases"
at http://www.db-class.org -- you can access the online tutorials
without beeing rolled-in at http://www.db-class.org/course/video/preview_list

HTH,
  -- David
--
David Maus
Projekt HAB 2.0
Herzog August Bibliothek - D-38299 Wolfenbuettel
Phone: +49-5331-808-379
Email: m...@hab.de
Github: http://github.com/dmj

pgpeyfd1190Pz.pgp
Description: PGP signature


Re: [CODE4LIB] mysql subquery response time

2011-09-29 Thread David Uspal
Ken,

   The list is locked at the moment, so this reply may take awhile to reach 
you, so sorry if this question has been answered already (or I'm too late).

   Anyway, here's how I'd format it (assuming I read the question right, that 
you want distinct institutions that have listing from July 1st 2011 on and have 
never been listed before then):

   SELECT distinct institution FROM renewals WHERE date >= '2011-07-01' AND 
institution NOT IN (SELECT distinct institution FROM renewals WHERE date < 
'2011-07-01');


David K. Uspal
Technology Development Specialist
Falvey Memorial Library
Phone: 610-519-8954
Email: david.us...@villanova.edu





-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Ken 
Irwin
Sent: Wednesday, September 28, 2011 12:41 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] mysql subquery response time

Hi all,

I've not done much with MySQL subqueries, and I'm trying right now with what I 
find to be surprising results. I wonder if someone can help me understand.

I have a pile of data that with columns for "institution" and "date". 
Institution gets repeated a lot, with many different dates. I want to select 
all the institutions that *only* have dates after July 1 and don't appear in 
the table before that. My solution was to do a first query for all the 
institutions that DO have dates before July 1
SELECT distinct institution FROM `renewals` WHERE snap_date < '2011-07-01'

And then to do a SELECT query on all the institutions:
SELECT distinct institution from renewals

And then try to do a NOT IN subquery subtracting the smaller query from the 
larger one:

SELECT distinct institution from renewals
WHERE institution not in
(SELECT distinct institution FROM `renewals` WHERE snap_date < '2011-07-01')

...only it doesn't seem to work. Or rather, the query has been running for 
several minutes and never comes back with an answer. Each of these two queries 
takes just a few milliseconds to run on its own.

Can someone tell me (a) am I just formatting the query wrong, (b) do subqueries 
like this just take forever, and/or (c) is there a better way to do this? (I 
don't really understand about JOIN queries, but from what I can tell they are 
only for mixing the results of two different tables so I think they might not 
apply here.)

Any advice would be most welcome.

Thanks
Ken


Re: [CODE4LIB] mysql subquery response time

2011-09-29 Thread Charles Haines
Why not use and INNER or LEFT join instead of a subquery?  Typically that
will be faster. Also, if the data set from either table is large I would
recommend indexes on the tables.

--
Charles Haines
Senior Code Monkey
P: (410) 535-5590 x1196

Recorded Books, LLC
http://www.recordedbooks.com

"Theory is when you know something, but it doesn't work. Practice is when
something works, but you don't know why. Programmers combine theory and
practice: Nothing works and they don't know why."



On Wed, Sep 28, 2011 at 12:41 PM, Ken Irwin  wrote:

> Hi all,
>
> I've not done much with MySQL subqueries, and I'm trying right now with
> what I find to be surprising results. I wonder if someone can help me
> understand.
>
> I have a pile of data that with columns for "institution" and "date".
> Institution gets repeated a lot, with many different dates. I want to select
> all the institutions that *only* have dates after July 1 and don't appear in
> the table before that. My solution was to do a first query for all the
> institutions that DO have dates before July 1
> SELECT distinct institution FROM `renewals` WHERE snap_date < '2011-07-01'
>
> And then to do a SELECT query on all the institutions:
> SELECT distinct institution from renewals
>
> And then try to do a NOT IN subquery subtracting the smaller query from the
> larger one:
>
> SELECT distinct institution from renewals
> WHERE institution not in
> (SELECT distinct institution FROM `renewals` WHERE snap_date <
> '2011-07-01')
>
> ...only it doesn't seem to work. Or rather, the query has been running for
> several minutes and never comes back with an answer. Each of these two
> queries takes just a few milliseconds to run on its own.
>
> Can someone tell me (a) am I just formatting the query wrong, (b) do
> subqueries like this just take forever, and/or (c) is there a better way to
> do this? (I don't really understand about JOIN queries, but from what I can
> tell they are only for mixing the results of two different tables so I think
> they might not apply here.)
>
> Any advice would be most welcome.
>
> Thanks
> Ken
>


Re: [CODE4LIB] mysql subquery response time

2011-09-29 Thread Chad Mills
Ken,

It shouldn't take long to run.  If you want to so the sub-select route maybe 
this will help?

SELECT DISTINCT(institution) FROM `renewals` WHERE snap_date > '2011-07-01' AND 
institution NOT IN (SELECT DISTINCT(institution) from `renewals` where 
snap_date < '2011-07-01');

red vines > twizzlers

--
Chad Mills
Digital Library Architect
Ph: 732.932.8573 x123
Fax: 732.932.1386
Cell: 732.309.8538

Rutgers University Libraries
Scholarly Communication Center
Room 409D, Alexander Library
169 College Avenue, New Brunswick, NJ 08901

http://rucore.libraries.rutgers.edu/

- Original Message -
From: "Ken Irwin" 
To: CODE4LIB@LISTSERV.ND.EDU
Sent: Wednesday, September 28, 2011 12:41:26 PM
Subject: [CODE4LIB] mysql subquery response time

Hi all,

I've not done much with MySQL subqueries, and I'm trying right now with what I 
find to be surprising results. I wonder if someone can help me understand.

I have a pile of data that with columns for "institution" and "date". 
Institution gets repeated a lot, with many different dates. I want to select 
all the institutions that *only* have dates after July 1 and don't appear in 
the table before that. My solution was to do a first query for all the 
institutions that DO have dates before July 1
SELECT distinct institution FROM `renewals` WHERE snap_date < '2011-07-01'

And then to do a SELECT query on all the institutions:
SELECT distinct institution from renewals

And then try to do a NOT IN subquery subtracting the smaller query from the 
larger one:

SELECT distinct institution from renewals
WHERE institution not in
(SELECT distinct institution FROM `renewals` WHERE snap_date < '2011-07-01')

...only it doesn't seem to work. Or rather, the query has been running for 
several minutes and never comes back with an answer. Each of these two queries 
takes just a few milliseconds to run on its own.

Can someone tell me (a) am I just formatting the query wrong, (b) do subqueries 
like this just take forever, and/or (c) is there a better way to do this? (I 
don't really understand about JOIN queries, but from what I can tell they are 
only for mixing the results of two different tables so I think they might not 
apply here.)

Any advice would be most welcome.

Thanks
Ken


Re: [CODE4LIB] ny times best seller api

2011-09-29 Thread Nate Hill
Because a few people have asked me about this off list, and because I'm now
struggling with a different related question, I'm bringing it back to the
list for all.
If it's TMI, just delete the email and move on... sorry to jam your inbox.
But I'm determined to make this thing work.
I have the proxy working fine as Godmar suggested, but I'm doing something
wrong parsing the JSON.
Thanks- I learn an awful lot by interacting with this list.

*Here's the PHP for the proxy:*

http://api.nytimes.com/svc/books/v2/lists/hardcover-fiction.json?&api-key=
'
);
header("Content-Type: text/javascript");
echo $cb . '(' . $json . ')';
?>

*Here's the jQuery:*

jQuery(document).ready(function(){
$(function(){
//json request to new york times
$.getJSON('
http://www.sjpl.org/otherscripts/callback.php?callback=?',
function(data) {
//loop through the results with the following
function
$.each(data.results[0].book_details,
function(i,item){
//turn the title into a variable
var bookTitle = item.title;
$.each(data.results[0].isbns, function(i,item){
//turn the isbn into a variable
var bookIsbn = item.isbn13;
$('#container').append('http://catalog.sjlibrary.org/search~/a?searchtype=i&searcharg='+bookIsbn+'&SORT=D&searchscope=1">'+bookTitle+'');

});
});
});
});
});

*Here's my problem:*
I can only return the book title and the isbn by specifying which one in the
array I want, so I'm using: data.results[0].book_details to pull the first
title.
Something is wrong with the way I'm using $.each()
If I remove [0] and have data.results.book_details I get no result.
If I add [1] and have data.results[1].book_details I get the same result
twice.
If I add [2] and have data.results[2].book_details I get the same result
three times.
I'm sure I'm doing something dumb with my syntax.  Any ideas?

*Here's a bigger chunk of the JSON, since I'm having problems moving through
it.*
{
"status": "OK",
"copyright": "Copyright (c) 2011 The New York Times Company.  All Rights
Reserved.",
"num_results": 35,
"last_modified": "2011-09-23T12:00:29-04:00",
"results": [{
"list_name": "Hardcover Fiction",
"display_name": "Hardcover Fiction",
"updated": "WEEKLY",
"bestsellers_date": "2011-09-17",
"published_date": "2011-10-02",
"rank": 1,
"rank_last_week": 0,
"weeks_on_list": 1,
"asterisk": 0,
"dagger": 0,
"isbns": [{
"isbn10": "0399157786",
"isbn13": "9780399157783"
}],
"book_details": [{
"title": "NEW YORK TO DALLAS",
"description": "An escaped child molester pursues Lt. Eve
Dallas; by Nora Roberts, writing pseudonymously.",
"contributor": "by J. D. Robb",
"author": "J D Robb",
"contributor_note": "",
"price": 27.95,
"age_group": "",
"publisher": "Putnam",
"primary_isbn13": "9780399157783",
"primary_isbn10": "0399157786"
}],
"reviews": [{
"book_review_link": "",
"first_chapter_link": "",
"sunday_review_link": "",
"article_chapter_link": ""
}]
}, {
"list_name": "Hardcover Fiction",
"display_name": "Hardcover Fiction",
"updated": "WEEKLY",
"bestsellers_date": "2011-09-17",
"published_date": "2011-10-02",
"rank": 2,
"rank_last_week": 0,
"weeks_on_list": 1,
"asterisk": 0,
"dagger": 0,
"isbns": [{
"isbn10": "0385534639",
"isbn13": "9780385534635"
}, {
"isbn10": "0385534647",
"isbn13": "9780385534642"
}],
"book_details": [{
"title": "THE NIGHT CIRCUS",
"description": "Two young rivals at a magical circus become
collaborators as they fall in love.",
"contributor": "by Erin Morgenstern",
"author": "Erin Morgenstern",
"contributor_note": "",
"price": 26.95,
"age_group": "",
"publisher": "Doubleday",
"primary_isbn13": "9780385534635",
"primary_isbn10": "0385534639"
}],
"reviews": [{
"book_review_link": "",
"first_chapter_link": "",
"sunday_review_link": "",
"article_chapter_link": ""
}]
}, {
"list_name": "Hardcover Fiction",
"display_name": "Hardcover Fiction",
"updated": "WEEKLY",
"bestsellers_date": "2011-09-17",
"published_date": "2011-10-02",
"rank": 3,
"rank_last_week": 1,
"weeks_on_list": 3,
"asteris

Re: [CODE4LIB] mysql subquery response time

2011-09-29 Thread Colford, Scot
I may be wrong about this, but when you use a "not in" operator, you're
necessarily doing a table scan. Any indexes you have are not used in such
a query, so you'll definitely see a performance hit.

But if each independent query runs quickly, why not use a temporary table
to store the results of what you've got written as a subquery first, and
then use that presumably small, distinct list of institutions in the
subquery afterward? Something like this:

SELECT INTO #my_institutions
SELECT distinct institution FROM `renewals` WHERE snap_date < '2011-07-01'
EXEC

SELECT distinct institution from renewals
WHERE institution not in
(SELECT institution from #my_institutions)


That assumes you're able to SELECT INTO, but if not, just create yourself
a my_institutions table first.


\-/-\-/-\-/-\-/-\-/-\-/-\-/-\-/-\-/-\-/-\-/-\-/

Scot Colford
Web Services Manager
Boston Public Library

scolf...@bpl.org
Phone 617.859.2399
Mobile 617.592.8669
Fax 617.536.7558







On 9/28/11 12:41 PM, "Ken Irwin"  wrote:

>Hi all,
>
>I've not done much with MySQL subqueries, and I'm trying right now with
>what I find to be surprising results. I wonder if someone can help me
>understand.
>
>I have a pile of data that with columns for "institution" and "date".
>Institution gets repeated a lot, with many different dates. I want to
>select all the institutions that *only* have dates after July 1 and don't
>appear in the table before that. My solution was to do a first query for
>all the institutions that DO have dates before July 1
>SELECT distinct institution FROM `renewals` WHERE snap_date < '2011-07-01'
>
>And then to do a SELECT query on all the institutions:
>SELECT distinct institution from renewals
>
>And then try to do a NOT IN subquery subtracting the smaller query from
>the larger one:
>
>SELECT distinct institution from renewals
>WHERE institution not in
>(SELECT distinct institution FROM `renewals` WHERE snap_date <
>'2011-07-01')
>
>...only it doesn't seem to work. Or rather, the query has been running
>for several minutes and never comes back with an answer. Each of these
>two queries takes just a few milliseconds to run on its own.
>
>Can someone tell me (a) am I just formatting the query wrong, (b) do
>subqueries like this just take forever, and/or (c) is there a better way
>to do this? (I don't really understand about JOIN queries, but from what
>I can tell they are only for mixing the results of two different tables
>so I think they might not apply here.)
>
>Any advice would be most welcome.
>
>Thanks
>Ken


Re: [CODE4LIB] mysql subquery response time

2011-09-29 Thread Adam Wead
Ken,

If I understand the logic correctly, you need a list of institutions with dates 
after July 1, but only institutions that have records existing before July 1.  
Subqueries could work, but a view might be easier to work with, especially if 
you plan to a lot of queries with similar logic.

First, create a view with institutions whose dates are prior to July 1:

CREATE VIEW my_list AS SELECT DISTINCT institution FROM renewals WHERE 
snap_date < '2011-07-01'

Then, match the institutions in your renewals table with the ones in the view:

SELECT institution FROM renewals a, my_list b WHERE a.institution = 
b.institution AND snap_date > '2011-07-01'

That's the way, at least, I might tackle it... there are probably a dozen 
others however.

...adam



On Sep 28, 2011, at 12:41 PM, Ken Irwin wrote:

> Hi all,
> 
> I've not done much with MySQL subqueries, and I'm trying right now with what 
> I find to be surprising results. I wonder if someone can help me understand.
> 
> I have a pile of data that with columns for "institution" and "date". 
> Institution gets repeated a lot, with many different dates. I want to select 
> all the institutions that *only* have dates after July 1 and don't appear in 
> the table before that. My solution was to do a first query for all the 
> institutions that DO have dates before July 1
> SELECT distinct institution FROM `renewals` WHERE snap_date < '2011-07-01'
> 
> And then to do a SELECT query on all the institutions:
> SELECT distinct institution from renewals
> 
> And then try to do a NOT IN subquery subtracting the smaller query from the 
> larger one:
> 
> SELECT distinct institution from renewals
> WHERE institution not in
> (SELECT distinct institution FROM `renewals` WHERE snap_date < '2011-07-01')
> 
> ...only it doesn't seem to work. Or rather, the query has been running for 
> several minutes and never comes back with an answer. Each of these two 
> queries takes just a few milliseconds to run on its own.
> 
> Can someone tell me (a) am I just formatting the query wrong, (b) do 
> subqueries like this just take forever, and/or (c) is there a better way to 
> do this? (I don't really understand about JOIN queries, but from what I can 
> tell they are only for mixing the results of two different tables so I think 
> they might not apply here.)
> 
> Any advice would be most welcome.
> 
> Thanks
> Ken


Re: [CODE4LIB] mysql subquery response time

2011-09-29 Thread Tim Spalding
Subqueries in MySQL are murder. Terribly slow. I never, ever use them.
Use a JOIN or do two queries.

Tim

On Wed, Sep 28, 2011 at 12:41 PM, Ken Irwin  wrote:
> Hi all,
>
> I've not done much with MySQL subqueries, and I'm trying right now with what 
> I find to be surprising results. I wonder if someone can help me understand.
>
> I have a pile of data that with columns for "institution" and "date". 
> Institution gets repeated a lot, with many different dates. I want to select 
> all the institutions that *only* have dates after July 1 and don't appear in 
> the table before that. My solution was to do a first query for all the 
> institutions that DO have dates before July 1
> SELECT distinct institution FROM `renewals` WHERE snap_date < '2011-07-01'
>
> And then to do a SELECT query on all the institutions:
> SELECT distinct institution from renewals
>
> And then try to do a NOT IN subquery subtracting the smaller query from the 
> larger one:
>
> SELECT distinct institution from renewals
> WHERE institution not in
> (SELECT distinct institution FROM `renewals` WHERE snap_date < '2011-07-01')
>
> ...only it doesn't seem to work. Or rather, the query has been running for 
> several minutes and never comes back with an answer. Each of these two 
> queries takes just a few milliseconds to run on its own.
>
> Can someone tell me (a) am I just formatting the query wrong, (b) do 
> subqueries like this just take forever, and/or (c) is there a better way to 
> do this? (I don't really understand about JOIN queries, but from what I can 
> tell they are only for mixing the results of two different tables so I think 
> they might not apply here.)
>
> Any advice would be most welcome.
>
> Thanks
> Ken
>



-- 
Check out my library at http://www.librarything.com/profile/timspalding


[CODE4LIB] Senior Application Developer opening

2011-09-29 Thread Alicia D Wolfe
Columbia University's Center for Digital Research and Scholarship (CDRS) 
(http://cdrs.columbia.edu ) is seeking a 
*Senior Application Developer* to implement CDRS' online platforms by 
architecting, coding, integrating, and maintaining services and back-end 
applications that support the projects of the center. 

CDRS serves the digital research and scholarly communication needs of 
Columbia's faculty, graduate students, and other scholars through the 
innovative creation and implementation of online tools and services. The 
Senior Application Developer will work in tandem with our in-house Web 
development team to create and enhance the next generation of shared 
knowledge systems. Responsibilities include constructing applications 
and systems; designing information architecture and user interface 
schemas; manipulating, scripting, and visualizing large metadata sets 
and digital objects; and authoring technical specifications for 
applications intended for use by a variety of university departments and 
partners with diverse user requirements. 

Because of the nature of the projects CDRS accepts, there are many 
languages and software we utilize within a multi-host and heterogeneous 
application environment to fulfill the needs of any given project. Our 
active project list includes applications that require PHP, MySQL, Ruby, 
Ruby on Rails, Java, JQuery, JavaScript, Perl, YUI, XML/XSLT, 
Lucene/Solr, and in the near future we will be including Django/Python. 
Breadth of skill is preferred, but a successful applicant will have 
expert-level skill in at least one of the high-level languages described.


*Requirements:*

- Bachelor's degree in Computer Science (or equivalent combination of 
education and experience) and 3-5 years of professional experience in 
Web-based application design


- Minimum of 2 years of experience with XML technologies (particularly 
XSLT/XQuery) and one or more of the following: Ruby on Rails, Java, 
Python, Perl, or PHP/MySQL


- Comprehensive knowledge of Unix, Apache, and Tomcat environments

- Strong IA/UI skills and OO design experience

- Willingness to learn new applications and 
terminology 


*Preferred Skills:*

- 5-6 years of experience in Web-based application design

- Significant experience with relational databases, database design, and 
fluency in SQL


- Additional XML technology experience; XQuery/XPath fluency strongly 
desired


- Extensive familiarity with Python, Django, PHP, and Perl

- Familiarity with SOAP and RESTful architectures

- Familiarity with other open-source software (Solr, Lucene, etc.)

- Interest in usability and user experience 


Columbia University is open-source friendly!

*To apply:*

For immediate consideration, please submit a cover letter and résumé 
online at


https://jobs.columbia.edu/applicants/Central?quickFind=127085

One of the world's leading research universities, Columbia provides 
outstanding opportunities to work and grow in a unique intellectual 
community. Set in the Morningside Heights academic village, Columbia 
also presents the unmatched dynamism, diversity, and cultural richness 
of New York City.


The *Center for Digital Research and Scholarship *(CDRS) 
(http://cdrs.columbia.edu/) partners with researchers and scholars to 
share new knowledge. Using innovative new media and digital 
technologies, CDRS empowers Columbia's research community with the 
online tools and services necessary to make the most of scholarly 
communication, collaboration, data-sharing, and preservation. CDRS is a 
part of Columbia University Libraries/Information Services, which, 
grounded in collections of remarkable depth and breadth, are also 
building extensive electronic resources and services. The Libraries are 
committed to collegiality, professionalism, innovation, and leadership.


We offer a salary commensurate with qualifications and experience and 
excellent benefits. 

*Columbia** University** is an Equal Opportunity/Affirmative Action 
employer.*


Re: [CODE4LIB] Job Posting: Digital Library Repository Developer, Boston Public Library (Boston, MA)

2011-09-29 Thread Cary Gordon
Anyone want to apply to be Roy's nephew?

On Wed, Sep 28, 2011 at 12:28 PM, Ya'aqov Ziso  wrote:
> *S... now that we've cleared this up - anyone want to apply?*
> *===*
> anybody can apply, but Roy's nephew will get the job .
> *
> *
> *
> *
>


Re: [CODE4LIB] mysql subquery response time

2011-09-29 Thread Fowler, Jason
> I want to select all the institutions that *only* have dates after July 1

How about:

select distinct institution
from renewals
where snap_date >= '2011-07-01';

in Oracle:
where snap_date >= to_date(20110701, 'MMDD');

(remove the "=" to not include 2011-07-01)

Hopefully I understood your challenge…

Cheers,
Jason

On 11-09-28 9:41 AM, "Ken Irwin" 
mailto:kir...@wittenberg.edu>> wrote:

Hi all,

I've not done much with MySQL subqueries, and I'm trying right now with what I 
find to be surprising results. I wonder if someone can help me understand.

I have a pile of data that with columns for "institution" and "date". 
Institution gets repeated a lot, with many different dates. I want to select 
all the institutions that *only* have dates after July 1 and don't appear in 
the table before that. My solution was to do a first query for all the 
institutions that DO have dates before July 1
SELECT distinct institution FROM `renewals` WHERE snap_date < '2011-07-01'

And then to do a SELECT query on all the institutions:
SELECT distinct institution from renewals

And then try to do a NOT IN subquery subtracting the smaller query from the 
larger one:

SELECT distinct institution from renewals
WHERE institution not in
(SELECT distinct institution FROM `renewals` WHERE snap_date < '2011-07-01')

...only it doesn't seem to work. Or rather, the query has been running for 
several minutes and never comes back with an answer. Each of these two queries 
takes just a few milliseconds to run on its own.

Can someone tell me (a) am I just formatting the query wrong, (b) do subqueries 
like this just take forever, and/or (c) is there a better way to do this? (I 
don't really understand about JOIN queries, but from what I can tell they are 
only for mixing the results of two different tables so I think they might not 
apply here.)

Any advice would be most welcome.

Thanks
Ken


Re: [CODE4LIB] ny times best seller api

2011-09-29 Thread Godmar Back
On Wed, Sep 28, 2011 at 5:02 PM, Michael B. Klein  wrote:

>
> It's not NYTimes.com's fault; it's the cross-site scripting jerks who made
> the security necessary in the first place.
>
>
NYTimes could allow JSONP, but then developers would need to embed their API
key in their web pages, which means the API key would simply be a token used
for statistics, rather than for authentication. It's their choice that they
don't allow that.

Closer to the code4lib community: OCLC and Serials Solutions don't support
JSONP in their webservices, either, even though doing so would allow cool
services and would likely not affect their business models adversely in a
significant way, IMO. We should keep lobbying them to remove these
restrictions, as I've been doing for a while.

 - Godmar


Re: [CODE4LIB] mysql subquery response time

2011-09-29 Thread Chris Zagar
Ken,

For your performance issue, it would be interesting to take out the distinct on 
your subquery and see if that has any effect.  You definitely want distinct on 
the outer query to reduce the results, but on the subquery version, it may be 
preventing the query optimizer from taking some other path that may improve 
performance.

If I were doing this, I'd probably write it using a HAVING clause, which allows 
you to apply a condition against the value of one of the aggregate functions 
such as MIN and MAX.  With that, you should be able to use:

SELECT institution FROM renewals GROUP BY institution HAVING MIN(snap_date) >= 
'2011-07-01'

The "group by" reduces things to unique values, so no distinct is needed.  If 
you have any values before that date, they will be the minimum value, and those 
should screen right out.

Chris Zagar
Librarian
Estrella Mountain Community College

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Ken 
Irwin
Sent: Wednesday, September 28, 2011 9:41 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] mysql subquery response time

Hi all,

I've not done much with MySQL subqueries, and I'm trying right now with what I 
find to be surprising results. I wonder if someone can help me understand.

I have a pile of data that with columns for "institution" and "date". 
Institution gets repeated a lot, with many different dates. I want to select 
all the institutions that *only* have dates after July 1 and don't appear in 
the table before that. My solution was to do a first query for all the 
institutions that DO have dates before July 1
SELECT distinct institution FROM `renewals` WHERE snap_date < '2011-07-01'

And then to do a SELECT query on all the institutions:
SELECT distinct institution from renewals

And then try to do a NOT IN subquery subtracting the smaller query from the 
larger one:

SELECT distinct institution from renewals
WHERE institution not in
(SELECT distinct institution FROM `renewals` WHERE snap_date < '2011-07-01')

...only it doesn't seem to work. Or rather, the query has been running for 
several minutes and never comes back with an answer. Each of these two queries 
takes just a few milliseconds to run on its own.

Can someone tell me (a) am I just formatting the query wrong, (b) do subqueries 
like this just take forever, and/or (c) is there a better way to do this? (I 
don't really understand about JOIN queries, but from what I can tell they are 
only for mixing the results of two different tables so I think they might not 
apply here.)

Any advice would be most welcome.

Thanks
Ken


Re: [CODE4LIB] mysql subquery response time

2011-09-29 Thread Cary Gordon
I must be missing something. Why wouldn't you just do:

SELECT distinct institution
FROM renewals
WHERE snap_date >= '2011-07-01'


On Wed, Sep 28, 2011 at 9:41 AM, Ken Irwin  wrote:
> Hi all,
>
> I've not done much with MySQL subqueries, and I'm trying right now with what 
> I find to be surprising results. I wonder if someone can help me understand.
>
> I have a pile of data that with columns for "institution" and "date". 
> Institution gets repeated a lot, with many different dates. I want to select 
> all the institutions that *only* have dates after July 1 and don't appear in 
> the table before that. My solution was to do a first query for all the 
> institutions that DO have dates before July 1
> SELECT distinct institution FROM `renewals` WHERE snap_date < '2011-07-01'
>
> And then to do a SELECT query on all the institutions:
> SELECT distinct institution from renewals
>
> And then try to do a NOT IN subquery subtracting the smaller query from the 
> larger one:
>
> SELECT distinct institution from renewals
> WHERE institution not in
> (SELECT distinct institution FROM `renewals` WHERE snap_date < '2011-07-01')
>
> ...only it doesn't seem to work. Or rather, the query has been running for 
> several minutes and never comes back with an answer. Each of these two 
> queries takes just a few milliseconds to run on its own.
>
> Can someone tell me (a) am I just formatting the query wrong, (b) do 
> subqueries like this just take forever, and/or (c) is there a better way to 
> do this? (I don't really understand about JOIN queries, but from what I can 
> tell they are only for mixing the results of two different tables so I think 
> they might not apply here.)
>
> Any advice would be most welcome.
>
> Thanks
> Ken
>



-- 
Cary Gordon
The Cherry Hill Company
http://chillco.com


Re: [CODE4LIB] ny times best seller api

2011-09-29 Thread Doug Yancey
I just got this set up on our site, using php to parse the json response.
 You should be able to see the result on our staging port at:

http://catalog.ccrls.org:2082

...under the 'Bestsellers' menu in the top nav bar.

We're having to host the actual php pages on a separate web server, since
our ILS server can't be made to support it.  Being a total php newb, I
received some great help from the stackoverflow forum to get the script
working; would be happy to share that code with you, if you think it would
help.

I've also tagged these pages for google analytics, so that we can monitor
the traffic on them.  I doubt we would come anywhere near the 5000/day api
limit, but wanted to be sure.  And I suppose if we did, I could always
introduce some server-side caching to cut the number down.

Regards,

Doug


On Wed, Sep 28, 2011 at 12:28 PM, Nate Hill  wrote:

> Anybody out there using the NY times best seller API to do stuff on their
> library websites?
> I can't figure out what's wrong with my code here.
> Data is returned as "null"; I can't seem to parse the response with jQuery.
> Any help would be supercool.
> I removed the API key - my code doesn't actually contain ''.
> Here's the jQuery:
>
> jQuery(document).ready(function(){
>$(function(){
>//json request to new york times
>$.getJSON('
>
> http://api.nytimes.com/svc/books/v2/lists/hardcover-fiction.json?&api-key=
> ',
>
>function(data) {
>//loop through the results with the following
> function
>$.each(data.results.book_details, function(i,item){
>//turn the title into a variable
>var bookTitle = item.title;
>$('#container').append(''+bookTitle+'');
>
>});
>});
>});
> });
>
>
> Here's a snippet of the JSON response:
>
> {
>"status": "OK",
>"copyright": "Copyright (c) 2011 The New York Times Company.  All Rights
> Reserved.",
>"num_results": 35,
>"last_modified": "2011-09-23T12:00:29-04:00",
>"results": [{
>"list_name": "Hardcover Fiction",
>"display_name": "Hardcover Fiction",
>"updated": "WEEKLY",
>"bestsellers_date": "2011-09-17",
>"published_date": "2011-10-02",
>"rank": 1,
>"rank_last_week": 0,
>"weeks_on_list": 1,
>"asterisk": 0,
>"dagger": 0,
>"isbns": [{
>"isbn10": "0399157786",
>"isbn13": "9780399157783"
>}],
>"book_details": [{
>"title": "NEW YORK TO DALLAS",
>"description": "An escaped child molester pursues Lt. Eve
> Dallas; by Nora Roberts, writing pseudonymously.",
>"contributor": "by J. D. Robb",
>"author": "J D Robb",
>"contributor_note": "",
>"price": 27.95,
>"age_group": "",
>"publisher": "Putnam",
>"primary_isbn13": "9780399157783",
>"primary_isbn10": "0399157786"
>}],
>"reviews": [{
>"book_review_link": "",
>"first_chapter_link": "",
>"sunday_review_link": "",
>"article_chapter_link": ""
>}]
>
>
> --
> Nate Hill
> nathanielh...@gmail.com
> http://www.natehill.net
>



-- 
Doug Yancey
Chemeketa Cooperative Regional Library Service
Web Services Librarian
503.315.4268
Need support from CCRLS? Open a ticket at http://swhelpdesk.ccrls.org/portal


Re: [CODE4LIB] mysql subquery response time

2011-09-29 Thread Dave Caroline
Sub queries are not well optimised till very recently therefore
rewrite subquery as a join for speed.
eg for A not in B the following

SELECT a.* FROM a LEFT JOIN b ON a.id = b.id WHERE b.id IS NULL;

also if you have two sets from the same table use derived tables and
then join them

(SELECT distinct institution FROM `renewals` WHERE snap_date <
'2011-07-01') as b
(SELECT distinct institution from renewals) as a

Dave Caroline


On Wed, Sep 28, 2011 at 5:41 PM, Ken Irwin  wrote:
> Hi all,
>
> I've not done much with MySQL subqueries, and I'm trying right now with what 
> I find to be surprising results. I wonder if someone can help me understand.
>
> I have a pile of data that with columns for "institution" and "date". 
> Institution gets repeated a lot, with many different dates. I want to select 
> all the institutions that *only* have dates after July 1 and don't appear in 
> the table before that. My solution was to do a first query for all the 
> institutions that DO have dates before July 1
> SELECT distinct institution FROM `renewals` WHERE snap_date < '2011-07-01'
>
> And then to do a SELECT query on all the institutions:
> SELECT distinct institution from renewals
>
> And then try to do a NOT IN subquery subtracting the smaller query from the 
> larger one:
>
> SELECT distinct institution from renewals
> WHERE institution not in
> (SELECT distinct institution FROM `renewals` WHERE snap_date < '2011-07-01')
>
> ...only it doesn't seem to work. Or rather, the query has been running for 
> several minutes and never comes back with an answer. Each of these two 
> queries takes just a few milliseconds to run on its own.
>
> Can someone tell me (a) am I just formatting the query wrong, (b) do 
> subqueries like this just take forever, and/or (c) is there a better way to 
> do this? (I don't really understand about JOIN queries, but from what I can 
> tell they are only for mixing the results of two different tables so I think 
> they might not apply here.)
>
> Any advice would be most welcome.
>
> Thanks
> Ken
>


Re: [CODE4LIB] mysql subquery response time

2011-09-29 Thread Hans Erik Büscher
Hi..

You could consider at least two things

1. drop the distinct keyword in the subquery. You dont need it
2. Use a EXISTS keyword instead of not in - check
http://dev.mysql.com/doc/refman/5.0/en/exists-and-not-exists-subqueries.html.
It is normally considered more effective


§;>heb
Hans Erik Büscher | Reindex
#45 7020 8434


2011/9/28 Ken Irwin 

> Hi all,
>
> I've not done much with MySQL subqueries, and I'm trying right now with
> what I find to be surprising results. I wonder if someone can help me
> understand.
>
> I have a pile of data that with columns for "institution" and "date".
> Institution gets repeated a lot, with many different dates. I want to select
> all the institutions that *only* have dates after July 1 and don't appear in
> the table before that. My solution was to do a first query for all the
> institutions that DO have dates before July 1
> SELECT distinct institution FROM `renewals` WHERE snap_date < '2011-07-01'
>
> And then to do a SELECT query on all the institutions:
> SELECT distinct institution from renewals
>
> And then try to do a NOT IN subquery subtracting the smaller query from the
> larger one:
>
> SELECT distinct institution from renewals
> WHERE institution not in
> (SELECT distinct institution FROM `renewals` WHERE snap_date <
> '2011-07-01')
>
> ...only it doesn't seem to work. Or rather, the query has been running for
> several minutes and never comes back with an answer. Each of these two
> queries takes just a few milliseconds to run on its own.
>
> Can someone tell me (a) am I just formatting the query wrong, (b) do
> subqueries like this just take forever, and/or (c) is there a better way to
> do this? (I don't really understand about JOIN queries, but from what I can
> tell they are only for mixing the results of two different tables so I think
> they might not apply here.)
>
> Any advice would be most welcome.
>
> Thanks
> Ken
>


Re: [CODE4LIB] mysql subquery response time

2011-09-29 Thread VM Brasseur

On 9/28/11 9:41 AM, Ken Irwin wrote:

SELECT distinct institution from renewals
WHERE institution not in
(SELECT distinct institution FROM `renewals` WHERE snap_date<  '2011-07-01')

...only it doesn't seem to work. Or rather, the query has been running for 
several minutes and never comes back with an answer. Each of these two queries 
takes just a few milliseconds to run on its own.

Can someone tell me (a) am I just formatting the query wrong, (b) do subqueries 
like this just take forever, and/or (c) is there a better way to do this? (I 
don't really understand about JOIN queries, but from what I can tell they are 
only for mixing the results of two different tables so I think they might not 
apply here.)


What are the indices on the table? Have you already tried running the 
query through EXPLAIN?


Also, subqueries can often be sluggish in MySQL. This article does a 
succinct but decent job of explaining: 
http://www.mysqlperformanceblog.com/2010/10/25/mysql-limitations-part-3-subqueries/


--VMB

--
Vicky Brasseur
Product Manager, Digital Archive Service
Internet Archive
http://archive.org
v...@archive.org