Hi.

We've fixed our problem. Our MarcXML has a field containing a
JSON-like timestamp structure describing the state of a record in the
legacy ADS system, which lets us update only the modified records.
Such a field looks like this:

{'refs': [{'p':
'/proj/ads/references/resolved/hep-lat/1991/9112002.raw.result', 't':
1316537233}], 'abs': [{'p':
'/proj/ads/abstracts/pre/text/X20/X20-07409.abs', 't': '1256173285'}],
'links': {'preprint': [{'u': 'hep-lat/9112002'}], 'spires': [{'u':
'http://www.slac.stanford.edu/spires/find/hep/www?rawcmd=find+eprint+hep-lat/9112002'}]},
'prop': ['arxiv_37', 'arxiv_45']}

The problem was that the index on the bibxxx tables that hold these
fields has the default prefix length of 35, and the first 35
characters of this timestamp are always identical. The index was
therefore useless and a single query took between 0.5 and 1 second.
Now, with an index prefix length of 200, the query takes less than a
millisecond.
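
To illustrate why the short prefix made the index useless, here is a
minimal sketch that measures how many distinct index entries a given
prefix length produces. The function name and the sample values are
hypothetical (the values only mimic the shape of the field shown
above); this is not code from Invenio or bibupload.

```python
def prefix_selectivity(values, prefix_len):
    """Return distinct prefixes / distinct values for a given prefix length.

    1.0 means a prefix index of that length separates every value;
    a result close to 0 means almost all values share the same index entry.
    """
    distinct_values = set(values)
    distinct_prefixes = {v[:prefix_len] for v in distinct_values}
    return len(distinct_prefixes) / len(distinct_values)


# Hypothetical sample data: same leading structure as the real field,
# differing only in the file name and the timestamp.
sample_values = [
    "{'refs': [{'p': '/proj/ads/references/resolved/hep-lat/1991/%07d.raw.result', 't': %d}]}"
    % (9112000 + i, 1316537233 + i)
    for i in range(1000)
]

for prefix_len in (35, 200):
    print(prefix_len, prefix_selectivity(sample_values, prefix_len))
```

With these made-up values, every string shares its first 35
characters, so a 35-character prefix maps all of them to a single
index entry, while 200 characters are enough to tell them apart. On a
real table the same check can presumably be done in MySQL by comparing
COUNT(DISTINCT LEFT(value, 35)) with COUNT(DISTINCT value).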

The bibupload speed with my light bibupload modification
(http://invenio-software.org/ticket/671) jumped from 10 records per
second to 100 records per second.

Cheers,
Benoit.

On Thu, Mar 15, 2012 at 11:56 AM, Giovanni Di Milia
<[email protected]> wrote:
> A small correction,
> Jay told me that the file specified in "log=/tmp/mysql-query.log" is
> the file of the global logging of MySQL and this is the reason I see
> all the queries.
> I disabled it but it didn't help with the performance.
>
> We also noticed that disabling the population of the bibXXX tables
> doesn't affect the speed of bibupload: Benoit is now looking into
> this to understand why.
>
> Thanks,
> Giovanni
>
> On Thu, Mar 15, 2012 at 10:48 AM, Giovanni Di Milia
> <[email protected]> wrote:
>> Hi Samuele,
>> I have a lot of queries logged in that file.
>> An example is here http://pastebin.com/NXujzvgn
>> but the file grows really fast and this tells me that almost every
>> query is slow.
>>
>> This is the output of the command you mentioned in your email:
>>
>> echo "SHOW PROCESSLIST" | /proj.adsx/invenio/bin/dbexec
>> Id      User    Host    db      Command Time    State   Info
>> 2       invenio localhost       invenioauthdis  Sleep   0               NULL
>> 3       invenio localhost       invenioauthdis  Query   0       Sending data    SELECT id,value FROM bib99x  WHERE tag='995__a' AND value='{\\'refs\\': [{\\'p\\': \\'/proj/ads/reference
>> 4       invenio localhost       invenioauthdis  Query   0       NULL    SHOW PROCESSLIST
>>
>> Thanks,
>> Giovanni
>>
>>
>> On Thu, Mar 15, 2012 at 10:18 AM, Samuele Kaplun <[email protected]> wrote:
>>> Hi Benoit and Giovanni,
>>>
>>> On Thursday, 15 March 2012 09:55:23, Benoit Thiell wrote:
>>>> (Sent on behalf of Giovanni <[email protected]> who encountered
>>>> problems with the list.)
>>>
>>> Interesting: what error did he receive? If needed, can you just
>>> send it by private mail?
>>>
>>>> The bottleneck seems to be MySQL, since in "top" I see:
>>>> 29532 mysql     15   0 29.3g 5.1g 4040 S 97.8  3.6  11:49.56 mysqld
>>>> (98% of CPU usage)
>>>
>>> I see you have enabled
>>>
>>> [...]
>>> log-slow-queries
>>> log=/tmp/mysql-query.log
>>> [...]
>>>
>>> any hint from this log?
>>>
>>> And what is a typical output when executing:
>>>
>>> $ echo "SHOW PROCESSLIST" | /opt/invenio/bin/dbexec
>>>
>>> Does it show any long-running query or any LOCKED one?
>>>
>>> Cheers!
>>>        Sam
>>>
>>> --
>>> Samuele Kaplun
>>> Invenio Developer ** <http://invenio-software.org/>
>>>



-- 
Benoit Thiell
The SAO/NASA Astrophysics Data System
http://adswww.harvard.edu/
