Kevin,
The best way forward is to try the email option in MartView. That will
do all the database retrieval and storage on the server side and send you
a link to download the results when they are ready.
Syed
On 10/11/2011 21:50, Kevin C. Dorff wrote:
I've modified my script to fetch these one chromosome at a time, as you
mentioned.
My fear is that given that the splits are very unbalanced (some chromosomes are clearly
going to be much larger files than others) I'll still get timeouts. For instance, I am
currently transferring chromosome "X" and it is exhibiting the same stalling /
bursting behavior: 18 minutes in it was at 56.9MB (now, at 22 minutes, it is at
62.7MB), only occasionally adding new data to the file; most of the time it just
sits there transferring nothing. This feels to me like a flaw in the transfer
system, unless you've designed it to throttle any transfer over a certain size
very, very aggressively. I'll review the transfer output tomorrow morning for
timeouts, etc.
Kevin
On Thu, Nov 10, 2011 at 1:03 PM, Junjun
Zhang<[email protected]<mailto:[email protected]>> wrote:
Hi Kevin,
BioMart 0.7 does not handle large/long-running queries well (the SNP marts are
large), and recent high server load may have made things worse. There are two
options you can use to alleviate the situation.
1. Break the query down into multiple queries, say one query per chromosome. You need to add a filter
like <Filter name = "chr_name" value = "1"/> to your query. This way you can
track each query more easily and rerun any failed query separately.
2. Use the email notification option in the MartView web GUI (this is not
available for script-driven queries).
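Option 1 can be scripted; a minimal sketch in Python follows (the chromosome list, output file names, and chunk size are my assumptions, not anything BioMart prescribes):

```python
import urllib.parse
import urllib.request

MART_URL = "http://www.biomart.org/biomart/martservice"

QUERY_TEMPLATE = (
    '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
    '<Query virtualSchemaName="default" formatter="TSV" header="0" '
    'uniqueRows="0" count="" datasetConfigVersion="0.6">'
    '<Dataset name="mmusculus_snp" interface="default">'
    '<Filter name="chr_name" value="{chrom}"/>'
    '<Attribute name="chr_name"/><Attribute name="chrom_start"/>'
    '<Attribute name="refsnp_id"/>'
    '<Attribute name="ensembl_gene_stable_id"/>'
    '<Attribute name="consequence_type_tv"/></Dataset></Query>'
)

def build_query(chrom):
    """Return the query XML restricted to a single chromosome."""
    return QUERY_TEMPLATE.format(chrom=chrom)

def fetch_chromosome(chrom, out_path):
    """POST one per-chromosome query and stream the TSV to out_path."""
    data = urllib.parse.urlencode({"query": build_query(chrom)}).encode()
    with urllib.request.urlopen(MART_URL, data=data) as resp, \
            open(out_path, "wb") as out:
        while True:
            chunk = resp.read(1 << 16)
            if not chunk:
                break
            out.write(chunk)

if __name__ == "__main__":
    # Mouse chromosomes; rerun any failed one individually.
    for chrom in [str(n) for n in range(1, 20)] + ["X", "Y", "MT"]:
        fetch_chromosome(chrom, f"var-annotations-chr{chrom}.tsv")
```

A failed chromosome then costs only one re-run rather than restarting the whole 500MB transfer.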
Hope this helps, let us know how it goes.
Best regards,
Junjun
From: "Kevin C. Dorff" <[email protected]>
Date: Thu, 10 Nov 2011 12:09:20 -0500
To: "[email protected]" <[email protected]>
Subject: Re: [BioMart Users] Transfer very, very slow from martservice on
large-ish requests
Hi Arek,
Thanks for responding. I saw that biomart.org was back up
so I tried again. I started my script and I am seeing the exact same effect as
before. It starts normally, then after 10MB or so it only periodically bursts a
bit of data across. In 34 minutes I am up to 113MB (now at 37 minutes I am at
118MB, but most of the time is spent sending no data at all). It has been
happening like this for a little over a week, but I cannot say when it might
have started, because it has been several months since I last did these
transfers from martservice (this is something I only do a few times a year).
Looking back over my logs, these larger files have timed out nearly every time,
some at around 5 hours, some at more than 10.
If you could look into this, it would be greatly appreciated.
Thanks,
Kevin
On Thu, Nov 10, 2011 at 7:37 AM, Arek
Kasprzyk<[email protected]<mailto:[email protected]>> wrote:
Hi Kevin
there seem to be some problems with the service recently, and right now
biomart.org is down. The OICR team are working to restore the service. Once it
is restored, please try again and let us know if you are still experiencing
those problems, and we'll look into it in more detail.
a
On Wed, Nov 9, 2011 at 11:59 AM, Kevin C.
Dorff<[email protected]<mailto:[email protected]>> wrote:
Hi,
I periodically grab annotations files in TSV format using martservice via XML. One
of the three files I transfer is relatively large (>500MB). It starts
transferring at a normal speed but before too far into the file (10MB or so?) the
transfer speed just bottoms out and then periodically bursts a little bit of data
at a time before stopping transfer for a while again. A transfer that previously
took maybe a couple of hours can now take 10-20 hours, it seems; or worse, the
connection just times out, and after 10+ hours of transferring data I end up
with an incomplete file.
I am using Curl to download the file. An example command line I would use that
exhibits the problem is
curl -o var-annotations-unsorted.tsv.body --tr-encoding --verbose -d @query.xml
http://www.biomart.org/biomart/martservice
where query.xml contains the data (with the XML portion URL-encoded per the curl
documentation):
query=<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>
<Query virtualSchemaName="default" formatter="TSV" header="0"
uniqueRows="0" count="" datasetConfigVersion="0.6">
<Dataset name="mmusculus_snp" interface="default">
<Attribute name="chr_name"/><Attribute name="chrom_start"/>
<Attribute name="refsnp_id"/><Attribute name="ensembl_gene_stable_id"/>
<Attribute name="consequence_type_tv"/></Dataset></Query>
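The URL-encoding of that query= body can be generated rather than done by hand; a small sketch with Python's standard library (curl's --data-urlencode achieves the same thing, and the shortened XML here is just for illustration):

```python
import urllib.parse

# A shortened query for illustration; the full Query XML above works the same way.
xml = ('<Query virtualSchemaName="default" formatter="TSV">'
       '<Dataset name="mmusculus_snp"/></Query>')

# urlencode() percent-escapes the XML and prefixes the parameter name,
# producing the body to save as query.xml for `curl -d @query.xml`.
body = urllib.parse.urlencode({"query": xml})
print(body[:14])  # query=%3CQuery
```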
I've tried this from both my work network and my home network to verify it
wasn't an issue with our work network, and the same throttling behavior is
exhibited. I'd be somewhat less concerned about it taking 10+ hours to complete
the transfers if they didn't sometimes time out after many hours of transfer.
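Until the stalls are fixed server-side, the timed-out transfers can at least be retried automatically; a minimal sketch of the retry loop I have in mind (the retry count and backoff values are arbitrary assumptions):

```python
import time

def backoff_delays(retries, base=60, cap=900):
    """Exponential backoff schedule in seconds, capped at `cap`."""
    return [min(base * 2 ** i, cap) for i in range(retries)]

def fetch_with_retries(fetch, delays=None):
    """Call fetch() until it succeeds or the retry budget is spent.

    `fetch` is any callable that raises OSError on a stalled or
    timed-out transfer (e.g. a urllib download with a socket timeout).
    """
    if delays is None:
        delays = backoff_delays(5)
    for delay in delays:
        try:
            return fetch()
        except OSError:
            time.sleep(delay)
    return fetch()  # final attempt; let the exception propagate
```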
Secondarily, I was hoping to speed up the transfer by passing the
"--tr-encoding" or "--compressed" option to curl, which would allow the server
to send the file over the wire as gzip, but it seems your server doesn't support
this. That is too bad, because compression could easily cut the number of bytes
transferred by a factor of 10 or more. I've tried both options and neither seems
to have any effect with the martservice servers. Is there some other option I
could specify that would compress the data over the wire or before transfer? I
can handle nearly any file format on my side and would do nearly anything you
offer to speed up these transfers.
Any suggestions?
Kevin
_______________________________________________
Users mailing list
[email protected]<mailto:[email protected]>
https://lists.biomart.org/mailman/listinfo/users