Re: [BioMart Users] Queries being cut off early with no warning...

Rhoda Kinsella Tue, 20 Sep 2011 07:23:08 -0700

Thanks Arek
Rhoda

On 20 Sep 2011, at 15:15, Arek Kasprzyk wrote:

Hi Rhoda,
yes there is:

http://database.oxfordjournals.org/content/2011/bar038.full?keytype=ref&ijkey=5Qv7xNnHDCNJP91
Syed designed and implemented the parallel query engine. We have not really changed anything since then. I am sure he will be happy to talk to you about this.
a
On Tue, Sep 20, 2011 at 9:37 AM, Rhoda Kinsella <[email protected]> wrote:
Hi Arek
Thank you for all your suggestions. I will look again at our filters and attributes and see what I can do to improve things for the next few releases before we move to 0.8. Is there some documentation on the new 'parallel' and streaming query engine and the ICGC partitioning solution so I can see what was involved? Or is there someone in particular I can contact about this if I need advice?
Regards
Rhoda

On 20 Sep 2011, at 14:28, Arek Kasprzyk wrote:
Hi Rhoda,
yes, we are using partitioning a lot for the ICGC portal, but this relies on the fact that the datasets there lend themselves naturally to a partitioning solution, i.e. different tumor types. The new 'parallel' and streaming query engine, thanks to Syed's work, helps greatly with that. For variation you could use a similar solution in the future and partition your datasets by chromosome. This seems to be quite natural as well.
For 0.7 I would strongly encourage you to try to figure out which are the 'killer' queries so we could look into them in more detail and come up with some sort of more targeted solution. As far as filters are concerned, I was talking mostly about 'default filters', i.e. filters that could be switched on at all times (e.g. chromosome) without a user being able to switch them off. I think MEditor provides support for that. I know it is a crude solution, but maybe it would help you a bit.
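The per-chromosome partitioning idea mentioned above can be sketched from the client side as follows. This is only an illustration under stated assumptions: `run_query` is a hypothetical callable standing in for whatever actually submits a query to the mart, and the filter and attribute names are made up for the example rather than taken from any real configuration.

```python
from typing import Callable, Dict, Iterable, Iterator


def build_query(chromosome: str) -> Dict:
    """Describe one partition: the same attributes, restricted to one chromosome."""
    return {
        # Hypothetical filter and attribute names, for illustration only.
        "filters": {"chromosome_name": chromosome},
        "attributes": ["variation_name", "position"],
    }


def run_partitioned(
    chromosomes: Iterable[str],
    run_query: Callable[[Dict], Iterable[str]],
) -> Iterator[str]:
    """Run one smaller query per chromosome and stream the rows back in order.

    `run_query` is a hypothetical function supplied by the caller; it takes a
    query description and returns the result rows for that partition.
    """
    for chromosome in chromosomes:
        for row in run_query(build_query(chromosome)):
            yield row
```

Each partition is small enough to finish on its own, and a failure is confined to one chromosome, which can then be retried without repeating the whole request.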
a
On Tue, Sep 20, 2011 at 9:08 AM, Rhoda Kinsella <[email protected]> wrote:
Hi Arek
The helpdesk team and I have worked together to try to help users by making the same suggestions you mentioned in your email (i.e. encouraging use of filters, limiting the number of attributes selected, using the "download results via email" option, etc.) and I have also implemented max select in several places in the configuration. I think we are going to have to look at streamlining the data we provide in some way in the future. The issue is that the volume of data is growing, especially for variation, and as the tables get bigger the queries take longer. I know that the load on the server can sometimes be very high and that this affects user response times. Have you guys tried partitioning the data to improve build time and/or result response time, and had any success with this?
Regards,
Rhoda

On 20 Sep 2011, at 13:21, Arek Kasprzyk wrote:
Hi Rhoda,
(cc'ing users because this can be of interest to others).
there is no active development on 0.7 anymore. However, there are still some 'generic' tricks you could use to improve your situation:
1. Ask people to go through the 'download via email' route for heavier queries
2. Limit attribute combinations that result in many heavy table joins, by
a. using 'max select' when configuring the mart
b. simply removing some atts
3. Use 'default' filters to limit the queries

However, I would start by checking two things:
1. Load on the server. The performance of the queries is hugely affected by that, and this can be very misleading. If the load is high, even very 'innocent' queries take ages. If this is the case, perhaps you need more hardware?
2. Type of the heavy queries that people do most often. If you could tell me what they are, perhaps we could come up with a solution that would target just those queries?
a
On Tue, Sep 20, 2011 at 5:47 AM, Rhoda Kinsella <[email protected]> wrote:
Hi Arek and Junjun
I have a query about BioMart and perhaps you can give me some advice about how to solve this, or whether something can be added to the code to rectify it. Basically, we are getting an increasing number of users reporting that they are only getting partial result files, or no result files, back when they use BioMart, and they are complaining that there was no warning or error message. I have asked our webteam about a cut-off time that they have set for queries, to see if this has been changed. This was put in place some time ago as some queries were taking too long and killing the servers, or people kept resubmitting the same query over and over, and this froze the servers for everyone else. I was wondering if you have implemented, or are planning to implement, some sort of queuing system for queries in the new code, or whether it would be possible to warn users if they have got an incomplete file download. I fear that some users are ploughing ahead with their work and not realizing they are missing a chunk of the data. Is there a way that we can automatically warn users that they are asking for too much data all at once and ask them to apply more filters? Is there anything that I can do with our current 0.7 version to try to deal with this issue? I'm worried people are going to start using alternatives to BioMart if this continues. Any help or advice would be greatly appreciated.
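One way a client script could notice a truncated download is to look for an end-of-results marker. The sketch below assumes the server appends a final `[success]` line when a query finishes normally. To my knowledge the BioMart 0.7 martservice does this when the XML query sets `completionStamp="1"`, but treat that as an assumption to check against your own installation. The function name is hypothetical.

```python
from typing import Tuple

# Assumed end-of-results marker; verify that your server actually emits it.
SUCCESS_STAMP = "[success]"


def check_complete(result_text: str) -> Tuple[bool, str]:
    """Return (is_complete, data).

    If the text ends with the stamp, the stamp is stripped and the remaining
    rows are returned. Otherwise the text is returned untouched and flagged
    as incomplete so the caller can warn the user or retry.
    """
    stripped = result_text.rstrip()
    if stripped.endswith(SUCCESS_STAMP):
        data = stripped[: -len(SUCCESS_STAMP)].rstrip("\n")
        return True, data
    return False, result_text
```

A caller would refuse to write the file, or would show a warning, whenever the first element of the returned pair is `False`.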
Regards
Rhoda


Rhoda Kinsella Ph.D.
Ensembl Bioinformatician,
European Bioinformatics Institute (EMBL-EBI),
Wellcome Trust Genome Campus,
Hinxton
Cambridge CB10 1SD,
UK.

_______________________________________________
Users mailing list
[email protected]
https://lists.biomart.org/mailman/listinfo/users
