Re: [BioMart Users] Queries being cut off early with no warning...

Rhoda Kinsella Tue, 20 Sep 2011 06:37:55 -0700

Hi Arek

Thank you for all your suggestions. I will look again at our filters and attributes and see what I can do to improve things for the next few releases before we move to 0.8. Is there some documentation on the new 'parallel' and streaming query engine and the ICGC partitioning solution so I can see what was involved? Or is there someone in particular I can contact about this if I need advice?

Regards
Rhoda


On 20 Sep 2011, at 14:28, Arek Kasprzyk wrote:

Hi Rhoda,
yes, we are using partitioning a lot for the ICGC portal, but this relies on the fact that the datasets there lend themselves naturally to a partitioning solution, i.e. different tumor types. The new 'parallel' and streaming query engine, thanks to Syed's work, helps greatly with that. For the variation data you could use a similar solution in the future and partition your datasets by chromosome. This seems to be quite natural as well.
For 0.7 I would strongly encourage you to try to figure out which are the 'killer' queries so we could look into them in more detail and come up with a more targeted solution. As far as filters are concerned, I was talking mostly about 'default filters', i.e. filters that are switched on at all times (e.g. chromosome) without a user being able to switch them off. I think MEditor provides support for that. I know it is a crude solution, but maybe it would help you a bit.
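The partition-by-chromosome idea can also be approximated on the client side by issuing one smaller query per chromosome instead of a single large one. What follows is only a minimal sketch under stated assumptions, not how the ICGC portal or the 0.8 engine does it: the dataset name, filter name and attribute names are hypothetical placeholders, and the code only builds the query XML without sending it anywhere.

```python
import xml.etree.ElementTree as ET

# Hypothetical names, for illustration only; substitute those of your own mart.
DATASET = "example_variation_dataset"
CHROMOSOME_FILTER = "chromosome_name"
ATTRIBUTES = ["variant_id", "position"]
CHROMOSOMES = [str(n) for n in range(1, 23)] + ["X", "Y", "MT"]


def build_query(chromosome, dataset=DATASET, attributes=ATTRIBUTES):
    """Return a query XML string restricted to a single chromosome."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "0",
        "uniqueRows": "0",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset",
                       {"name": dataset, "interface": "default"})
    # The chromosome filter is what turns one heavy query into many light ones.
    ET.SubElement(ds, "Filter",
                  {"name": CHROMOSOME_FILTER, "value": chromosome})
    for attr in attributes:
        ET.SubElement(ds, "Attribute", {"name": attr})
    return ET.tostring(query, encoding="unicode")


def partitioned_queries(chromosomes=CHROMOSOMES):
    """Map each chromosome to its own, smaller query."""
    return {c: build_query(c) for c in chromosomes}
```

Each per-chromosome query can then be submitted separately and the results concatenated, so a single timeout costs one partition rather than the whole download.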
a
On Tue, Sep 20, 2011 at 9:08 AM, Rhoda Kinsella <[email protected]> wrote:
Hi Arek
The helpdesk team and I have worked together to try to help users by making the same suggestions you mentioned in your email (i.e. encouraging use of filters, limiting the number of attributes selected, using the "download results via email" option, etc.) and I have also implemented max select in several places in the configuration. I think we are going to have to look at streamlining the data we provide in some way in the future. The issue is that the volume of data is growing, especially for variation, and as the tables get bigger the queries take longer. I know that the load on the server can sometimes be very high and that this affects user response times. Have you guys tried partitioning of data to improve build time and/or result response time, and had any success with this?
Regards,
Rhoda

On 20 Sep 2011, at 13:21, Arek Kasprzyk wrote:
Hi Rhoda,
(cc'ing users because this can be of interest to others).
there is no active development on 0.7 anymore. However, there are still some 'generic' tricks you could use to improve your situation:
1. Ask people to go through the 'download via email' route for heavier queries
2. Limit attribute combinations that result in many heavy table joins, via
a. using 'max select' when configuring the mart
b. simply removing some atts
3. Use 'default' filters to limit the queries

However, I would start by checking two things:
1. Load on the server. The performance of the queries is hugely affected by that, and this can be very misleading. If the load is high, even very 'innocent' queries take ages. If this is the case, perhaps you need more hardware?
2. Type of the heavy queries that people do most often. If you could tell me what they are, perhaps we could come up with a solution that would target just those queries?
a
On Tue, Sep 20, 2011 at 5:47 AM, Rhoda Kinsella <[email protected]> wrote:
Hi Arek and Junjun
I have a query about BioMart and perhaps you can give me some advice about how to solve this, or whether something can be added to the code to rectify it. Basically, we are getting an increasing number of users reporting that they are only getting partial result files or no result files back when they use BioMart, and they are complaining that there was no warning or error message. I have asked our webteam about a cut-off time that they have set for queries, to see if this has been changed. This was put in place some time ago as some queries were taking too long and killing the servers, or people kept resubmitting the same query over and over, and this froze the servers for everyone else. I was wondering if you have implemented or are planning to implement some sort of queuing system for queries in the new code, or whether it would be possible to warn users if they have got an incomplete file download. I fear that some users are ploughing ahead with their work and not realizing they are missing a chunk of the data. Is there a way that we can automatically warn users that they are asking for too much data all at once and ask them to apply more filters? Is there anything that I can do with our current 0.7 version to try to deal with this issue? I'm worried people are going to start using alternatives to BioMart if this continues. Any help or advice would be greatly appreciated.
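To make the "no warning on a partial file" problem concrete, here is a minimal sketch of the kind of check a client script could run on a downloaded result. It assumes, purely for illustration, that a complete tab-separated result ends with a marker line; the marker text and the function name are hypothetical, and nothing in this thread confirms that the 0.7 server emits such a marker.

```python
# Hypothetical end-of-result marker; an assumption made for this sketch.
SENTINEL = "[success]"


def check_complete(text, expected_columns, sentinel=SENTINEL):
    """Return (rows, warnings) for a tab-separated result string.

    A result is flagged when the final marker line is missing or when
    any row has the wrong number of columns (a typical sign that the
    stream was cut off in the middle of a line).
    """
    lines = text.splitlines()
    warnings = []
    if lines and lines[-1].strip() == sentinel:
        lines = lines[:-1]  # drop the marker so it is not parsed as data
    else:
        warnings.append("no completion marker: result may be truncated")
    rows = [line.split("\t") for line in lines if line]
    bad = [i for i, row in enumerate(rows, start=1)
           if len(row) != expected_columns]
    if bad:
        warnings.append("rows with unexpected column count: %s" % bad)
    return rows, warnings
```

A script would call `check_complete(downloaded_text, expected_columns=2)` and refuse to continue if the returned list of warnings is not empty, instead of silently working with a partial file.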
Regards
Rhoda


Rhoda Kinsella Ph.D.
Ensembl Bioinformatician,
European Bioinformatics Institute (EMBL-EBI),
Wellcome Trust Genome Campus,
Hinxton
Cambridge CB10 1SD,
UK.

_______________________________________________
Users mailing list
[email protected]
https://lists.biomart.org/mailman/listinfo/users
