Hi Rhoda,
yes there is: http://database.oxfordjournals.org/content/2011/bar038.full?keytype=ref&ijkey=5Qv7xNnHDCNJP91

Syed designed and implemented the parallel query engine. We have not really changed anything since then. I am sure he will be happy to talk to you about this.

a
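For anyone who wants to see what the streaming engine means from the client side, here is a minimal sketch that reads a MartService result row by row instead of buffering the whole response. The endpoint URL, dataset, filter and attribute names are placeholders to adapt to your own mart, not anything specific to the new engine:

    # Stream a BioMart result row by row rather than holding the whole
    # response in memory. Endpoint, dataset, filter and attribute names
    # are placeholders -- substitute values from your own configuration.
    import urllib.parse
    import urllib.request

    MARTSERVICE = "http://www.biomart.org/biomart/martservice"  # placeholder

    QUERY = """<?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE Query>
    <Query virtualSchemaName="default" formatter="TSV" header="0"
           uniqueRows="0" count="" datasetConfigVersion="0.6">
      <Dataset name="hsapiens_gene_ensembl" interface="default">
        <Filter name="chromosome_name" value="21"/>
        <Attribute name="ensembl_gene_id"/>
      </Dataset>
    </Query>"""

    data = urllib.parse.urlencode({"query": QUERY}).encode()
    with urllib.request.urlopen(MARTSERVICE, data=data) as response:
        for raw in response:                       # rows arrive as they stream
            fields = raw.decode().rstrip("\n").split("\t")
            print(fields)                          # process one row at a time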
On Tue, Sep 20, 2011 at 9:37 AM, Rhoda Kinsella <[email protected]> wrote:

> Hi Arek
> Thank you for all your suggestions. I will look again at our filters and
> attributes and see what I can do to improve things for the next few releases
> before we move to 0.8. Is there some documentation on the new 'parallel' and
> streaming query engine and the ICGC partitioning solution so I can see what
> was involved? Or is there someone in particular I can contact about this if
> I need advice?
> Regards
> Rhoda
>
> On 20 Sep 2011, at 14:28, Arek Kasprzyk wrote:
>
> Hi Rhoda,
> yes, we are using partitioning a lot for the ICGC portal, but this relies on
> the fact that the datasets there lend themselves naturally to a
> partitioning solution, i.e. the different tumor types. The new 'parallel'
> and streaming query engine, thanks to Syed's work, helps greatly with that.
> For variation you could use a similar solution in the future and partition
> your datasets by chromosome. This seems quite natural as well.
>
> For 0.7 I would strongly encourage you to try to figure out which are the
> 'killer' queries, so we could look into them in more detail and come up with
> some sort of more targeted solution. As far as filters are concerned, I was
> talking mostly about 'default filters', i.e. filters (e.g. chromosome) that
> are switched on at all times, without a user being able to switch them off.
> I think MartEditor provides support for that. I know it is a crude solution
> but maybe it would help you a bit.
>
> a
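On the chromosome partitioning idea above: for a MySQL-backed mart the DDL is mechanical enough to generate. A minimal sketch, assuming MySQL 5.5+ (needed for LIST COLUMNS on a character column); the table and column names are invented stand-ins for a variation main table, and note that MySQL requires the partitioning column to appear in every unique key on the table:

    # Generate MySQL DDL that partitions a (hypothetical) variation main
    # table by chromosome. Requires MySQL 5.5+ for LIST COLUMNS on a
    # character column; the partitioning column must also be part of
    # every unique key on the table.
    CHROMOSOMES = [str(n) for n in range(1, 23)] + ["X", "Y", "MT"]

    def partition_by_chromosome(table, column):
        clauses = ",\n".join(
            "  PARTITION p_chr_{0} VALUES IN ('{0}')".format(c)
            for c in CHROMOSOMES
        )
        return "ALTER TABLE {0}\nPARTITION BY LIST COLUMNS({1}) (\n{2}\n);".format(
            table, column, clauses
        )

    # Invented table/column names -- substitute your own schema.
    print(partition_by_chromosome("snp__variation__main", "chr_name"))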
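And on the default filters: in 0.7 this lives in the dataset configuration XML that MartEditor edits. A sketch of switching a filter on by default follows, with the caveat that the defaultOn/defaultValue attribute names, the file name and the filter name are my assumptions -- check them against a configuration you have exported yourself:

    # Switch a filter on by default in a 0.7 dataset configuration XML
    # (the document MartEditor edits). The defaultOn/defaultValue
    # attribute names, file name and filter name are assumptions --
    # verify against your own exported configuration.
    import xml.etree.ElementTree as ET

    tree = ET.parse("hsapiens_snp_config.xml")           # hypothetical file
    for fd in tree.iter("FilterDescription"):
        if fd.get("internalName") == "chromosome_name":  # hypothetical name
            fd.set("defaultOn", "true")
            fd.set("defaultValue", "1")                  # preselect chr 1
    tree.write("hsapiens_snp_config.xml")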
> On Tue, Sep 20, 2011 at 9:08 AM, Rhoda Kinsella <[email protected]> wrote:
>
>> Hi Arek
>> The helpdesk team and I have worked together to try to help users by
>> making the same suggestions you mentioned in your email (i.e. encouraging
>> use of filters, limiting the number of attributes selected, using the
>> "download results via email" option, etc.), and I have also implemented
>> max select in several places in the configuration. I think we are going to
>> have to look at streamlining the data we provide in some way in the future.
>> The issue is that the volume of data is growing, especially for variation,
>> and as the tables get bigger the queries take longer. I know that the load
>> on the server can sometimes be very high and that this affects user
>> response times. Have you tried partitioning the data to improve build time
>> and/or result response time, and had any success with this?
>> Regards,
>> Rhoda
>>
>> On 20 Sep 2011, at 13:21, Arek Kasprzyk wrote:
>>
>> Hi Rhoda,
>> (cc'ing users because this can be of interest to others).
>> There is no active development on 0.7 anymore. However, there are still
>> some 'generic' tricks you could use to improve your situation:
>>
>> 1. Ask people to go through the 'download via email' route for heavier
>> queries.
>> 2. Limit attribute combinations that result in many heavy table joins, via
>> a. using 'max select' when configuring the mart (see the sketch below)
>> b. simply removing some attributes.
>> 3. Use 'default' filters to limit the queries.
>>
>> However, I would start by checking two things:
>>
>> 1. Load on the server. The performance of the queries is hugely affected
>> by that, and this can be very misleading: if the load is high, even very
>> 'innocent' queries take ages. If this is the case, perhaps you need more
>> hardware? (A quick check is sketched below.)
>> 2. The type of heavy queries that people run most often. If you could tell
>> me what they are, perhaps we could come up with a solution that would
>> target just those queries?
>>
>> a
>>
>> On Tue, Sep 20, 2011 at 5:47 AM, Rhoda Kinsella <[email protected]> wrote:
>>
>>> Hi Arek and Junjun
>>> I have a query about BioMart and perhaps you can give me some advice
>>> about how to solve this, or whether something can be added to the code to
>>> rectify it. Basically, we are getting an increasing number of users
>>> reporting that they are only getting partial result files, or no result
>>> files at all, when they use BioMart, and they are complaining that there
>>> was no warning or error message. I have asked our web team about the
>>> cut-off time they have set for queries, to see if it has been changed.
>>> This was put in place some time ago because some queries were taking too
>>> long and killing the servers, or people kept resubmitting the same query
>>> over and over and this froze the servers for everyone else. I was
>>> wondering if you have implemented, or are planning to implement, some
>>> sort of queuing system for queries in the new code, or whether it would
>>> be possible to warn users if they have got an incomplete file download.
>>> I fear that some users are ploughing ahead with their work and not
>>> realizing they are missing a chunk of the data. Is there a way that we
>>> can automatically warn users that they are asking for too much data all
>>> at once and ask them to apply more filters? Is there anything I can do
>>> with our current 0.7 version to try to deal with this issue? I'm worried
>>> people are going to start using alternatives to BioMart if this
>>> continues. Any help or advice would be greatly appreciated.
>>> Regards
>>> Rhoda
>>>
>>> Rhoda Kinsella Ph.D.
>>> Ensembl Bioinformatician,
>>> European Bioinformatics Institute (EMBL-EBI),
>>> Wellcome Trust Genome Campus,
>>> Hinxton
>>> Cambridge CB10 1SD,
>>> UK.
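On 'max select' from Arek's list above: it is the maxSelect attribute on an attribute collection in the same dataset configuration XML that MartEditor edits. A sketch in the same spirit as the default-filter one, with an invented collection name and file name:

    # Cap how many attributes from one collection can be selected
    # together ('max select'). maxSelect sits on AttributeCollection in
    # the dataset configuration XML; the names below are invented.
    import xml.etree.ElementTree as ET

    tree = ET.parse("hsapiens_snp_config.xml")                  # hypothetical
    for coll in tree.iter("AttributeCollection"):
        if coll.get("internalName") == "variation_annotation":  # invented
            coll.set("maxSelect", "3")     # at most three picked at once
    tree.write("hsapiens_snp_config.xml")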
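For the server-load check, watching the MySQL process list for long runners is usually the quickest way to separate "the server is loaded" from "this query is heavy". A minimal sketch; the connection details are placeholders and it assumes the MySQL Connector/Python package is installed:

    # List queries that have been running on the mart database for more
    # than a minute. Connection details are placeholders.
    import mysql.connector  # assumes MySQL Connector/Python is installed

    conn = mysql.connector.connect(host="martdb.example.org",
                                   user="monitor", password="secret")
    cur = conn.cursor()
    cur.execute("SHOW FULL PROCESSLIST")
    # Columns: Id, User, Host, db, Command, Time, State, Info
    for pid, user, host, db, command, secs, state, info in cur:
        if command == "Query" and secs > 60:
            print("{0:>6}s {1:<12} {2}".format(secs, user, (info or "")[:100]))
    cur.close()
    conn.close()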
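Finally, on Rhoda's original problem of silently truncated downloads: one client-side guard is to ask the server to append a completion stamp and reject any file that lacks it. This relies on the martservice honouring a completionStamp="1" attribute on the Query element, after which it appends a final "[success]" line; I am not certain every 0.7 build supports this, so treat it as an assumption to verify before recommending it to users. A sketch:

    # Reject a download that arrived without the server's completion
    # stamp. Assumes the martservice supports completionStamp="1" on the
    # <Query> element and appends a final "[success]" line -- verify
    # this against your own 0.7 build. The endpoint is a placeholder.
    import urllib.parse
    import urllib.request

    MARTSERVICE = "http://www.biomart.org/biomart/martservice"  # placeholder

    def fetch_or_fail(query_xml):
        # query_xml must carry completionStamp="1" on its <Query> element
        data = urllib.parse.urlencode({"query": query_xml}).encode()
        with urllib.request.urlopen(MARTSERVICE, data=data) as response:
            lines = response.read().decode().rstrip("\n").split("\n")
        if not lines or lines[-1] != "[success]":
            raise RuntimeError("truncated result -- retry or add filters")
        return lines[:-1]   # data rows, stamp removed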
_______________________________________________
Users mailing list
[email protected]
https://lists.biomart.org/mailman/listinfo/users
