Hi Rhoda,
yes there is:

http://database.oxfordjournals.org/content/2011/bar038.full?keytype=ref&ijkey=5Qv7xNnHDCNJP91

Syed designed and implemented the parallel query engine. We have not really
changed anything since then. I am sure he will be happy to talk to you about
this.


a


On Tue, Sep 20, 2011 at 9:37 AM, Rhoda Kinsella <[email protected]> wrote:

> Hi Arek
> Thank you for all your suggestions. I will look again at our filters and
> attributes and see what I can do to improve things for the next few releases
> before we move to 0.8. Is there some documentation on the new 'parallel' and
> streaming query engine and the ICGC partitioning solution so I can see what
> was involved? Or is there someone in particular I can contact about this if
> I need advice?
> Regards
> Rhoda
>
> On 20 Sep 2011, at 14:28, Arek Kasprzyk wrote:
>
> Hi Rhoda,
> yes, we are using partitioning a lot for the ICGC portal, but this relies
> on the fact that the datasets there lend themselves naturally to a
> partitioning solution, i.e. different tumor types. The new 'parallel' and
> streaming query engine, thanks to Syed's work, helps greatly with that. For
> variation you could in the future use a similar solution and partition your
> datasets by chromosome. This seems to be quite natural as well.
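The chromosome-partitioning idea above can be sketched outside BioMart itself. A minimal illustration in Python; the record format and values here are hypothetical, not BioMart's actual schema, where each bucket would correspond to a per-chromosome table:

```python
from collections import defaultdict

# Hypothetical variant records: (chromosome, position, allele change).
# In a real mart these would live in database tables, one per partition.
variants = [
    ("1", 12345, "A/G"),
    ("X", 99882, "C/T"),
    ("1", 67890, "T/C"),
    ("2", 555, "G/A"),
]

def partition_by_chromosome(records):
    """Group records into per-chromosome buckets so a query that
    filters on chromosome only ever touches one partition."""
    partitions = defaultdict(list)
    for rec in records:
        partitions[rec[0]].append(rec)
    return dict(partitions)

parts = partition_by_chromosome(variants)
# A chromosome-filtered query now scans one small partition, not the
# whole table.
chr1_hits = parts["1"]
```

The same split is what makes the tumor-type partitioning natural for ICGC: the partition key matches the filter users almost always apply.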
>
> For 0.7 I would strongly encourage you to try to figure out which are the
> 'killer' queries so we could look into those in more detail and come up
> with some sort of more targeted solution. As far as filters are concerned,
> I was talking mostly about 'default filters', i.e. filters that could be
> switched on at all times (e.g. chromosome) without a user being able to
> switch them off. I think MEditor provides support for that. I know it is a
> crude solution but maybe it would help you a bit.
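As a starting point for spotting the 'killer' queries, one could rank entries from a query log by elapsed time. A small sketch, assuming a hypothetical log of (description, seconds) pairs; a real setup would parse the webserver or database slow-query log instead:

```python
# Hypothetical query log entries: (description, elapsed seconds).
query_log = [
    ("all variation atts, no filters", 412.0),
    ("gene ids for chromosome 1", 3.2),
    ("every transcript x every xref", 958.5),
    ("snp ids for one gene", 0.8),
]

def killer_queries(log, threshold=60.0):
    """Return queries slower than `threshold` seconds, worst first."""
    slow = [entry for entry in log if entry[1] > threshold]
    return sorted(slow, key=lambda entry: entry[1], reverse=True)

worst = killer_queries(query_log)
```

The handful of queries at the top of `worst` are the ones worth a targeted fix (a default filter, a max select, or a dedicated partition).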
>
> a
>
>
> On Tue, Sep 20, 2011 at 9:08 AM, Rhoda Kinsella <[email protected]> wrote:
>
>> Hi Arek
>> The helpdesk team and I have worked together to try to help users by
>> making the same suggestions you mentioned in your email (i.e. encouraging
>> use of filters, limiting the number of attributes selected, using the
>> "download results via email" option, etc.) and I have also implemented
>> max select in several places in the configuration. I think we are going
>> to have to look at streamlining the data we provide in some way in the
>> future. The issue is that the volume of data is growing, especially for
>> variation, and as the tables get bigger the queries take longer. I know
>> that the load on the server can sometimes be very high and that this
>> affects user response times. Have you guys tried partitioning the data to
>> improve build time and/or result response time, and had any success with
>> this?
>> Regards,
>> Rhoda
>>
>> On 20 Sep 2011, at 13:21, Arek Kasprzyk wrote:
>>
>> Hi Rhoda,
>> (cc'ing users because this can be of interest to others).
>> there is no active development on 0.7 anymore. However, there are still
>> some 'generic' tricks you could use to improve your situation:
>>
>> 1. Ask people to go through the 'download via email' route for heavier
>> queries
>> 2. Limit attribute combinations that result in many heavy table joins by
>> a. using 'max select' when configuring the mart
>> b. simply removing some attributes
>> 3. Using 'default' filters to limit the queries
>>
>> However, I would start by checking two things:
>>
>> 1. Load on the server. The performance of queries is hugely affected by
>> this, and it can be very misleading: if the load is high, even very
>> 'innocent' queries take ages. If this is the case, perhaps you need more
>> hardware?
>> 2. The type of heavy queries people run most often. If you could tell me
>> what they are, perhaps we could come up with a solution that targets just
>> those queries?
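The load check in point 1 is easy to automate. A minimal sketch of the arithmetic, using a fixed sample line in the format of Linux's `/proc/loadavg` (the 1.0 threshold is an arbitrary example, not a BioMart recommendation):

```python
def load_per_cpu(loadavg_line, ncpus):
    """1-minute load average divided by CPU count. Sustained values
    well above 1.0 mean even 'innocent' queries will sit in a queue."""
    one_min = float(loadavg_line.split()[0])
    return one_min / ncpus

# Fixed sample for illustration; on Linux the live line would come from
# open("/proc/loadavg").read() and ncpus from os.cpu_count().
ratio = load_per_cpu("8.42 7.90 6.10 3/412 12345", ncpus=4)
overloaded = ratio > 1.0
```

If `overloaded` stays true during normal hours, slow queries say more about the hardware than about the mart configuration.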
>>
>>
>>
>> a
>>
>>
>>
>>
>>
>> On Tue, Sep 20, 2011 at 5:47 AM, Rhoda Kinsella <[email protected]> wrote:
>>
>>> Hi Arek and Junjun
>>> I have a query about BioMart and perhaps you can give me some advice
>>> about how to solve this, or whether something can be added to the code
>>> to rectify it. Basically, we are getting an increasing number of users
>>> reporting that they are only getting partial result files, or no result
>>> files at all, when they use BioMart, and they are complaining that there
>>> was no warning or error message. I have asked our web team about the
>>> cut-off time they have set for queries, to see if this has been changed.
>>> This was put in place some time ago, as some queries were taking too
>>> long and killing the servers, or people kept resubmitting the same query
>>> over and over, which froze the servers for everyone else. I was
>>> wondering if you have implemented, or are planning to implement, some
>>> sort of queuing system for queries in the new code, or whether it would
>>> be possible to warn users when their file download is incomplete. I fear
>>> that some users are ploughing ahead with their work, not realizing they
>>> are missing a chunk of the data. Is there a way that we can
>>> automatically warn users that they are asking for too much data all at
>>> once and ask them to apply more filters? Is there anything that I can do
>>> with our current 0.7 version to try to deal with this issue? I'm worried
>>> people are going to start using alternatives to BioMart if this
>>> continues. Any help or advice would be greatly appreciated.
>>> Regards
>>> Rhoda
>>>
>>>
>>> Rhoda Kinsella Ph.D.
>>> Ensembl Bioinformatician,
>>> European Bioinformatics Institute (EMBL-EBI),
>>> Wellcome Trust Genome Campus,
>>> Hinxton
>>> Cambridge CB10 1SD,
>>> UK.
>>>
>>>
>>
>>
>>
>
>
>
_______________________________________________
Users mailing list
[email protected]
https://lists.biomart.org/mailman/listinfo/users
