Re: [galaxy-dev] A tool with no inputs

2011-05-17 Thread Paul-Michael Agapow
 One of my colleagues is having trouble developing a peculiar tool: it
has no inputs. This makes sense in our local
 context - it fetches some constantly updating remote data for the
current user - but implementing it has escaped our
 skill. Galaxy complains about a tool with no params (i.e. an empty
inputs). It complains about a tool with no
 inputs tag. Hidden input fields don't seem to work (i.e. looks like
I'm getting the cached value of the form).

 From: Duddy, John jdu...@illumina.com
 
 Doesn't this violate one of the basic tenets of Galaxy -
reproducibility? Without the ability to provide full
 traceability to the inputs, one can make no guarantees about the
outputs.

I think that condition is being met - the resulting dataset depends on
no inputs but the time of fetching, and so traceability is satisfied,
just like many remote and updating data sources. In any event, our
pragmatic need in this case is for such a tool without any need for
reproducibility. 

Which leads me to note the strange behaviour seen when the only input is
a hidden field. When the inputs are:

<inputs>
    <param name="input_seq_file" type="data" format="fasta"
        label="Input sequences"
        help="The sequences to have MLST profiles constructed for." />
    <param name="offset" type="hidden" value="0" />
</inputs>

this works as expected: a dataset selection dropdown and a hidden field
(seen in the source). But if you delete the non-hidden parameter:

<inputs>
    <param name="offset" type="hidden" value="0" />
</inputs>

Galaxy doesn't generate an error, reports loading the tool, and puts an
entry in the tool panel. However the link for the tool in question is
malformed, missing a tool identifier and when clicked only reloads the
middle welcome panel. 

I haven't worked out where and why this is happening, but at the least
it's an uninformative error.


Paul Agapow (paul-michael.aga...@hpa.org.uk)
Bioinformatics, Centre for Infections, Health Protection Agency

___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/


Re: [galaxy-dev] User list and disk space?

2011-05-17 Thread Louise-Amelie Schmitt

Thank you so much!!!

It works really fine! I suspected something akin to that but I couldn't 
find the right attributes.


But I still have a question: It comprises the datasets stored in 
libraries right? Is there a way to ignore them?


Thanks again,
L-A


Greg Von Kuster wrote:

Hello Louise,

I've pasted a diff below for the file 
~/lib/galaxy/web/controllers/admin.py that will provide what you want, 
I believe.  I didn't have time to fully test, but it gives you the 
idea.  You can make things prettier by wrapping the returned value 
in galaxy.datatypes.data.nice_size.


diff -r 56be3f4871cd lib/galaxy/web/controllers/admin.py
--- a/lib/galaxy/web/controllers/admin.py Fri May 13 21:24:03 2011 -0400
+++ b/lib/galaxy/web/controllers/admin.py Mon May 16 14:15:23 2011 -0400
@@ -41,6 +41,13 @@
             if user.galaxy_sessions:
                 return self.format( user.galaxy_sessions[ 0 ].update_time )
             return 'never'
+    class DiskUseageColumn( grids.GridColumn ):
+        def get_value( self, trans, grid, user ):
+            disk_used = 0
+            for history in user.active_histories:
+                for hda in history.active_datasets:
+                    disk_used += hda.get_size()
+            return disk_used
 
     # Grid definition
     webapp = "galaxy"
@@ -65,6 +72,7 @@
         ExternalColumn( "External", attach_popup=False ),
         LastLoginColumn( "Last Login", format=time_ago ),
         StatusColumn( "Status", attach_popup=False ),
+        DiskUseageColumn( "Disk Used", attach_popup=False ),
         # Columns that are valid for filtering but are not visible.
         grids.DeletedColumn( "Deleted", key="deleted", visible=False, filterable="advanced" )
     ]


On May 16, 2011, at 10:18 AM, Louise-Amelie Schmitt wrote:


Hello

I would like to add a column in the admin panel's user list with the 
sum of the file sizes of all datasets in each user's histories that 
don't belong to any library.


Is there a way to do that?

I had a look at lib/galaxy/web/controllers/admin.py and 
templates/grid_base.mako but I fail to see where the query variable 
(containing the items, therefore the users) comes from in the mako 
template, so I couldn't figure out what each column class in admin.py 
actually manipulates in the get_value() methods.


Regards,
L-A


Greg Von Kuster
Galaxy Development Team
g...@bx.psu.edu mailto:g...@bx.psu.edu







Re: [galaxy-dev] User list and disk space?

2011-05-17 Thread Greg Von Kuster
Louise,

The original code would not eliminate a dataset file that was pointed to by a 
LibraryDatasetDatasetAssociation, and then imported from the data library into 
the user's history, creating a HistoryDatasetAssociation that points to the 
same file.  I've added a bit of code below that should eliminate datasets 
falling into this category - not tested at all though...

On May 17, 2011, at 8:42 AM, Louise-Amelie Schmitt wrote:

 
 
 But I still have a question: It comprises the datasets stored in libraries 
 right? Is there a way to ignore them?
 
 
 Greg Von Kuster wrote:
 Hello Louise,
 
 I've pasted a diff below for the file ~/lib/galaxy/web/controllers/admin.py 
 that will provide what you want, I believe.  I didn't have time to fully 
 test, but it gives you the idea.  You can make things prettier by wrapping 
 the returned value in galaxy.datatypes.data.nice_size
 
diff -r 56be3f4871cd lib/galaxy/web/controllers/admin.py
--- a/lib/galaxy/web/controllers/admin.py Fri May 13 21:24:03 2011 -0400
+++ b/lib/galaxy/web/controllers/admin.py Mon May 16 14:15:23 2011 -0400
@@ -41,6 +41,15 @@
             if user.galaxy_sessions:
                 return self.format( user.galaxy_sessions[ 0 ].update_time )
             return 'never'
+    class DiskUseageColumn( grids.GridColumn ):
+        def get_value( self, trans, grid, user ):
+            disk_used = 0
+            for history in user.active_histories:
+                for hda in history.active_datasets:
+                    dataset = hda.dataset
+                    if not dataset.active_library_associations:
+                        disk_used += hda.get_size()
+            return disk_used
 
     # Grid definition
     webapp = "galaxy"
@@ -65,6 +74,7 @@
         ExternalColumn( "External", attach_popup=False ),
         LastLoginColumn( "Last Login", format=time_ago ),
         StatusColumn( "Status", attach_popup=False ),
+        DiskUseageColumn( "Disk Used", attach_popup=False ),
         # Columns that are valid for filtering but are not visible.
         grids.DeletedColumn( "Deleted", key="deleted", visible=False, filterable="advanced" )
     ]
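
For anyone who wants to try the filtering logic outside Galaxy, here is a minimal Python sketch of what the amended get_value() computes. The namedtuple classes are stand-ins invented for illustration (they are not Galaxy's model classes), and readable_size is a hypothetical substitute for galaxy.datatypes.data.nice_size, whose exact output format is not shown in this thread.

```python
from collections import namedtuple

# Stand-ins for Galaxy model objects (assumption: only the attributes
# touched by the diff are modelled here).
Dataset = namedtuple("Dataset", "size active_library_associations")
HDA = namedtuple("HDA", "dataset")
History = namedtuple("History", "active_datasets")
User = namedtuple("User", "active_histories")


def disk_used(user, skip_library_datasets=True):
    """Sum dataset sizes over all of a user's active histories.

    With skip_library_datasets=True, datasets that still have active
    library associations are left out, mirroring the amended diff.
    """
    total = 0
    for history in user.active_histories:
        for hda in history.active_datasets:
            dataset = hda.dataset
            if skip_library_datasets and dataset.active_library_associations:
                continue
            total += dataset.size
    return total


def readable_size(num_bytes):
    """Format a byte count for display (hypothetical nice_size stand-in)."""
    size = float(num_bytes)
    for unit in ("bytes", "KB", "MB", "GB"):
        if size < 1024:
            return "%.1f %s" % (size, unit)
        size /= 1024
    return "%.1f TB" % size
```

Like the diff, this sketch counts a dataset once per history it appears in, so a file copied between two histories of the same user would be counted twice; whether that is desirable depends on the precise spec.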
 
 
 
 
 
 

Greg Von Kuster
Galaxy Development Team
g...@bx.psu.edu






Re: [galaxy-dev] User list and disk space?

2011-05-17 Thread Greg Von Kuster
Possibly, but it would require an enable flag in the config, since instances 
with hundreds of users could have problems with rendering delays.  If someone 
will provide the precise specs for what you want, I'll add it to my list, but 
it will be a while before I get to it.  For example, should dataset files that 
have both LibraryDatasetDatasetAssociations and HistoryDatasetAssociations be 
eliminated from the total?


On May 17, 2011, at 9:10 AM, Peter Cock wrote:

 On May 16, 2011, at 10:18 AM, Louise-Amelie Schmitt wrote:
 
 Hello
 
 I would like to add a column in the admin panel's user list with the sum of
 the file sizes of all datasets in each user's histories that don't belong to
 any library.
 
 Is there a way to do that?
 
 On Mon, May 16, 2011 at 7:18 PM, Greg Von Kuster g...@bx.psu.edu wrote:
 Hello Louise,
 I've pasted a diff below for the file ~/lib/galaxy/web/controllers/admin.py
 that will provide what you want, I believe.  I didn't have time to fully
 test, but it gives you the idea.  You can make things prettier by wrapping
 the returned value in galaxy.datatypes.data.nice_size
 
 That sounds really useful - is it something that you plan on adding
 to Galaxy officially?
 
 Thanks,
 
 Peter
 

Greg Von Kuster
Galaxy Development Team
g...@bx.psu.edu






Re: [galaxy-dev] User list and disk space?

2011-05-17 Thread Nate Coraor
Peter Cock wrote:
 
 That sounds really useful - is it something that you plan on adding
 to Galaxy officially?

Yes, I'm working on adding a lot of user- and admin-side access to
various numbers about disk usage (used by histories, used by deleted
data, etc.) and disk quotas.  I hope to have this done shortly after the
GCC.

Thanks,
--nate

 


Re: [galaxy-dev] User list and disk space?

2011-05-17 Thread Louise-Amelie Schmitt

Looks like it's working,  a couple of values were reduced :)

Thank you so much!!!
L-A




[galaxy-dev] Database/Build

2011-05-17 Thread Dave Walton
I'd like to get a better understanding of the point of the database/build
attribute, and pose the question of when is the appropriate time to have it
set?

In our case at the Jackson Laboratory, the most common build is  NCBI37/MM9.

However, the feeling of many folks here, is that this should not be set on
our fastq files.  The only place we really run into trouble is with
cufflinks.  If you haven't set the db when you get to cufflinks you'll get
an error.

Our suggestion is that there should be one of two options:

1)  Tophat has the ability to set the database of the output files based on
the genome that was selected for alignment.

2)  There should be a module that can be plugged into a workflow that would
set the database of the file prior to passing the file to cufflinks (or any
other tool that requires the database attribute to be set).

We are curious if anyone else is running into this issue, and how it is
being solved.

We're thinking about hacking the Tophat wrapper, but I wanted to check with
others before I did this.

Thanks,

Dave




[galaxy-dev] galaxy doesn't start with drmaa job runner

2011-05-17 Thread Shantanu Pavgi


I am trying to configure galaxy with sge/drmaa scheduler. The galaxy process is 
starting up fine without any drmaa configuration. However the galaxy daemon 
doesn't start properly when I add drmaa configuration lines as shown below. I 
have set SGE_ROOT and DRMAA_LIBRARY_PATH variables properly. Am I missing 
something important here?


# drmaa configuration: 
{{{
start_job_runners = drmaa
default_cluster_job_runner = drmaa:///
}}}
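
One way to narrow this down, sketched under assumptions: from the same shell (and as the same user) that starts Galaxy, check that both variables are visible and that the library actually loads. The helper name check_drmaa_env is made up for this example; it uses only the Python standard library and says nothing about how Galaxy's own drmaa runner behaves.

```python
import ctypes
import os


def check_drmaa_env(env=None):
    """Return a list of problems found with the DRMAA-related settings.

    env defaults to os.environ; pass a dict to test other values.
    """
    env = os.environ if env is None else env
    problems = []

    sge_root = env.get("SGE_ROOT")
    if not sge_root:
        problems.append("SGE_ROOT is not set")
    elif not os.path.isdir(sge_root):
        problems.append("SGE_ROOT is not a directory: %s" % sge_root)

    lib_path = env.get("DRMAA_LIBRARY_PATH")
    if not lib_path:
        problems.append("DRMAA_LIBRARY_PATH is not set")
    elif not os.path.isfile(lib_path):
        problems.append("DRMAA_LIBRARY_PATH is not a file: %s" % lib_path)
    else:
        try:
            # Loading the shared library catches missing dependencies
            # and architecture mismatches early.
            ctypes.CDLL(lib_path)
        except OSError as exc:
            problems.append("could not load %s: %s" % (lib_path, exc))
    return problems


if __name__ == "__main__":
    for problem in check_drmaa_env():
        print(problem)
```

An empty result only shows that the library can be loaded from this environment; if Galaxy is started as a daemon with a different environment, the variables may still be missing there.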


# paster.log file - with the drmaa config, the process dies with "Loaded job runner: 
galaxy.jobs.runners.local:LocalJobRunner" as the last line in the log file. With 
only the local job runner, galaxy starts fine, as shown below: 

{{{
galaxy.jobs DEBUG 2011-05-17 09:59:58,959 Loaded job runner: 
galaxy.jobs.runners.local:LocalJobRunner
galaxy.jobs INFO 2011-05-17 09:59:58,959 job manager started
galaxy.jobs INFO 2011-05-17 09:59:59,325 job stopper started
galaxy.sample_tracking.external_service_types DEBUG 2011-05-17 09:59:59,330 
Loaded external_service_type: Simple unknown sequencer 1.0.0
galaxy.sample_tracking.external_service_types DEBUG 2011-05-17 09:59:59,333 
Loaded external_service_type: Applied Biosystems SOLiD 1.0.0
galaxy.web.framework.base DEBUG 2011-05-17 09:59:59,349 Enabling 'mobile' 
controller, class: Mobile
galaxy.web.framework.base DEBUG 2011-05-17 09:59:59,417 Enabling 
'library_common' controller, class: LibraryCommon
galaxy.web.framework.base DEBUG 2011-05-17 09:59:59,435 Enabling 'admin' 
controller, class: AdminGalaxy
galaxy.web.framework.base DEBUG 2011-05-17 09:59:59,451 Enabling 'requests' 
controller, class: Requests
galaxy.web.framework.base DEBUG 2011-05-17 09:59:59,455 Enabling 
'external_services' controller, class: ExternalServiceController
galaxy.web.framework.base DEBUG 2011-05-17 09:59:59,474 Enabling 'page' 
controller, class: PageController
galaxy.web.framework.base DEBUG 2011-05-17 09:59:59,480 Enabling 
'visualization' controller, class: VisualizationController
galaxy.web.framework.base DEBUG 2011-05-17 09:59:59,549 Enabling 'tracks' 
controller, class: TracksController
galaxy.web.framework.base DEBUG 2011-05-17 09:59:59,550 Enabling 
'requests_common' controller, class: RequestsCommon
galaxy.web.framework.base DEBUG 2011-05-17 09:59:59,556 Enabling 'request_type' 
controller, class: RequestType
galaxy.web.framework.base DEBUG 2011-05-17 09:59:59,561 Enabling 
'external_service' controller, class: ExternalService
galaxy.web.framework.base DEBUG 2011-05-17 09:59:59,562 Enabling 'library' 
controller, class: Library
galaxy.web.framework.base DEBUG 2011-05-17 09:59:59,746 Enabling 'workflow' 
controller, class: WorkflowController
galaxy.web.framework.base DEBUG 2011-05-17 09:59:59,752 Enabling 
'library_admin' controller, class: LibraryAdmin
galaxy.web.framework.base DEBUG 2011-05-17 09:59:59,756 Enabling 'async' 
controller, class: ASync
galaxy.web.framework.base DEBUG 2011-05-17 10:00:00,026 Enabling 'history' 
controller, class: HistoryController
galaxy.web.framework.base DEBUG 2011-05-17 10:00:00,031 Enabling 'error' 
controller, class: Error
galaxy.web.framework.base DEBUG 2011-05-17 10:00:00,090 Enabling 
'requests_admin' controller, class: RequestsAdmin
galaxy.web.framework.base DEBUG 2011-05-17 10:00:00,095 Enabling 'ucsc_proxy' 
controller, class: UCSCProxy
galaxy.web.framework.base DEBUG 2011-05-17 10:00:00,104 Enabling 'forms' 
controller, class: Forms
galaxy.web.framework.base DEBUG 2011-05-17 10:00:00,137 Enabling 'dataset' 
controller, class: DatasetInterface
galaxy.web.framework.base DEBUG 2011-05-17 10:00:00,143 Enabling 'tool_runner' 
controller, class: ToolRunner
galaxy.web.framework.base DEBUG 2011-05-17 10:00:00,148 Enabling 'tag' 
controller, class: TagsController
galaxy.web.framework.base DEBUG 2011-05-17 10:00:00,158 Enabling 'user' 
controller, class: User
galaxy.web.framework.base DEBUG 2011-05-17 10:00:00,165 Enabling 'root' 
controller, class: RootController
galaxy.web.buildapp DEBUG 2011-05-17 10:00:00,189 Enabling 'httpexceptions' 
middleware
galaxy.web.buildapp DEBUG 2011-05-17 10:00:00,195 Enabling 'recursive' 
middleware
galaxy.web.buildapp DEBUG 2011-05-17 10:00:00,220 Enabling 'print debug' 
middleware
/share/apps/galaxy/shantanu-temp-workspace/galaxy-cluster-test/galaxy-dist-50e249442c5a/eggs/WebError-0.8a-py2.6.egg/weberror/exceptions/serial_number_generator.py:11:
 DeprecationWarning: the md5 module is deprecated; use hashlib instead
  import md5
galaxy.web.buildapp DEBUG 2011-05-17 10:00:00,726 Enabling 'eval exceptions' 
middleware
galaxy.web.buildapp DEBUG 2011-05-17 10:00:00,740 Enabling 'trans logger' 
middleware
galaxy.web.buildapp DEBUG 2011-05-17 10:00:00,740 Enabling 'config' middleware
galaxy.web.buildapp DEBUG 2011-05-17 10:00:00,744 Enabling 'x-forwarded-host' 
middleware
Starting server in PID 10170.
serving on 0.0.0.0:8081 view at http://127.0.0.1:8081

}}}


--
Thanks,
Shantanu.

Re: [galaxy-dev] Database/Build

2011-05-17 Thread Daniel Blankenberg
I'll just chime in quickly with an agreement that FASTQ files should not have 
dbkeys set. They don't yet belong to a build/reference genome version. Some 
tools/workflows may currently require a FASTQ file to have the dbkey set, but 
this should be considered a work-around for a defect in a tool xml. I'll let 
someone else on the team address Tophat and the suggestions specifically.

Thanks for using Galaxy,

Dan




Re: [galaxy-dev] Database/Build

2011-05-17 Thread Kelly Vincent

Dave,

The fact that the Tophat wrapper was not setting the genome based on  
the alignment genome was actually a bug, but I just fixed it in  
changeset 5570:0c1251f25c6b.


Let us know if you have further questions.

Regards,
Kelly




[galaxy-dev] Select first/last N rows from grouped tabular files (e.g. top BLAST hits)

2011-05-17 Thread Peter Cock
Hi all,

I'm wondering if the following task can be done in Galaxy with the
standard tools. The specific example is selecting the top (e.g. 3)
matching sequences for each BLAST query, but I see this problem as much
more general than a "Select top BLAST hits" tool.

I want to select the first few (e.g. 1) rows of each group in a
tabular file, where the grouping criterion is having certain columns equal
(e.g. the first 2).

e.g. Tabular BLAST output has columns of query ID, match ID, etc.

queryA match1 ...
queryA match2 ...
queryA match2 ...
queryA match3 ...
queryA match4 ...
queryA match4 ...
queryA match4 ...
queryB match5 ...
queryB match5 ...
queryC match6 ...
queryC match7 ...

In this example, some of my queries have more than one HSP per match
(more than one line with the same first two columns). If I group on
the first two columns, the groups are:


queryA match1 ...

queryA match2 ...
queryA match2 ...

queryA match3 ...

queryA match4 ...
queryA match4 ...
queryA match4 ...

queryB match5 ...
queryB match5 ...

queryC match6 ...

queryC match7 ...


If I then take the first row in each group, that gives me just the
first HSP for each query+match combination.

queryA match1 ...
queryA match2 ...
queryA match3 ...
queryA match4 ...
queryB match5 ...
queryC match6 ...
queryC match7 ...

If for example I wanted only the top 3 matches for each query, I could
repeat the proposed tool one more time but with different settings -
this time grouping on the first column only:

queryA match1 ...
queryA match2 ...
queryA match3 ...
queryB match5 ...
queryC match6 ...
queryC match7 ...

I hope I've conveyed the idea here. The existing tools "Select first
lines from a dataset" and "Select last lines from a dataset" are
related, but do this at the file level.
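
The two passes described above can be sketched in a few lines of Python. This is only an illustration of the idea, not an existing Galaxy tool: the function name first_rows_per_group is invented here, and it assumes rows belonging to the same group are already adjacent, as they are in tabular BLAST output.

```python
from itertools import groupby, islice


def first_rows_per_group(rows, key_columns, n):
    """Yield the first n rows of each run of rows with equal key columns.

    rows is any iterable of sequences; key_columns holds 0-based indexes.
    Only consecutive rows are grouped, so unsorted input gives several
    groups for the same key.
    """
    def key(row):
        return tuple(row[i] for i in key_columns)

    for _, group in groupby(rows, key=key):
        for row in islice(group, n):
            yield row


rows = [
    ("queryA", "match1"), ("queryA", "match2"), ("queryA", "match2"),
    ("queryA", "match3"), ("queryA", "match4"), ("queryA", "match4"),
    ("queryA", "match4"), ("queryB", "match5"), ("queryB", "match5"),
    ("queryC", "match6"), ("queryC", "match7"),
]

# Pass 1: first HSP for each query+match pair (group on columns 1 and 2).
first_hsps = list(first_rows_per_group(rows, (0, 1), 1))

# Pass 2: top 3 matches for each query (group on column 1 only).
top3 = list(first_rows_per_group(first_hsps, (0,), 3))
```

Because it streams over the rows and keeps only one group in flight, the same function would cope with large files; a "last N rows" variant would need to buffer each group instead.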

Does this make sense? Does it seem like a useful tool to write if
there isn't anything like this already present? Or might it be simpler
to just write a "Select top BLAST hits" tool?

Peter


[galaxy-dev] library_upload_from_import_dir

2011-05-17 Thread George, David
Hi,

I'm a Galaxy and Python newbie and I'm working on a project that needs
to upload files from a directory to the Galaxy server.  We'd like to
physically copy the files rather than maintain references to them.

I'm starting by following the examples in scripts/api/README.  All the
display and library_create_library examples work.  However, the
library_upload_from_import_dir example fails.  In my universe...ini,
library_import_dir is defined to be /home/dgeorge/galaxy/import and
there is a bed directory under that with a short file, bed1.bed, in the
correct format.  I'm not sure what to provide for the last parameter,
the dbkey - maybe that's my problem?
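
A quick sanity check that can be run before calling the API, offered as a sketch: it assumes that one of the `bed` arguments names a subdirectory that is looked up relative to library_import_dir, which is what the layout described above suggests but is not confirmed in this thread. The function name list_import_files is made up for this example.

```python
import os


def list_import_files(library_import_dir, server_dir):
    """Return the sorted file paths found in library_import_dir/server_dir.

    Raises ValueError if the directory is missing or resolves to a
    location outside library_import_dir.
    """
    base = os.path.realpath(library_import_dir)
    target = os.path.realpath(os.path.join(base, server_dir))
    if target != base and not target.startswith(base + os.sep):
        raise ValueError("%s is outside %s" % (target, base))
    if not os.path.isdir(target):
        raise ValueError("%s is not a directory" % target)
    return sorted(
        os.path.join(target, name)
        for name in os.listdir(target)
        if os.path.isfile(os.path.join(target, name))
    )
```

For the setup described here, list_import_files("/home/dgeorge/galaxy/import", "bed") should list bed1.bed if the directory is readable by the user running the check; it is worth running it as the user that runs the Galaxy server.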

 

I ran:

./library_upload_from_import_dir.py 59d2fd4e020e178f8c48e61150e513c2
http://localhost:8080/api/libraries/f597429621d6eb2b/contents
c6ca0ddb55be603a10e891fff2e902c3 bed bed hg19

 

And I get the error:

 

Exception happened during processing of request from ('127.0.0.1', 38857)
Traceback (most recent call last):
  File "/home/dgeorge/galaxy/galaxy-central/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py", line 1053, in process_request_in_thread
    self.finish_request(request, client_address)
  File "/usr/lib/python2.6/SocketServer.py", line 322, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python2.6/SocketServer.py", line 618, in __init__
    self.finish()
  File "/usr/lib/python2.6/SocketServer.py", line 661, in finish
    self.wfile.flush()
  File "/usr/lib/python2.6/socket.py", line 297, in flush
    self._sock.sendall(buffer(data, write_offset, buffer_size))
error: [Errno 32] Broken pipe

 

Any ideas or suggestions?

Thanks!

David George
Staff Software Engineer
Illumina, Inc.
25861 Industrial Blvd.
Hayward, CA 94545
Tel:  510-670-9326
Fax:  510-670-9302
Email:  dgeo...@illumina.com

 


Re: [galaxy-dev] Reciprocal Best Hits (RBH) from BLAST tabular output

2011-05-17 Thread Kanwei Li
Hi Peter,

I think the tool shed would be appropriate for this...

-K

On Tue, May 17, 2011 at 6:23 AM, Peter Cock p.j.a.c...@googlemail.com wrote:

 On Wed, May 4, 2011 at 9:59 AM, Peter Cock p.j.a.c...@googlemail.com
 wrote:
  Hi all,
 
  I mentioned just over a month ago that I had written a Galaxy tool
  to find Reciprocal Best Hits (RBH) from BLAST tabular output
  or similar (e.g. Bill Pearson's FASTA tabular output).
 
  http://lists.bx.psu.edu/pipermail/galaxy-dev/2011-March/004799.html
  http://lists.bx.psu.edu/pipermail/galaxy-dev/2011-April/004825.html
  http://lists.bx.psu.edu/pipermail/galaxy-dev/2011-April/004829.html
 
  The only problem I had was setting default columns, which I
  consider to be a minor bug in Galaxy:
 
  https://bitbucket.org/galaxy/galaxy-central/issue/507
 
  Regardless of this small usability issue, would the Galaxy team
  like to add this tool to the main distribution (to sit along with my
  BLAST+ wrappers), or should I submit it to the tool shed?
 
  If you want to merge it, the following change set should suffice:
  https://bitbucket.org/peterjc/galaxy-central/changeset/198bf927ca30
 
  I made a subsequent change set to test giving column defaults
  as value=c2 rather than value=c2 which makes no difference,
  but if this is how default column values are meant to be given,
  take this change too:
  https://bitbucket.org/peterjc/galaxy-central/changeset/198bf927ca30

 A third commit fixes a typo in the help:
 https://bitbucket.org/peterjc/galaxy-central/changeset/2be6c0bc76ea

 Would the Galaxy team like to merge this tool into the trunk
 (and if so, do you have any suggestions for improvements or
 changes), or should I put it up on the Galaxy Tool Shed instead?

 Thanks,

 Peter
