[galaxy-dev] Uploading and outputting multiple files

2012-10-09 Thread Alex.Khassapov
Hi List,

Our tools require an array of input files and also produce an array of files.  
This seems like a pretty standard task, but unfortunately Galaxy doesn't 
support this yet.

I wonder if somebody has already implemented this kind of 'file array' data 
type?

I.e. the user selects a local directory and uploads all files as a single 
dataset. Even better if the user could specify a regular expression for file 
selection.

I know that there are other options, i.e. tarring the files or using a 
composite data type, but they don't seem natural.

Any hints?
___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] pass more information on a dataset merge

2012-10-15 Thread Alex.Khassapov
Hi John,

I tried your galaxy-central-homogeneous-composite-datatypes implementation, 
works great thank you (and Jorrit).

A couple of fixes:

1. Add multi_upload.xml to tool_conf.xml
2. lib/galaxy/tools/parameters/grouping.py line 322 (in get_filenames( context 
)) - 
if ftp_files is not None:
   Remove the "is not None": ftp_files is empty ([]) but not None, so line 331 
user_ftp_dir = os.path.join( trans.app.config.ftp_upload_dir, trans.user.email 
) throws an exception if ftp_upload_dir isn't set.
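The empty-list case can be illustrated with a minimal standalone sketch (not 
the actual Galaxy code; function names here are made up for illustration):

```python
# Minimal illustration of why "is not None" is the wrong guard here:
# get_filenames() receives ftp_files as an empty list when no FTP files
# were selected, so the check must treat [] the same as None.

def ftp_branch_taken_buggy(ftp_files):
    # Original check: [] "is not None", so the FTP branch runs and later
    # touches trans.app.config.ftp_upload_dir even when it is unset.
    return ftp_files is not None

def ftp_branch_taken_fixed(ftp_files):
    # Truthiness check: skips the branch for both None and [].
    return bool(ftp_files)

print(ftp_branch_taken_buggy([]))   # True  -> leads to the exception
print(ftp_branch_taken_fixed([]))   # False -> branch safely skipped
```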

Alex

-Original Message-
From: galaxy-dev-boun...@lists.bx.psu.edu 
[mailto:galaxy-dev-boun...@lists.bx.psu.edu] On Behalf Of John Chilton
Sent: Tuesday, 16 October 2012 1:07 AM
To: Jorrit Boekel
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] pass more information on a dataset merge

Here is an implementation of the implicit multi-file composite datatypes piece 
of that idea. I think the implicit parallelism may be harder.

https://bitbucket.org/galaxyp/galaxy-central-homogeneous-composite-datatypes/compare

Jorrit do you have any objection to me trying to get this included in 
galaxy-central (this is 95% code I stole from you)? I made the changes against 
a clean galaxy-central fork and included nothing proteomics-specific in 
anticipation of trying to do that. I have talked with Jim Johnson about the 
idea and he believes it would be useful for his mothur metagenomics tools, so 
the idea is valuable outside of proteomics.

Galaxy team, would you be okay with including this, and if so, is there 
anything you would like to see, either at a high level or at the level of the 
actual implementation?

-John


John Chilton
Senior Software Developer
University of Minnesota Supercomputing Institute
Office: 612-625-0917
Cell: 612-226-9223
Bitbucket: https://bitbucket.org/jmchilton
Github: https://github.com/jmchilton
Web: http://jmchilton.net

On Mon, Oct 8, 2012 at 9:24 AM, John Chilton chil...@msi.umn.edu wrote:
 Jim Johnson and I have been discussing that approach to handling 
 fractionated proteomics samples as well (composite datatypes, not the 
 specifics of the interface for parallelizing).

 My perspective has been that Galaxy should be augmented with better 
 native mechanisms for grouping objects in histories, operating over 
 those groups, building workflows that involve arbitrary numbers of 
 inputs, etc... Composite data types are kind of a kludge; I think they 
 are more useful for grouping HTML files together when you don't care 
 about operating on the constituent parts and just want to view pages 
 as a report or something. With this proteomic data we are working 
 with, the individual pieces are really interesting right? You want to 
 operate on the individual pieces with the full array of tools (not 
 just these special tools that have the logic for dealing with the 
 composite datatypes), you want to visualize the files, etc... Putting 
 these component pieces in the composite data type extra_files path 
 really limits what you can do with the pieces in Galaxy.

 I have a vague idea of something that I think could bridge some of the 
 gaps between the approaches (though I have no clue on the 
 feasibility). Looking through your implementation on bitbucket it 
 looks like you are defining your core datatypes (MS2, CruxSequest) as 
 subclasses of this composite data type (CompositeMultifile). My 
 recommendation would be to try to define plain datatypes for these 
 core datatype (MS2, CruxSequest) and then have the separate composite 
 datatype sort of delegate to the plain datatypes.

 You could then continue to explicitly declare subclasses of the 
 composite datatype (maybe MS2Set, CruxSequestSet), but also maybe 
 augment the tool xml so you can do implicit data type instances the 
 way you can with tabular data for instance (instead of defining 
 columns you would define the datatype to delegate to).

 The next step would be to make the parallelism implicit (i.e. pull it 
 out of the tool wrapper). Your tool wrappers wouldn't reference the 
 composite datatypes, they would reference the simple datatypes, but 
 you could add a little icon next to any input that let you replace a 
 single input with a composite input for that type. It would be kind of 
 like the run workflow page where you can replace an input with a 
 multiple inputs. If a composite input (or inputs) are selected the 
 tool would then produce composite outputs.

 For the steps that actually combine multiple inputs, I think in your 
 case this is Percolator maybe (a tool like InterProphet or Scaffold 
 that merges peptide probabilities across runs and groups proteins), 
 then you could have the same sort of implicit replacement but instead 
 of for single inputs it could do that for multi-inputs (assuming the 
 Galaxy powers that be accept my fixes for multi-input tool parameters:
 

Re: [galaxy-dev] Number of outputs = number of inputs

2012-10-16 Thread Alex.Khassapov
I tried the galaxy-central-homogeneous-composite-datatypes fork; works great. I 
have a similar problem where the number of output files varies, and it seems 
that your approach might work for output files as well (not only input). 
Currently I'm trying to work out how to implement it; any help is appreciated.

Alex

-Original Message-
From: galaxy-dev-boun...@lists.bx.psu.edu 
[mailto:galaxy-dev-boun...@lists.bx.psu.edu] On Behalf Of John Chilton
Sent: Wednesday, 17 October 2012 12:49 AM
To: Sascha Kastens
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] Number of outputs = number of inputs

I don't believe this is possible in Galaxy right now. Are the outputs 
independent or is information from all inputs used to produce all outputs? If 
they are independent, you can create a workflow containing just your tool with 
1 input and 1 output and use the batch workflow mode to run it on multiple 
files and get multiple outputs. This is not a beautiful solution but it gets 
the job done in some cases.

Another thing to look at might be the discussion we are having on the thread 
"pass more information on a dataset merge". We have a fork (it's all work from 
Jorrit Boekel) of Galaxy that creates composite datatypes, for each explicitly 
defined type, that can hold collections of a single type.

https://bitbucket.org/galaxyp/galaxy-central-homogeneous-composite-datatypes/compare

This would hopefully let you declare that you can accept a collection of 
whatever your input type is and produce a collection of whatever your output 
is. Lots of downsides to this approach: it's not fully implemented, it's not 
included in Galaxy proper, and your outputs would be wrapped up in a composite 
datatype so they wouldn't be easily processable by downstream tools. It would 
be good to have additional people hacking on it though :)

-John


John Chilton
Senior Software Developer
University of Minnesota Supercomputing Institute
Office: 612-625-0917
Cell: 612-226-9223
Bitbucket: https://bitbucket.org/jmchilton
Github: https://github.com/jmchilton
Web: http://jmchilton.net

On Tue, Oct 16, 2012 at 7:13 AM, Sascha Kastens s.kast...@gatc-biotech.com 
wrote:
 Hi all!



 I have a tool which takes one or more input files. For each input 
 file one output is created,

 i.e. 1 input file -> 1 output file, 2 input files -> 2 output files, etc.

 What is the best way to handle this? I used the directions for handling 
 multiple output files where the 'Number of Output datasets cannot be 
 determined until tool run', which in my opinion is a bit inappropriate. 
 BTW: the input files are added via the repeat tag, so maybe there is a 
 similar thing for outputs?



 Thanks in advance!



 Cheers,

 Sascha




Re: [galaxy-dev] Retreiving a library data type in command

2012-10-19 Thread Alex.Khassapov
$filename.ext
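
Spelled out in the command block from the quoted question below, that 
one-liner would look something like this (a sketch; dosomething.pl and the 
param name are from Simon's example, and whether the script wants the 
extension as an argument is an assumption):

<command>
    <!-- $filename expands to the dataset's file path; $filename.ext gives
         its datatype extension (fasta, fastq, sam or bam here) -->
    dosomething.pl $filename $filename.ext
</command>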

From: galaxy-dev-boun...@lists.bx.psu.edu 
[mailto:galaxy-dev-boun...@lists.bx.psu.edu] On Behalf Of Simon Gladman
Sent: Friday, 19 October 2012 4:38 PM
To: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] Retreiving a library data type in command

oops, clicked send too early..
On 19 October 2012 16:33, Simon Gladman 
simon.glad...@monash.edu wrote:
Hi all,

Have searched mailing list to no avail. Therefore:

How can I get a data library's data_type in the command tagset?

Example:

<command>
    dosomething.pl filename.data_type
</command>
<inputs>
    <param type="data" format="fasta,fastq,sam,bam" name="filename" label="a 
data set from imported datasets..."/>
</inputs>

So is the file of type: fasta, fastq, sam or bam and how can I get that into 
the command tag?

Thanks in advance,

Simon Gladman.


Re: [galaxy-dev] pass more information on a dataset merge

2012-10-19 Thread Alex.Khassapov
Hi John,

What I don't get: I specify the output format m:grd, and my tool generates 
multiple output files in the dataset_id_files folder, but the dataset_id.dat 
file is empty. I need to call this regenerate_primary_file() to add the HTML 
with the file list to the .dat file. But I'm not sure where?

-Alex

-Original Message-
From: jmchil...@gmail.com [mailto:jmchil...@gmail.com] On Behalf Of John Chilton
Sent: Friday, 19 October 2012 6:16 AM
To: Khassapov, Alex (CSIRO IMT, Clayton)
Subject: Re: [galaxy-dev] pass more information on a dataset merge

On Tue, Oct 16, 2012 at 11:11 PM,  alex.khassa...@csiro.au wrote:
 Hi John,

 I am definitely interested in this idea, not only me - we are currently 
 working on moving a few scientific tools (not related to genome) into cloud 
 using Galaxy.

Great. My interests in Galaxy are mostly outside of genomics as well; it is 
good to have more people utilizing Galaxy in this way because it will force the 
platform to become more generic and address broader use cases.


 We will try it further and see if we need any changes. For now one 
 improvement would be nice: make dataset_id.dat contain a list of paths to 
 the locations of the uploaded files, so that by displaying the HTML page 
 the user could just click on a link and download the file.


Code that attempted to do this was in there, but obviously didn't work. I have 
now fixed it up.

Thanks for beta testing.

-John

 We are pretty new to Galaxy, so our understanding of Galaxy is pretty limited.

 Thanks again,

 Alex


 -Original Message-
 From: jmchil...@gmail.com [mailto:jmchil...@gmail.com] On Behalf Of 
 John Chilton
 Sent: Wednesday, 17 October 2012 3:21 AM
 To: Khassapov, Alex (CSIRO IMT, Clayton)
 Subject: Re: [galaxy-dev] pass more information on a dataset merge

 Wow, thanks for the rapid feedback! I have made the changes you have 
 suggested. It seems you must be interested in this idea/implementation. Let 
 me know if you have specific use cases/requirements in mind and/or if you 
 would be interested in write access to the repository.

 -John

 On Mon, Oct 15, 2012 at 11:51 PM,  alex.khassa...@csiro.au wrote:
 Hi John,

 I tried your galaxy-central-homogeneous-composite-datatypes implementation, 
 works great thank you (and Jorrit).

 A couple of fixes:

 1. Add multi_upload.xml to too_conf.xml 2.
 lib/galaxy/tools/parameters/grouping.py line 322 (in get_filenames( context 
 )) -
 if ftp_files is not None:
Remove is not None as ftp_files is empty [], but not None, then line 
 331 user_ftp_dir = os.path.join( trans.app.config.ftp_upload_dir, 
 trans.user.email ) throws an exeption if ftp_upload_dir isn't set.

 Alex


Re: [galaxy-dev] pass more information on a dataset merge

2012-10-22 Thread Alex.Khassapov
1) One more question, 

My colleague likes the idea, but his composite data set dataset_id.dat file 
contains only a plain list of uploaded files, not HTML like yours.

I was wondering if it is possible to somehow pass a parameter to 
CompositeMultifile.regenerate_primary_file(dataset) to switch between HTML and 
plain-list formats. I mean adding a 'hidden' parameter in the tool.xml file, 
but I'm not sure how to get at these tool parameters in the Galaxy source?

2) And one more question: I use your m:xxx format for the tool output; all 
files are generated in the dataset_id_files folder, but the dataset_id.dat file 
is empty. To force creation of the dataset_id.dat file I use the 
exec_after_process hook (via the tool's <code> tag pointing at a .py file):
for key, val in out_data.items():
    try:
        if not hasattr(val.dataset, 'name'):
            val.dataset.name = val.dataset.file_name
        val.datatype.regenerate_primary_file(val.dataset)
    except Exception as e:
        print 'ERROR: ' + str(e)

But it doesn't feel right; I wonder what the proper way is to use the m:xxx 
format for the output?

-Alex

-Original Message-
From: jmchil...@gmail.com [mailto:jmchil...@gmail.com] On Behalf Of John Chilton
Sent: Saturday, 20 October 2012 1:40 AM
To: Khassapov, Alex (CSIRO IMT, Clayton)
Subject: Re: [galaxy-dev] pass more information on a dataset merge

Hey Alex,

I think the idea here is that your initially uploaded files would have 
different names, but after Jorrit's tool split/merge step they will all just be 
named after the dataset id (see screenshot) so you need the task_X at the end 
so they don't all just have the same name.

I have not thought a whole lot about the naming thing, in general it seems like 
a tough problem and one that Galaxy itself doesn't do a particularly good job 
at.

Jorrit have you given any thought to this?

I wonder if it would be feasible to use the initial uploaded name as a sort of 
prefix going forward. So if I upload say

fraction1.RAW
fraction2.RAW
fraction3.RAW

and run a conversion step, maybe I could get:

fraction1_dataset567.ms2
fraction2_dataset567.ms2
fraction3_dataset567.ms2

instead of

dataset567.dat_task_0
dataset567.dat_task_1
dataset567.dat_task_2

Jorrit do you mind if I give implementing that a shot? It seems like it would 
be a win to me. Am I going to hit some problem I don't see now (presumably we 
have to send some data from the split to the merge, and that might be tricky)?

-John

On Thu, Oct 18, 2012 at 7:00 PM,  alex.khassa...@csiro.au wrote:
 Thanks John,

 I wonder what's the reason for appending _task_XX to the file names; why 
 can't we just keep the original file names?

 Alex

 -Original Message-
 From: jmchil...@gmail.com [mailto:jmchil...@gmail.com] On Behalf Of 
 John Chilton
 Sent: Friday, 19 October 2012 6:16 AM
 To: Khassapov, Alex (CSIRO IMT, Clayton)
 Subject: Re: [galaxy-dev] pass more information on a dataset merge

 On Tue, Oct 16, 2012 at 11:11 PM,  alex.khassa...@csiro.au wrote:
 Hi John,

 I am definitely interested in this idea, not only me - we are currently 
 working on moving a few scientific tools (not related to genome) into cloud 
 using Galaxy.

 Great. My interests in Galaxy are mostly outside of genomics as well, it is 
 good to have more people utilizing Galaxy in this way because it will force 
 the platform to become more generic and address more broader use cases.


 We will try it further and see if we need any changes. For now one 
 improvement would be nice, make dataset_id.dat contain list of paths to the 
 location of the uploaded files, so by displaying html page the user could 
 just click on the link and download the file.


 Code that attempted to do this was in there, but didn't work obviously. I 
 have now fixed it up.

 Thanks for beta testing.

 -John

 We are pretty new to Galaxy, so our understanding of Galaxy is pretty 
 limited.

 Thanks again,

 Alex



Re: [galaxy-dev] determination of errors

2012-10-22 Thread Alex.Khassapov
By default Galaxy checks stderr; if it's not empty, it returns an error. So if 
your tool doesn't fail (returns 0) but prints something to stderr, the tool 
will still fail in Galaxy. There's a stderr_wrapper.py workaround for that.

On the other hand, if your tool returns non-zero but doesn't use stderr, 
Galaxy ignores the tool's return value.

There are two ways around that:

1. Galaxy has the exit_code tag to specify which exit codes to handle:
http://wiki.g2.bx.psu.edu/Admin/Tools/Tool%20Config%20Syntax#A.3Cexit_code.3E_tag_set

So in my tool.xml I have:

<stdio>
    <exit_code range="1:255" level="fatal" description="XLICTRecon.exe Exception" />
</stdio>
/stdio

2. A simple workaround in the Python wrapper: print something to stderr if 
the tool returns an error:

returncode = subprocess.call(cmd)
if returncode:
    sys.stderr.write('Error: returned ' + str(returncode))
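A minimal wrapper along the lines of stderr_wrapper.py might look like this (a 
sketch, not the actual script shipped with Galaxy): run the real command, 
swallow its stderr on success, and emit it only on a non-zero exit so Galaxy 
flags the job correctly.

```python
import subprocess
import sys

def run_quiet(cmd):
    """Run cmd; forward stderr to Galaxy only when the exit code is non-zero."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    out, err = proc.communicate()
    sys.stdout.write(out.decode())
    if proc.returncode != 0:
        # Non-empty stderr makes Galaxy mark the job as failed,
        # which is what we want on a real error.
        sys.stderr.write(err.decode())
    return proc.returncode

# Usage from a tool wrapper: sys.exit(run_quiet(['mytool', 'arg1']))
```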

-Alex

-Original Message-
From: galaxy-dev-boun...@lists.bx.psu.edu 
[mailto:galaxy-dev-boun...@lists.bx.psu.edu] On Behalf Of Peter Cock
Sent: Tuesday, 23 October 2012 2:30 AM
To: David Hoover
Cc: Galaxy Dev
Subject: Re: [galaxy-dev] determination of errors

On Mon, Oct 22, 2012 at 4:23 PM, David Hoover hoove...@helix.nih.gov wrote:
 How does Galaxy determine that a job has failed?

It now depends on the individual tool's XML file.

 Does it simply see if the STDERR is empty?

By default, yes. The tool's XML can specify particular regexes to look for, or 
decide based on the return code, but for the time being most of the tools 
still just look at stderr. See:
http://wiki.g2.bx.psu.edu/Admin/Tools/Tool%20Config%20Syntax

 What happens if an application normally outputs to STDERR?

Either use the new functionality in the XML definition, or do what older Galaxy 
tools did: use a wrapper script to hide/redirect stderr to avoid false positives.

 This is a problem for our local installation, as I have enabled it to 
 run as the local user on the backend cluster.  If a user has an error 
 in the .bashrc file, it will automatically write to STDERR, and all 
 jobs, no matter what, are labelled as failing.

In which case the user should see those errors and be able to do something 
about it, right?

Peter


Re: [galaxy-dev] Source code documentation

2012-10-22 Thread Alex.Khassapov
The API seems a bit of an overkill; as I understand it, it's useful for 
'external' access via HTTP. My tools run inside Galaxy, so I should be able to 
use the Python code directly.

From: Anthonius deBoer [mailto:thondeb...@me.com]
Sent: Tuesday, 23 October 2012 12:16 PM
To: Khassapov, Alex (CSIRO IMT, Clayton)
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] Source code documentation

The API allows you to do some of that...
If you pass it the ID of the object (input.id) you can do all kinds of requests 
with the API.

Look in the scripts/api folder of your local Galaxy instance...

NOTE: The API seems to be a bit of a stepchild, since there is no good 
documentation and it seems to be underdeveloped to some extent. For instance, 
the biggest issue is that you cannot pass a workflow any parameters, only 
inputs and outputs...

So caveat emptor!

Regards,

Thon de Boer, Ph.D.
Bioinformatics Guru
+1-650-799-6839
thondeb...@me.com
LinkedIn Profile: http://www.linkedin.com/pub/thon-de-boer/1/1ba/a5b




On Oct 22, 2012, at 5:26 PM, alex.khassa...@csiro.au wrote:


Hi, I wonder if there's some kind of documentation (reference) for the Galaxy 
source?

At the moment I have a couple of questions for example.

1. How can I get the dataset object (in my Python wrapper) given the dataset 
name?
2. How can I access the job parameters (entered in the UI or 'hidden') in the 
Python code?

In general, when I have these kinds of questions, where do I look?

-Alex

Re: [galaxy-dev] Validation / validator questions

2012-10-23 Thread Alex.Khassapov
This can be done in validate(), see 
http://wiki.g2.bx.psu.edu/Admin/Tools/Custom%20Code
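
A rough sketch of such a cross-field check for a tool's code file (the hook 
name follows the Custom Code page; the parameter names field_a/field_b and the 
dict-like 'incoming' argument are made up for illustration):

```python
def validate(incoming):
    """Reject the submission by raising; 'incoming' is assumed to be a
    dict-like of the tool's parameter values (illustrative only)."""
    a = incoming.get('field_a')
    b = incoming.get('field_b')
    # Example 1: if field A is filled, field B must be filled too.
    if a and not b:
        raise ValueError('field B is required when field A is set')
```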

Alex

From: galaxy-dev-boun...@lists.bx.psu.edu 
[mailto:galaxy-dev-boun...@lists.bx.psu.edu] On Behalf Of Lukasse, Pieter
Sent: Tuesday, 23 October 2012 9:05 PM
To: galaxy-dev@lists.bx.psu.edu
Subject: [galaxy-dev] Validation / validator questions

Hi,

Is it possible to do validation of input fields depending on the value of OTHER 
input fields?

Example 1

If field A is filled, then field B also should be filled.

Example 2

Field B should be  field A

etc

Pieter Lukasse
Wageningen UR, Plant Research International
Departments of Bioscience and Bioinformatics
Wageningen Campus, Building 107, Droevendaalsesteeg 1, 6708 PB,
Wageningen, the Netherlands
+31-317480891; skype: pieter.lukasse.wur
http://www.pri.wur.nl/


Re: [galaxy-dev] Fwd: pass more information on a dataset merge

2012-11-01 Thread Alex.Khassapov
Hi John,

Do you think it's possible to create a test for your 'm:' format? I couldn't 
find how to specify multiple input files for the test.
-Alex

-Original Message-
From: jmchil...@gmail.com [mailto:jmchil...@gmail.com] On Behalf Of John Chilton
Sent: Tuesday, 23 October 2012 7:59 AM
To: Jorrit Boekel
Cc: Khassapov, Alex (CSIRO IMT, Clayton)
Subject: Re: Fwd: [galaxy-dev] pass more information on a dataset merge

Hello again Jorrit,

Great, I am glad we are largely on the same page here. I don't know when I will 
get a chance to look at this particular aspect, if you get there first that 
will be great, if not I will get there eventually.

-John

On Mon, Oct 22, 2012 at 2:51 AM, Jorrit Boekel jorrit.boe...@scilifelab.se 
wrote:
 IIRC, I implemented the task_X suffix (Galaxy does so as well, but to
 the split subdirectories) to ensure jobs that contained multiple split
 datasets would be run in sync. Files from two datasets that belong
 together then get analysed together in subsequent steps.

 It would however be much nicer to retain original file names through a
 pipeline, or at least the possibility to retrieve them. Since the
 split/merge runs now actively look for and match files with identical
 'task_x' suffixes, it may be an option to do:

 fraction1.raw -> fraction1.raw_dataset_43.dat_task_0 ->
 fraction1.raw_dataset_44.dat_task_0
 fraction2.raw -> fraction2.raw_dataset_43.dat_task_1 ->
 fraction2.raw_dataset_44.dat_task_1

 (Note that python starts counting at 0, while most researchers number
 their first fraction 1.)

 I wouldn't mind looking more into that as well, since it would be a
 big improvement UI-wise.

 cheers,
 jorrit






 On 10/19/2012 04:40 PM, John Chilton wrote:

 Jorrit I meant to cc you on this response to Alex.

 -- Forwarded message --
 From: John Chilton chil0...@umn.edu
 Date: Fri, Oct 19, 2012 at 9:40 AM
 Subject: Re: [galaxy-dev] pass more information on a dataset merge
 To: alex.khassa...@csiro.au


 Hey Alex,

 I think the idea here is that your initially uploaded files would
 have different names, but after Jorrit's tool split/merge step they
 will all just be named after the dataset id (see screenshot) so you
 need the task_X at the end so they don't all just have the same name.

 I have not thought a whole lot about the naming thing, in general it
 seems like a tough problem and one that Galaxy itself doesn't do a
 particularly good job at.

 Jorrit have you given any thought to this?

 I wonder if it would be feasible to use the initial uploaded name as
 a sort of prefix going forward. So if I upload say

 fraction1.RAW
 fraction2.RAW
 fraction3.RAW

 and run a conversion step, maybe I could get:

 fraction1_dataset567.ms2
 fraction2_dataset567.ms2
 fraction3_dataset567.ms2

 instead of

 dataset567.dat_task_0
 dataset567.dat_task_1
 dataset567.dat_task_2

 Jorrit do you mind if I give implementing that a shot? It seems like
 it would be a win to me. Am I am going to hit some problem I don't
 see now (presumable we have to send some data from the split to the
 merge and that might be tricky)?

 -John

 On Thu, Oct 18, 2012 at 7:00 PM,  alex.khassa...@csiro.au wrote:

 Thanks John,

 I wonder what's the reason for appending _task_XX to the file names,
 why can't we just keep original file names?

 Alex

 -Original Message-
 From: jmchil...@gmail.com [mailto:jmchil...@gmail.com] On Behalf Of
 John Chilton
 Sent: Friday, 19 October 2012 6:16 AM
 To: Khassapov, Alex (CSIRO IMT, Clayton)
 Subject: Re: [galaxy-dev] pass more information on a dataset merge

 On Tue, Oct 16, 2012 at 11:11 PM,  alex.khassa...@csiro.au wrote:

 Hi John,

 I am definitely interested in this idea, not only me - we are
 currently working on moving a few scientific tools (not related to
 genome) into cloud using Galaxy.

 Great. My interests in Galaxy are mostly outside of genomics as
 well, it is good to have more people utilizing Galaxy in this way
 because it will force the platform to become more generic and
 address more broader use cases.

 We will try it further and see if we need any changes. For now one
 improvement would be nice, make dataset_id.dat contain list of
 paths to the location of the uploaded files, so by displaying html
 page the user could just click on the link and download the file.

 Code that attempted to do this was in there, but didn't work
 obviously. I have now fixed it up.

 Thanks for beta testing.

 -John

 We are pretty new to Galaxy, so our understanding of Galaxy is
 pretty limited.

 Thanks again,

 Alex


 -Original Message-
 From: jmchil...@gmail.com [mailto:jmchil...@gmail.com] On Behalf Of
 John Chilton
 Sent: Wednesday, 17 October 2012 3:21 AM
 To: Khassapov, Alex (CSIRO IMT, Clayton)
 Subject: Re: [galaxy-dev] pass more information on a dataset merge

 Wow, thanks for the rapid feedback! I have made the changes you
 have suggested. It seems you must be interested in this
 idea/implementation. Let me know if you have specific use

Re: [galaxy-dev] pass more information on a dataset merge

2012-12-02 Thread Alex.Khassapov
Hi John,

My colleague (Neil) has a bit of a problem with the multi file support:

When I try to use the 'Upload directory of files' option I get the error below:

Error Traceback:
⇝ AttributeError: 'Bunch' object has no attribute 'multifiles'
URL: http://140.253.78.218/library_common/upload_library_dataset
Module weberror.evalexception.middleware:364 in respond view
  app_iter = self.application(environ, detect_start_response)
Module paste.debug.prints:98 in __call__ view
  environ, self.app)
Module paste.wsgilib:539 in intercept_output view
  app_iter = application(environ, replacement_start_response)
Module paste.recursive:80 in __call__ view
  return self.application(environ, start_response)
Module paste.httpexceptions:632 in __call__ view
  return self.application(environ, start_response)
Module galaxy.web.framework.base:160 in __call__ view
  body = method( trans, **kwargs )
Module galaxy.web.controllers.library_common:855 in upload_library_dataset  
   view
  **kwd )
Module galaxy.web.controllers.library_common:1055 in upload_dataset view
  json_file_path = upload_common.create_paramfile( trans, uploaded_datasets )
Module galaxy.tools.actions.upload_common:342 in create_paramfile view
  multifiles = uploaded_dataset.multifiles,
AttributeError: 'Bunch' object has no attribute 'multifiles'

Any ideas? Should we check whether the 'multifiles' attribute is set? Or is some 
other call missing which should set it to None when it's absent?
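One defensive fix for the traceback above, a sketch only and not necessarily how John ultimately fixed it, is to read the optional attribute with a default so upload paths that never set 'multifiles' (such as the library directory upload) no longer raise. The `Bunch` stand-in and `multifiles_of` helper here are hypothetical illustrations:

```python
# Defensive read of an optional attribute on a Bunch-like object.
class Bunch:
    # Minimal stand-in for galaxy.util.bunch.Bunch.
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

def multifiles_of(uploaded_dataset):
    # getattr with a default replaces the failing direct access
    # 'uploaded_dataset.multifiles' from the traceback above.
    return getattr(uploaded_dataset, "multifiles", None)

assert multifiles_of(Bunch(name="sample.dat")) is None
assert multifiles_of(Bunch(multifiles=["a.dat", "b.dat"])) == ["a.dat", "b.dat"]
```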

-Alex

-Original Message-
From: jmchil...@gmail.com [mailto:jmchil...@gmail.com] On Behalf Of John Chilton
Sent: Wednesday, 17 October 2012 3:21 AM
To: Khassapov, Alex (CSIRO IMT, Clayton)
Subject: Re: [galaxy-dev] pass more information on a dataset merge

Wow, thanks for the rapid feedback! I have made the changes you have suggested. 
It seems you must be interested in this idea/implementation. Let me know if you 
have specific use cases/requirements in mind and/or if you would be interested 
in write access to the repository.

-John

On Mon, Oct 15, 2012 at 11:51 PM,  alex.khassa...@csiro.au wrote:
 Hi John,

 I tried your galaxy-central-homogeneous-composite-datatypes implementation, 
 works great thank you (and Jorrit).

 A couple of fixes:

 1. Add multi_upload.xml to tool_conf.xml.
 2. lib/galaxy/tools/parameters/grouping.py, line 322 (in get_filenames( context )):
 in the test "if ftp_files is not None:", remove the "is not None", since an
 empty selection gives ftp_files = [], which is not None; otherwise line 331,
 user_ftp_dir = os.path.join( trans.app.config.ftp_upload_dir, trans.user.email ),
 throws an exception if ftp_upload_dir isn't set.
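The distinction behind that fix, an empty list versus None, can be sketched as follows. The helper name is hypothetical; the real code lives in lib/galaxy/tools/parameters/grouping.py:

```python
import os

def resolve_ftp_files(ftp_files, ftp_upload_dir, user_email):
    # The buggy test was 'if ftp_files is not None:'. An empty FTP
    # selection arrives as [], which is not None, so execution fell
    # through to os.path.join() with a None ftp_upload_dir and raised.
    # A plain truthiness test skips both None and [].
    if ftp_files:
        user_ftp_dir = os.path.join(ftp_upload_dir, user_email)
        return [os.path.join(user_ftp_dir, name) for name in ftp_files]
    return []

# Empty selection with no ftp_upload_dir configured: no exception now.
assert resolve_ftp_files([], None, "user@example.org") == []
```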

 Alex

 -Original Message-
 From: galaxy-dev-boun...@lists.bx.psu.edu
 [mailto:galaxy-dev-boun...@lists.bx.psu.edu] On Behalf Of John Chilton
 Sent: Tuesday, 16 October 2012 1:07 AM
 To: Jorrit Boekel
 Cc: galaxy-dev@lists.bx.psu.edu
 Subject: Re: [galaxy-dev] pass more information on a dataset merge

 Here is an implementation of the implicit multi-file composite datatypes 
 piece of that idea. I think the implicit parallelism may be harder.

 https://bitbucket.org/galaxyp/galaxy-central-homogeneous-composite-dat
 atypes/compare

 Jorrit do you have any objection to me trying to get this included in 
 galaxy-central (this is 95% code I stole from you)? I made the changes 
 against a clean galaxy-central fork and included nothing proteomics specific 
 in anticipation of trying to do that. I have talked with Jim Johnson about 
 the idea and he believes it would be useful his mothur metagenomics tools, so 
 the idea is valuable outside of proteomics.

 Galaxy team, would you be okay with including this and if so is there 
 anything you would like to see either at a high level or at the level of the 
 actual implementation.

 -John

 
 John Chilton
 Senior Software Developer
 University of Minnesota Supercomputing Institute
 Office: 612-625-0917
 Cell: 612-226-9223
 Bitbucket: https://bitbucket.org/jmchilton
 Github: https://github.com/jmchilton
 Web: http://jmchilton.net

 On Mon, Oct 8, 2012 at 9:24 AM, John Chilton chil...@msi.umn.edu wrote:
 Jim Johnson and I have been discussing that approach to handling 
 fractionated proteomics samples as well (composite datatypes, not the 
 specifics of the interface for parallelizing).

 My perspective has been that Galaxy should be augmented with better 
 native mechanisms for grouping objects in histories, operating over 
 those groups, building workflows that involve arbitrary numbers of 
 inputs, etc... Composite data types are kindof a kludge, I think they 
 are more useful for grouping HTML files together when you don't care 
 about operating on the constituent parts you just want to view pages 
 a as a report or something. With this proteomic data we are working 
 with, the individual pieces are really interesting right? You want to 
 operate on the 

[galaxy-dev] FW: pass more information on a dataset merge

2012-12-03 Thread Alex.Khassapov
Thanks John, works fine.
-Alex

-Original Message-
From: Burdett, Neil (ICT Centre, Herston - RBWH) 
Sent: Tuesday, 4 December 2012 9:57 AM
To: Khassapov, Alex (CSIRO IMT, Clayton)
Cc: Szul, Piotr (ICT Centre, Marsfield)
Subject: RE: [galaxy-dev] pass more information on a dataset merge

Thanks Alex,
seems to work now so I checked in the code to our repository

Neil

From: jmchil...@gmail.com [jmchil...@gmail.com] On Behalf Of John Chilton 
[chil0...@umn.edu]
Sent: Tuesday, December 04, 2012 4:26 AM
To: Khassapov, Alex (CSIRO IMT, Clayton)
Cc: Burdett, Neil (ICT Centre, Herston - RBWH); Szul, Piotr (ICT Centre, 
Marsfield); galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] pass more information on a dataset merge

Hey Alex,

Until I have bullied this stuff into galaxy-central, you should probably e-mail 
me directly and not the dev list. That said, thanks for the heads up; there was 
definitely a bug. I pushed out this changeset to the bitbucket repository:

https://bitbucket.org/galaxyp/galaxy-central-homogeneous-composite-datatypes/commits/d501e8a2e3fafca139f1187ee947ae425a75eb2c/raw/

I should mention that I have sort of abandoned the bitbucket repository for 
this work in favor of github, so that I can rebase as Galaxy changes and keep 
clean changesets.

https://github.com/jmchilton/galaxy-central/tree/multifiles

Since I am posting this on the mailing list I might as well post a little 
summary of what has been done:

- For each datatype, an implicit multiple-file version of that datatype is 
created. A new multiple upload tool/FTP directory tool has been implemented to 
create these.
- For any simple tool input you can choose the multiple-file version of that 
input instead, and then all outputs become multiple-file versions of the 
outputs. Uses the task-splitting framework to distribute jobs across files.
- For multiple-input tools, you can choose either multiple individual inputs 
(no change there) or a single composite version.
- Consistent interface for file path, display name, extension, etc. in the 
tool wrapper.
- It should work with most existing tools and datatypes without change.
- Everything is enabled with a single option in universe_wsgi.ini.

Upshots:
  - Makes workflows with arbitrary merging (and to a lesser extent
branching) and arbitrary number of input files possible.
  - Original base name is saved throughout analysis (when possible), so 
sample/replicate/fraction/lane/etc tracking is easier.

I started working on the metadata piece last night, once that is done I was 
planning on making a little demo video to post to this list to try to sell the 
3 outstanding small pull requests related to this work and the massive one that 
would follow those up :).

-John


On Sun, Dec 2, 2012 at 8:52 PM,  alex.khassa...@csiro.au wrote:
 Hi John,

 My colleague (Neil) has a bit of a problem with the multi file support:

 When I try and use the option Upload Directory of files I get the 
 error below

 Error Traceback:
 View as:   Interactive  |  Text  |  XML (full)
 ⇝ AttributeError: 'Bunch' object has no attribute 'multifiles'
 URL: http://140.253.78.218/library_common/upload_library_dataset
 Module weberror.evalexception.middleware:364 in respond view
  app_iter = self.application(environ, detect_start_response)
 Module paste.debug.prints:98 in __call__ view
  environ, self.app)
 Module paste.wsgilib:539 in intercept_output view
  app_iter = application(environ, replacement_start_response)
 Module paste.recursive:80 in __call__ view
  return self.application(environ, start_response)
 Module paste.httpexceptions:632 in __call__ view
  return self.application(environ, start_response)
 Module galaxy.web.framework.base:160 in __call__ view
  body = method( trans, **kwargs )
 Module galaxy.web.controllers.library_common:855 in upload_library_dataset
  view
  **kwd )
 Module galaxy.web.controllers.library_common:1055 in upload_dataset 
 view
  json_file_path = upload_common.create_paramfile( trans, 
 uploaded_datasets )
 Module galaxy.tools.actions.upload_common:342 in create_paramfile view
  multifiles = uploaded_dataset.multifiles,
 AttributeError: 'Bunch' object has no attribute 'multifiles'

 Any ideas? Should we check if 'multifiles' attribute is set? Or some other 
 call is missing which should set it to NULL if it's missing?

 -Alex

 -Original Message-
 From: jmchil...@gmail.com [mailto:jmchil...@gmail.com] On Behalf Of 
 John Chilton
 Sent: Wednesday, 17 October 2012 3:21 AM
 To: Khassapov, Alex (CSIRO IMT, Clayton)
 Subject: Re: [galaxy-dev] pass more information on a dataset merge

 Wow, thanks for the rapid feedback! I have made the changes you have 
 suggested. It seems you must be interested in this idea/implementation. Let 
 me know if you have specific use cases/requirements in mind and/or if you 
 would be interested in write access to the 

Re: [galaxy-dev] card 79: Split large jobs over multiple nodes for processing

2013-02-06 Thread Alex.Khassapov
Hi All,

Can anybody please add a few words on how we can use the “initial 
implementation” which “exists in the tasks framework”?

-Alex

From: Trello [mailto:do-not-re...@trello.com]
Sent: Wednesday, 6 February 2013 10:58 AM
To: Khassapov, Alex (CSIRO IMT, Clayton)
Subject: 4 new notifications on the board Galaxy: Development since 5:56 PM 
(Tuesday)

Notifications on Galaxy: Development
(https://trello.com/board/galaxy-development/506338ce32ae458f6d15e4b3):

- James Taylor added Dannon Baker to the card 79: Split large jobs over 
multiple nodes for processing
(https://trello.com/card/79-split-large-jobs-over-multiple-nodes-for-processing/506338ce32ae458f6d15e4b3/411).
- James Taylor commented on the card 79: "An initial implementation exists in 
the tasks framework."
- James Taylor moved the card 79 to Complete.
- James Taylor moved the card 137: allow multiple=true in input param fields 
of type data
(https://trello.com/card/137-allow-multiple-true-in-input-param-fields-of-type-data/506338ce32ae458f6d15e4b3/292)
to Pull Requests / Patches.


___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] card 79: Split large jobs over multiple nodes for processing

2013-02-07 Thread Alex.Khassapov
Thanks Peter. I see: parallelism works on a single large file by splitting it 
and using multiple instances to process the pieces in parallel.

In our case we use a 'composite' data type, simply an array of input files, and 
we would like to process those files in parallel instead of having a 'foreach' 
loop in the tool wrapper.

Is it possible?

We are looking at CloudMan for creating a cluster in Galaxy now.

-Alex

-Original Message-
From: Peter Cock [mailto:p.j.a.c...@googlemail.com] 
Sent: Thursday, 7 February 2013 9:09 PM
To: Khassapov, Alex (CSIRO IMT, Clayton)
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] card 79: Split large jobs over multiple nodes for 
processing

On Wed, Feb 6, 2013 at 11:43 PM, alex.khassa...@csiro.au wrote:

 Hi All,

 Can anybody please add a few words on how can we use the initial 
 implementation which  exists in the tasks framework?

 -Alex


To enable this, set use_tasked_jobs = True in your universe_wsgi.ini file. The 
tools must also be configured to allow this via the parallelism tag. Many of 
my tools do this, for example see the NCBI
BLAST+ wrappers in the tool shed. Additionally the data file formats
must support being split, or being merged - which is done via Python code in 
the Galaxy datatype definition (see the split and merge methods in 
lib/galaxy/datatypes/*.py). Some other relevant Python code is in 
lib/galaxy/jobs/splitters/*.py
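Concretely, Peter's two pieces look roughly like the following sketch. The attribute names are from memory of the NCBI BLAST+ wrappers of that era and may differ in other Galaxy versions:

```xml
<!-- In universe_wsgi.ini:  use_tasked_jobs = True -->
<!-- Then, inside the tool's XML wrapper, a tag along these lines:
     split_inputs names the input param to split, split_mode and
     split_size control chunking, merge_outputs names the output
     dataset whose pieces get recombined. -->
<parallelism method="multi" split_inputs="query"
             split_mode="to_size" split_size="1000"
             merge_outputs="output1"></parallelism>
```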

Peter



Re: [galaxy-dev] Composite datatype output for Cuffdiff

2013-03-04 Thread Alex.Khassapov
Hi Dannon,

I understand that instead of having one dataset with multiple files you are 
planning to use existing datasets and combine them in a 'collection'. My 
concerns are:
1. Our data consists of 200-8000 files; can you imagine how many datasets we'll 
end up with? It will be a mess.
2. All the files in a dataset belong together, and it doesn't make much sense 
to keep them separately.
3. For performance reasons, all these files are located in a single directory, 
which makes it easier to iterate over them.
4. From my point of view, it makes perfect sense to have a concept of a 
dataset with multiple files; you already have a dataset_xxx_files folder 
anyway, and it's not a big change compared to the new concept of a collection.
5. We are already using the m:xxx type datasets (thanks John) in our project. 
I guess you don't even have a timeframe for implementing the collection 
concept? I'm sure that for many projects multi-file datasets are a requirement 
now, not in years' time.
6. Collections are also a good idea, and I guess the two can exist together, 
but only in the future; multi-file datasets give current users an opportunity 
to use Galaxy for their needs today. Otherwise we simply have to look at other 
frameworks which already support multi-file datasets.

-Alex

From: Dannon Baker [mailto:dannon.ba...@gmail.com]
Sent: Tuesday, 5 March 2013 1:09 AM
To: Khassapov, Alex (CSIRO IMT, Clayton)
Cc: chil...@msi.umn.edu; galaxy-...@bx.psu.edu; NeCTAR Cloud Imaging Project 
Team
Subject: Re: [galaxy-dev] Composite datatype output for Cuffdiff

Alex,

To reiterate what Jeremy has already said on the mailing list, this is 
definitely something we want, and need, for Galaxy.  While this particular 
implementation has a lot of good parts, creating these collections as 
first-class composite datasets isn't ideal and we'd be stuck supporting them 
going forward, forever.

There's a clear plan for implementing this in Trello 
(https://trello.com/c/325AXIEr), most of which is straightforward to implement. 
 The 'hard' part is really going to be implementing an ideal UI for dealing 
with these collections, something which we could do in phases.

What exactly are your concerns with the implementation as set out in the Trello 
card?

-Dannon


On Mon, Mar 4, 2013 at 1:32 AM, alex.khassa...@csiro.au wrote:
Yeah John,

This is sad, I don't understand why it is such a problem? If it's already 
implemented and used in real projects like ours - then it is needed for the 
community.  I don't think we have other options for our requirements, your 
multiple file datasets implementation was a real saviour for us.

-Alex

-Original Message-
From: jmchil...@gmail.com [mailto:jmchil...@gmail.com] On Behalf Of John 
Chilton
Sent: Monday, 4 March 2013 4:42 PM
To: Khassapov, Alex (CSIRO IMT, Clayton)
Cc: galaxy-...@bx.psu.edu
Subject: Re: [galaxy-dev] Composite datatype output for Cuffdiff

Hi Alex,

  Thanks for the comments. The galaxy team has made it clear here and to me 
privately that this will NOT be included in the Galaxy main code base. I hope, 
and am confident, that they will make grouping datasets work, hopefully even 
for thousands of files.

  I do not believe the two ideas are mutually exclusive, and I will be 
maintaining a fork of galaxy-central with these additions; I will set this up 
this week, hopefully. I will do my best to respond to support requests, make 
multiple file datasets and composite types in general as robust as possible, 
keep up with Galaxy updates, etc.
Obviously, it is risky to let a code base drift so far from galaxy main's, 
however, and you, I, and others who might want to use them will have to 
carefully weigh the risks when determining whether multiple file datasets are 
worth the headache.

  Thanks for all your help and input. I am sorry this did not turn out 
differently; I feel I have really failed here.

-John


On Sun, Mar 3, 2013 at 10:08 PM, alex.khassa...@csiro.au wrote:
 Hi John,

 Are you saying that composite multiple file dataset isn't required and 
 won't be implemented?

 We are using your implementation of multifiles dataset (m:xxx type) and 
 hope that eventually it will be pushed into main Galaxy implementation.

 As we are using Galaxy for CT reconstruction tools, where input and output 
 can consist of a couple thousand files, other options are not feasible, i.e. 
 grouping datasets.

 -Alex

 -Original Message-
 From: galaxy-dev-boun...@lists.bx.psu.edu
 [mailto:galaxy-dev-boun...@lists.bx.psu.edu] On Behalf Of John Chilton
 Sent: Thursday, 28 February 2013 2:06 AM
 To: Jeremy Goecks
 Cc: Jim Johnson; galaxy-...@bx.psu.edu
 Subject: Re: [galaxy-dev] Composite datatype output for Cuffdiff

 Hey Jeremy,

   I am 

Re: [galaxy-dev] Multi File upload api

2013-05-30 Thread Alex.Khassapov
Hi John,

Can you please have a look at Neil's question.

Thank you,

-Alex

From: Burdett, Neil (ICT Centre, Herston - RBWH)
Sent: Thursday, 30 May 2013 4:30 PM
To: Khassapov, Alex (CSIRO IMT, Clayton)
Subject: Multi File upload api

Hi Alex,
  The file galaxy-dist/scripts/api/example_watch_folder.py allows us 
to watch a folder and upload files that arrive in a specified input folder 
into the database and history.

Can you ask your friend (who implemented the multi file upload tool) what 
changes are necessary to this file so we can upload multiple files as we do 
from the GUI? I assume he would know quite quickly what to do, and hopefully 
it is quite simple.

Thanks
Neil
___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/

[galaxy-dev] Creating workflow which includes Multifile upload

2013-06-05 Thread Alex.Khassapov
Hi John,

One more problem with multifile upload - when I display a workflow which 
includes multi upload tool, I get:

Module workflow_run_mako:476 in render_row_for_param
  http://140.79.7.98/workflow/run?id=f597429621d6eb2b 
 __M_writer(unicode(param.get_label()))
AttributeError: 'UploadDataset' object has no attribute 'get_label'

Ok, I see that UploadDataset class is derived from Group, not ToolParameter. So 
I tried to add get_label() to the Group class, which returns some string.  But 
then I get:

Module workflow_run_mako:476 in render_row_for_param
  http://140.79.7.98/workflow/run?id=f597429621d6eb2b 
 __M_writer(unicode(param.get_label()))
TypeError: 'str' object is not callable

Here my knowledge of Galaxy ends and I need some help please.

-Alex

Re: [galaxy-dev] Creating workflow which includes Multifile upload

2013-06-05 Thread Alex.Khassapov
Hi Peter,

Of course I added def get_label(self); as a matter of fact, I copied 
get_label() from the ToolParameter class. That's why I'm a bit confused.

The get_label function returns a string which is supposed to be displayed, but 
instead something is trying to call it?

Best Regards,

Alex Khassapov

Software Engineer
CSIRO IMT

From: Peter Cock [p.j.a.c...@googlemail.com]
Sent: Wednesday, 5 June 2013 7:36 PM
To: Khassapov, Alex (CSIRO IMT, Clayton)
Cc: chil...@msi.umn.edu; galaxy-...@bx.psu.edu; NeCTAR Cloud Imaging Project 
Team
Subject: Re: [galaxy-dev] Creating workflow which includes Multifile upload

On Wed, Jun 5, 2013 at 8:56 AM,  alex.khassa...@csiro.au wrote:
 Hi John,

 One more problem with multifile upload – when I display a workflow which
 includes multi upload tool, I get:

 Module workflow_run_mako:476 in render_row_for_param
  __M_writer(unicode(param.get_label()))
 AttributeError: 'UploadDataset' object has no attribute 'get_label'

 Ok, I see that UploadDataset class is derived from Group, not ToolParameter.
 So I tried to add get_label() to the Group class, which returns some string.
 But then I get:

 Module workflow_run_mako:476 in render_row_for_param
  __M_writer(unicode(param.get_label()))
 TypeError: 'str' object is not callable

 Here my knowledge of Galaxy ends and I need some help please.

Hi Alex,

I guess from the Python exception that you didn't create a method
called get_label, but a property or attribute perhaps? Try this at the
python prompt and you'll get the same TypeError:

>>> hello()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object is not callable

I would have added a get_label method to the class using something
like this:

class UploadDataset(...):

    def get_label(self):
        return "Uploaded stuff"

Peter



[galaxy-dev] Appending _task_%d suffix to multi files

2013-07-30 Thread Alex.Khassapov
Hi guys,



We've been using Galaxy for a year now; we created our own Galaxy fork where we 
make changes to adapt Galaxy to our requirements. As we need multiple-file 
datasets, we were initially using John's fork for that.



Now we are trying to use the most up-to-date version of the multiple file 
dataset code, https://bitbucket.org/msiappdev/galaxy-extras/, directly, as we 
don't want to maintain our own version.



One of the problems we have: when we upload multiple files, their file names 
are changed (a _task_%d suffix is added to their names).



On our branch we simply removed the code which does this, but now we wonder if 
it is possible to avoid the renaming somehow, i.e. make it configurable?



Is it really necessary to change the file names?



-Alex



-Original Message-
From: galaxy-dev-boun...@lists.bx.psu.edu 
[mailto:galaxy-dev-boun...@lists.bx.psu.edu] On Behalf Of Jorrit Boekel
Sent: Thursday, 25 October 2012 8:35 PM
To: Peter Cock
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] the multi job splitter





I keep the files matched by appending a _task_%d suffix to their names. So each 
task is matched with its correct counterpart with the same number.



cheers,

jorrit




Re: [galaxy-dev] Appending _task_%d suffix to multi files

2013-08-01 Thread Alex.Khassapov
Hi Piotr,

Regarding data parallelism: Galaxy can split a single large file into small 
parts and process them in parallel, then merge the outputs into a single file.

That's not what we need, as we already have multiple input files. But as I 
understand, there's a possibility to write our own splitters/mergers to fit our 
requirements.

And yeah, Jorrit - enjoy your holidays!

-Alex
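Writing a custom merger for the "array of whole input files" case could be as simple as concatenating the parts. A minimal sketch, assuming a format whose parts concatenate cleanly; the actual hook signatures in lib/galaxy/datatypes/*.py may differ by Galaxy version:

```python
import shutil

def merge(split_files, output_file):
    # Naive merge for parts that can simply be concatenated, as
    # plain-text-like formats can. Headered or binary formats would
    # need a format-aware merge instead.
    with open(output_file, "wb") as out:
        for part in split_files:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
```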

From: Jorrit Boekel [mailto:jorrit.boe...@scilifelab.se]
Sent: Thursday, 1 August 2013 7:45 PM
To: Szul, Piotr (ICT Centre, Marsfield)
Cc: Khassapov, Alex (CSIRO IMT, Clayton); p.j.a.c...@googlemail.com; 
jmchil...@gmail.com; galaxy-dev@lists.bx.psu.edu; Burdett, Neil (ICT Centre, 
Herston - RBWH)
Subject: Re: Appending _task_%d suffix to multi files

Hi Piotr,

In our proteomics lab, a protein sample is fractionated (by e.g. pH) into a 
number of sample fractions before analysis. The fractions are then run through 
the mass spectrometer one at a time. Each fraction yields a data file.

The mass spec data is then matched to peptides by searching a FASTA file of 
protein sequences, termed the target. Afterwards the matches are statistically 
scored by machine learning. To do this, the data is also matched against a 
scrambled FASTA file, termed the decoy. Each fraction is thus matched to a 
target and a decoy file, which yields two match files per fraction.

The machine learning tool then picks a target and a decoy match file and puts 
statistical significances on the matches. For this to be correct, it needs to 
pick match files that correspond, i.e. that are derived from the same fraction.

In our lab, we have not yet looked at John Chilton's (I think) work with the m: 
data sets, and our parallel processing is done inside galaxy, using its split 
and merge functions to divide a job into tasks. Each task is sent as a separate 
job to sge, I think, but others may know more about this than I.

I really have to get back to my holiday now, cheers,
jorrit

On 08/01/2013 04:17 AM, piotr.s...@csiro.au wrote:
Hi Jorrit,
Thank you for your explanation. Would you be able to give us an example of what 
you mean by fractions and of when the task_%d suffixes are used to pick files? 
Just want to make sure we have a good understanding of the problem that you 
solved.
Also, I vaguely remember seeing 'data parallelism' mentioned somewhere in 
relation to the m: data sets. Do you currently support in any way automatic 
distribution of processing of such datasets to parallel environments (e.g. 
array jobs in SGE or such)?
Cheers,

-  Piotr


From: Jorrit Boekel [mailto:jorrit.boe...@scilifelab.se]
Sent: Wednesday, July 31, 2013 8:18 PM
To: Khassapov, Alex (CSIRO IMT, Clayton)
Cc: p.j.a.c...@googlemail.com; jmchil...@gmail.com; 
galaxy-dev@lists.bx.psu.edu; Szul, Piotr (ICT Centre, Marsfield); Burdett, 
Neil (ICT Centre, Herston - RBWH)
Subject: Re: Appending _task_%d suffix to multi files

Hi Alex,

In our lab, files are often fractions of an experiment, but they are named by 
their creators in whatever way they like. I put that code in to standardize 
fraction naming, in case a tool needs input from two files that originate from 
the same fraction (but have been treated in different ways). In those cases, in 
my fork, Galaxy always picks the files with the same task_%d numbers.
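The matching Jorrit describes can be sketched like this, a hypothetical illustration rather than his actual code: extract the task number from each name and group counterparts by it.

```python
import re
from collections import defaultdict

TASK_RE = re.compile(r"_task_(\d+)")

def group_by_task(filenames):
    # Group files sharing the same _task_%d number, so e.g. the target
    # and decoy match files derived from one fraction end up together.
    groups = defaultdict(list)
    for name in filenames:
        match = TASK_RE.search(name)
        if match:
            groups[int(match.group(1))].append(name)
    return dict(groups)

pairs = group_by_task(["target_task_0.xml", "decoy_task_0.xml",
                       "target_task_1.xml", "decoy_task_1.xml"])
assert pairs[0] == ["target_task_0.xml", "decoy_task_0.xml"]
assert pairs[1] == ["target_task_1.xml", "decoy_task_1.xml"]
```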

I can't help you very much right now, as I'm currently away from work until 
October, but I hope this explains why its in there.

cheers,
jorrit

On 07/31/2013 04:15 AM, alex.khassa...@csiro.au wrote:

Hi guys,



We've been using Galaxy for a year now, we created our own Galaxy fork where we 
were making changes to adapt Galaxy to our requirements.  As we need multiple 
file dataset - we were using Johns' fork for that initially.



Now we are trying to use The most updated version of the multiple file dataset 
stuff https://bitbucket.org/msiappdev/galaxy-extras/ directly as we don't want 
to maintain our own version.



One of the problems we have - when we upload multiple files - their file names 
are changed (_task_%d suffix is added to their names).



On our branch we simply removed the code which does it, but now we wonder if it 
is possible to avoid this renaming somehow? I.e. make it configurable?



Is it really necessary to change the file names?



-Alex



-Original Message-
From: galaxy-dev-boun...@lists.bx.psu.edu 
[mailto:galaxy-dev-boun...@lists.bx.psu.edu] On Behalf Of Jorrit Boekel
Sent: Thursday, 25 October 2012 8:35 PM
To: Peter Cock
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] the multi job splitter





I keep the files matched by appending a _task_%d suffix to their names. So each 
task is matched with its correct counterpart with the same number.



cheers,

jorrit