Re: [galaxy-dev] approximate line numbers shown in history items can be very imprecise

2011-08-02 Thread Jelle Scholtalbers
Hi Chaolin,

Although I haven't tested this, the following changes should probably work.

In datatypes_conf.xml, change this line:
 <datatype extension="data" type="galaxy.datatypes.data:Data"
 mimetype="application/octet-stream"
 max_optional_metadata_filesize="1048576" />

Either raise max_optional_metadata_filesize to the desired size, or remove the attribute entirely, like so:

 <datatype extension="data" type="galaxy.datatypes.data:Data"
 mimetype="application/octet-stream" />
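If you would rather script this change than edit the file by hand, a rough Python sketch using the standard library's xml.etree.ElementTree could look like the following. The function name set_metadata_filesize is hypothetical, and the XML string is an assumed, trimmed-down excerpt of datatypes_conf.xml. Be aware that ElementTree's default parser discards XML comments, so writing its output back over the real file would lose them; a hand edit avoids that.

```python
import xml.etree.ElementTree as ET
from typing import Optional

ATTR = "max_optional_metadata_filesize"


def set_metadata_filesize(xml_text: str, extension: str,
                          new_size: Optional[int]) -> str:
    """Raise (new_size=int) or remove (new_size=None) the metadata size limit
    on every <datatype> entry with the given extension; return the new XML.

    Note: the default parser drops XML comments from the output.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter("datatype"):
        if elem.get("extension") == extension:
            if new_size is None:
                elem.attrib.pop(ATTR, None)  # remove the limit entirely
            else:
                elem.set(ATTR, str(new_size))  # raise (or change) the limit
    return ET.tostring(root, encoding="unicode")


# Hypothetical, trimmed-down excerpt of datatypes_conf.xml.
conf = ('<datatypes><registration>'
        '<datatype extension="data" type="galaxy.datatypes.data:Data" '
        'mimetype="application/octet-stream" '
        'max_optional_metadata_filesize="1048576" />'
        '</registration></datatypes>')
print(set_metadata_filesize(conf, "data", None))
```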

Furthermore, in lib/galaxy/datatypes/data.py, in the set_peek() function,
raise the size threshold or remove the check altogether:

# Number of lines is not known ( this should not happen ), and auto-detect is
# needed to set metadata
# This can happen when the file is larger than max_optional_metadata_filesize.
if int(dataset.get_size()) <= 1048576:
    #Small dataset, recount all lines and reset peek afterward.

To remove the size check, change the condition to:
if int(dataset.get_size()):

With that change, every non-empty dataset takes the exact-recount branch.
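To make the difference between the two set_peek() variants concrete, here is a small Python sketch of that branch with the size threshold turned into a parameter. It is an illustration, not Galaxy's code: line_blurb and its two callables are hypothetical stand-ins for the dataset object and Galaxy's counting methods, and the counts in the usage lines are made-up inputs.

```python
from typing import Callable, Optional

# The cut-off shown in the set_peek() snippet; here it is just a default.
SIZE_THRESHOLD = 1048576


def line_blurb(size: int,
               count_exact: Callable[[], int],
               estimate: Callable[[], int],
               threshold: Optional[int] = SIZE_THRESHOLD) -> str:
    """Build a history-item style blurb such as '1,000 lines' or '~850,000 lines'.

    threshold=None mimics `if int(dataset.get_size()):`, i.e. the size check
    removed, so every non-empty dataset gets an exact count.
    """
    if threshold is None:
        use_exact = size > 0
    else:
        use_exact = size <= threshold  # mirrors `<= 1048576`
    if use_exact:
        n, prefix = count_exact(), ""
    else:
        n, prefix = estimate(), "~"  # "~" marks an estimated count
    noun = "line" if n == 1 else "lines"
    return f"{prefix}{n:,} {noun}"


# Hypothetical 50 MB dataset: exact count 1,000,000, estimate 850,000.
print(line_blurb(50_000_000, lambda: 1_000_000, lambda: 850_000))                  # ~850,000 lines
print(line_blurb(50_000_000, lambda: 1_000_000, lambda: 850_000, threshold=None))  # 1,000,000 lines
```

Raising the threshold instead of removing it (for example threshold=100_000_000) keeps the estimate for truly huge files while counting everything below that size exactly.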

Cheers,
Jelle


On Tue, Aug 2, 2011 at 4:42 AM, Chaolin Zhang zhangchao...@gmail.com wrote:
 Hi
 Does anyone know how to turn off the line number estimation and get the exact
 count for each dataset?
 Thanks!

 Chaolin

 On Jul 31, 2011, at 11:38 AM, Dannon Baker wrote:

 Chaolin,

 You guessed correctly as to why we implemented this: getting exact line
 counts on very large files is a time-consuming process.  You can still get
 an exact line count using the Line/Word/Character count tool in the Text
 Manipulation section.

 If you're interested in how it currently works, the first 1MB of a large
 file is read, and the line count is extrapolated from that sample on the
 assumption that line lengths don't vary dramatically throughout the file.
 It would slow down metadata setting, but for a personal Galaxy instance
 you could certainly increase that number, or disable the estimation entirely.

 -Dannon


 On Jul 31, 2011, at 11:11 AM, Chaolin Zhang wrote:

 Hi,

 I noticed that the current version of Galaxy shows an approximate number of
 lines for history items when they are relatively big.  I guess this is done
 for performance reasons, but it is quite annoying, because the exact line
 numbers provide a very easy way for users to get simple statistics.
 Sometimes the approximation can be really off. For instance, for one file 
 with  1 M lines, it shows ~850,000 lines.  Any thought?

 Chaolin





 ___
 Please keep all replies on the list by using reply all
 in your mail client.  To manage your subscriptions to this
 and other Galaxy lists, please use the interface at:

 http://lists.bx.psu.edu/





Re: [galaxy-dev] approximate line numbers shown in history items can be very imprecise

2011-07-31 Thread Dannon Baker
Chaolin,

You guessed correctly as to why we implemented this: getting exact line counts
on very large files is a time-consuming process.  You can still get an exact
line count using the Line/Word/Character count tool in the Text Manipulation
section.

If you're interested in how it currently works, the first 1MB of a large file
is read, and the line count is extrapolated from that sample on the assumption
that line lengths don't vary dramatically throughout the file.  It would slow
down metadata setting, but for a personal Galaxy instance you could certainly
increase that number, or disable the estimation entirely.
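The estimation described above can be sketched in Python roughly as follows. This is a minimal illustration under assumptions, not Galaxy's actual implementation: estimate_line_count and exact_line_count are hypothetical names, and the sample size is a parameter (defaulting to the 1MB mentioned here) so the effect of uneven line lengths is visible on a small test file.

```python
import os
import tempfile

# Assumed default sample size: the first 1MB of the file.
SAMPLE_BYTES = 1048576


def estimate_line_count(path, sample_bytes=SAMPLE_BYTES):
    """Extrapolate a file's line count from its first `sample_bytes` bytes.

    Counts newline characters, so a final line without a trailing newline
    is not counted.
    """
    total_size = os.path.getsize(path)
    with open(path, "rb") as handle:
        sample = handle.read(sample_bytes)
    lines_in_sample = sample.count(b"\n")
    if total_size <= len(sample):
        # The whole file fit in the sample, so the count is exact.
        return lines_in_sample
    # Assume the sample's average line length holds for the entire file.
    return int(total_size * lines_in_sample / len(sample))


def exact_line_count(path, chunk_bytes=SAMPLE_BYTES):
    """Count newline characters by streaming through the whole file."""
    count = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_bytes), b""):
            count += chunk.count(b"\n")
    return count


if __name__ == "__main__":
    # Long lines first, short lines later: the sample overstates the
    # average line length, so the estimate comes out too low.
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as out:
        out.write((b"x" * 99 + b"\n") * 1000)  # 1000 lines of 100 bytes
        out.write(b"y\n" * 1000)               # 1000 lines of 2 bytes
    try:
        print(exact_line_count(path), estimate_line_count(path, sample_bytes=10000))
    finally:
        os.remove(path)
```

Run as a script, the demo prints `2000 1020`: the first 10,000 bytes hold only 100-byte lines, so extrapolating from them roughly halves the true count. Reading more of the file before extrapolating reduces this error at the cost of slower metadata setting, which is the trade-off mentioned above.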

-Dannon





Re: [galaxy-dev] approximate line numbers shown in history items can be very imprecise

2011-07-31 Thread Chaolin Zhang
Hi Dannon,

Thanks for the information.  Yes, I know about the line count tool, but running
it on every file whose line count a user wants is a pain (and confusing, because
the number of history items doubles).  In our case (probably not rare for real
analysis tasks), this information is almost always needed to check whether
anything funny has happened.  Maybe it is possible to add a link next to the
approximate line number, so that users can get the exact count more easily if
desired?

Chaolin







Re: [galaxy-dev] approximate line numbers shown in history items can be very imprecise

2011-07-31 Thread Chaolin Zhang
Hi Dannon,

We do have a local mirror here.  How can we disable the estimation?

Thanks!

Chaolin



