Re: ReadHITRAN

2021-09-20 Thread Patrick Eriksson

Stefan,

Yes, that sounds reasonable. I am simply lagging behind the development and
need to catch up on how we do things. I emailed Richard and Freddy on
the side about some stuff.

But since you brought it up, is there any documentation of the replacement
mechanism? In the email to R&F I also suggested a README in Artscat to
clarify the contents of the folder.

Bye,

Patrick



On 2021-09-20 11:10, Stefan Buehler wrote:

> Dear Patrick,
>
> I think we should put ARTS’ own line catalog (which is based on converted
> current HITRAN) at the center wherever possible. Use it if you are happy
> with the parameters there. If you want other parameters, and there is a
> good reason for that, consider updating it. We have a mechanism to replace
> individual parameters there (and document those substitutions).
>
> Stefan



Re: ReadHITRAN

2021-09-20 Thread Stefan Buehler
Dear Patrick,

I think we should put ARTS’ own line catalog (which is based on converted
current HITRAN) at the center wherever possible. Use it if you are happy with
the parameters there. If you want other parameters, and there is a good reason
for that, consider updating it. We have a mechanism to replace individual
parameters there (and document those substitutions).

Stefan

On 20 Sep 2021, at 10:46, Patrick Eriksson wrote:

> Richard,
>
> Thanks for the additional information. It seems that the take-home message
> is that I should look at other ways to set up the calculations. I just
> picked up an old cfile, used it as a starting point, and did not even
> consider alternatives to using ReadHITRAN.
>
> Bye,
>
> Patrick


Re: ReadHITRAN

2021-09-20 Thread Patrick Eriksson

Richard,

Thanks for the additional information. It seems that the take-home message is
that I should look at other ways to set up the calculations. I just
picked up an old cfile, used it as a starting point, and did not even
consider alternatives to using ReadHITRAN.

Bye,

Patrick

On 2021-09-20 09:05, Richard Larsson wrote:

> Hi Patrick,
>
> We can of course optimize the reading routine, but there's no point in
> doing that. The methods that read external catalogs should only ever be
> used once per update of the external catalog, so it's fine if they are
> slow, as long as they are not too slow.
>
> New memory is always allocated for every absorption line. This is
> because we keep line data local, and the line-shape model and the local
> quantum numbers don't have to be known at compile time.
>
> Additionally, the line data is pushed into arrays, which typically double
> their capacity every time it fills up.
>
> If we knew the number of lines, broadening species and local quantum
> numbers, these allocations would happen only once for the entire band,
> but we don't know them in ReadHITRAN or any of the other external reading
> routines. So you will get many, many requests for more memory. This of
> course also means that you are over-allocating memory, since that's how
> Arrays work in ARTS (they follow standard C++ behaviour). Again, this is
> also fine, since the catalog, when read again, will allocate exactly what
> is required.
>
> With hope,
> --Richard



Re: ReadHITRAN

2021-09-20 Thread Richard Larsson
Hi Patrick,

We can of course optimize the reading routine, but there's no point in doing
that. The methods that read external catalogs should only ever be used
once per update of the external catalog, so it's fine if they are slow, as
long as they are not too slow.

New memory is always allocated for every absorption line. This is because
we keep line data local, and the line-shape model and the local quantum
numbers don't have to be known at compile time.

Additionally, the line data is pushed into arrays, which typically double
their capacity every time it fills up.
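A minimal sketch of why geometric growth matters, which also bears on Patrick's question about fixed chunks versus something "smart". This is a toy simulation, not the ARTS Array code: the growth factor of std::vector is implementation-defined, and the hypothetical 1000-element chunk size is an arbitrary assumption. It counts reallocations and copied elements when one million items are appended one at a time.

```python
from typing import Callable


def count_reallocations(n_items: int, grow: Callable[[int], int]) -> tuple[int, int]:
    """Simulate appending n_items, one at a time, to a dynamic array.

    grow(capacity) returns the new capacity whenever the array is full.
    Returns (number of reallocations, total number of elements copied).
    """
    capacity = size = reallocs = copied = 0
    for _ in range(n_items):
        if size == capacity:          # array is full: get a bigger buffer
            capacity = grow(capacity)
            reallocs += 1
            copied += size            # existing elements move to the new buffer
        size += 1
    return reallocs, copied


def doubling(capacity: int) -> int:
    """Geometric growth with factor 2, as in some standard-library vectors."""
    return max(1, 2 * capacity)


def fixed_chunk(capacity: int, chunk: int = 1000) -> int:
    """Hypothetical alternative: grow by a fixed number of elements."""
    return capacity + chunk


if __name__ == "__main__":
    n = 1_000_000
    print("doubling:   ", count_reallocations(n, doubling))
    print("fixed chunk:", count_reallocations(n, fixed_chunk))
```

With doubling, one million appends need 21 reallocations and copy 1,048,575 elements in total (amortized constant cost per append). With fixed 1000-element chunks they need 1000 reallocations and copy 499,500,000 elements, so the copying grows quadratically with the number of lines.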

If we knew the number of lines, broadening species and local quantum
numbers, these allocations would happen only once for the entire band, but
we don't know them in ReadHITRAN or any of the other external reading
routines. So you will get many, many requests for more memory. This of
course also means that you are over-allocating memory, since that's how
Arrays work in ARTS (they follow standard C++ behaviour). Again, this is
also fine, since the catalog, when read again, will allocate exactly what is
required.
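A similar toy sketch of the over-allocation described here, and of what knowing the count up front buys (the equivalent of std::vector::reserve). It assumes doubling growth from an empty array and counts one "allocation" per capacity change; real allocators and the per-line data in ARTS behave in more complicated ways.

```python
def grow_by_appending(n_items: int) -> tuple[int, int]:
    """Count unknown up front: append one item at a time to a doubling array.

    Returns (number of allocations, final capacity).
    """
    capacity = allocations = 0
    for size in range(n_items):
        if size == capacity:              # full: allocate a buffer twice as big
            capacity = max(1, 2 * capacity)
            allocations += 1
    return allocations, capacity


def preallocate(n_items: int) -> tuple[int, int]:
    """Count known up front: allocate exactly n_items once (cf. std::vector::reserve)."""
    return (1 if n_items > 0 else 0), n_items


for n in (1_000_000, 524_289):
    allocs, cap = grow_by_appending(n)
    unused = cap - n
    print(f"n={n}: appending -> {allocs} allocations, capacity {cap}, "
          f"{unused} unused slots ({unused / cap:.0%} of capacity); "
          f"preallocating -> {preallocate(n)}")
```

For one million lines the doubling array ends with a capacity of 1,048,576 (about 5% unused). For 524,289 lines, just past a power of two, almost half of the capacity is unused. Preallocating gives a single allocation of exactly the needed size.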

With hope,
--Richard

On Mon, 20 Sep 2021 at 08:09, Patrick Eriksson <patrick.eriks...@chalmers.se> wrote:

> Richard,
>
> Thanks for the clarification.
>
> Is the allocation of more memory done in fixed chunks? Or something
> "smart" in the process? If the former and the chunks are too small, then
> maybe I am doing a lot of reallocations. My impression was that memory
> usage increased quite monotonically, not in noticeable steps.
>
> If the lines have to be sorted into bands, then the complexity of the
> reading will increase in line with what I have noticed. And likely not
> much to do about it.
>
> Bye,
>
> Patrick


Re: ReadHITRAN

2021-09-19 Thread Patrick Eriksson

Richard,

Thanks for the clarification.

Is the allocation of more memory done in fixed chunks? Or something 
"smart" in the process? If the former and the chunks are too small, then 
maybe I am doing a lot of reallocations. My impression was that memory 
usage increased quite monotonically, not in noticeable steps.


If the lines have to be sorted into bands, then the complexity of the 
reading will increase in line with what I have noticed. And likely not 
much to do about it.


Bye,

Patrick



> There are still two possible slowdowns. One is that you hit some line
> count where you need to reallocate the array of lines because you have
> too many. The other is that the search for placing the line in the
> correct band is slow when there are more bands to look through.
>
> The former would just be bad luck, so there's nothing to do about it.
>
> I would suspect the latter is your problem. You need to search through
> the existing bands for every new line to find where it belongs. Since
> bands are often clustered closely together in frequency, this could slow
> down the reading as you get more and more bands. A smaller frequency
> range means fewer bands to look through.
>
> //Richard



Re: ReadHITRAN

2021-09-19 Thread Richard Larsson
Patrick,

I think I misunderstood you there.

There are still two possible slowdowns. One is that you hit some line count
where you need to reallocate the array of lines because you have too many.
The other is that the search for placing the line in the correct band is
slow when there are more bands to look through.

The former would just be bad luck, so there's nothing to do about it.

I would suspect the latter is your problem.  You need to search through the
existing bands for every new line to find where it belongs.  Since bands
are often clustered closely together in frequency, this could slow down the
reading as you get more and more bands. A smaller frequency range means
fewer bands to look through.
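A toy model of the band search described above. It assumes, purely for illustration, that each line carries a key identifying its band (the hypothetical `band` field stands in for isotopologue plus quantum numbers; it is not the ARTS data model) and that lines from different bands are interleaved in frequency. Scanning the existing bands for every line makes the total cost grow roughly as lines × bands. The dict variant is shown only as one common alternative, not as how ARTS does or should do it.

```python
from collections import namedtuple

# Hypothetical line record for illustration only: 'band' stands in for whatever
# identifies a band (isotopologue, quantum numbers, ...). Not the ARTS data model.
Line = namedtuple("Line", ["freq", "band"])


def group_linear(lines):
    """Place each line by scanning the existing bands in order.

    Returns (list of (band, lines) pairs, number of key comparisons made).
    """
    bands = []
    comparisons = 0
    for line in lines:
        for key, members in bands:
            comparisons += 1
            if key == line.band:
                members.append(line)
                break
        else:                              # no existing band matched: start a new one
            bands.append((line.band, [line]))
    return bands, comparisons


def group_hashed(lines):
    """Same grouping via a dict lookup: roughly constant work per line."""
    bands = {}
    for line in lines:
        bands.setdefault(line.band, []).append(line)
    return bands


def make_lines(n_bands, lines_per_band, first_band=0, f0=0.0):
    """Synthetic lines whose bands are interleaved in frequency (round-robin)."""
    n = n_bands * lines_per_band
    return [Line(freq=f0 + i, band=first_band + i % n_bands) for i in range(n)]


full = make_lines(200, 50)                                # one wide range
half_a = make_lines(100, 50)                              # lower half
half_b = make_lines(100, 50, first_band=100, f0=5000.0)   # upper half

_, c_full = group_linear(full)
_, c_a = group_linear(half_a)
_, c_b = group_linear(half_b)
print("one read:  ", c_full, "comparisons")
print("two reads: ", c_a + c_b, "comparisons")
```

With 200 interleaved bands of 50 lines each, the linear scan makes 1,004,800 comparisons; the same lines split into two ranges of 100 bands each need 504,800 in total. That is roughly half, the kind of superlinear behaviour reported at the start of the thread.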

//Richard

On Sun, Sep 19, 2021, 22:39 Patrick Eriksson wrote:

> Richard,
>
> > It's expected to take a somewhat arbitrary time.  It reads ASCII.
>
> I have tried multiple times and the pattern is not changing.
>
>
> > The start-up time is going to be large because of having to find the
> > first frequency, which means you have to parse the text nonetheless.
>
> Understood. But that overhead seems to be relatively small. In my test,
> it seemed to take 4-7 s to reach the first frequency. Anyhow, this goes
> in the other direction. To minimise the parsing to reach the first
> frequency, it should be better to read all in one go, and not in parts
> (which is the case for me).
>
> Bye,
>
> Patrick
>


Re: ReadHITRAN

2021-09-19 Thread Patrick Eriksson

Richard,

> It's expected to take a somewhat arbitrary time.  It reads ASCII.

I have tried multiple times, and the pattern does not change.

> The start-up time is going to be large because of having to find the
> first frequency, which means you have to parse the text nonetheless.

Understood. But that overhead seems to be relatively small. In my test,
it seemed to take 4-7 s to reach the first frequency. Anyhow, this effect
goes in the other direction: to minimise the parsing needed to reach the
first frequency, it should be better to read everything in one go rather
than in parts (which is what I am doing).
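A small simulation of this point, under assumptions not confirmed in the thread: the catalogue file is sorted by frequency, the reader always starts at the top, and it stops at the first record at or beyond the upper limit. Every record before the range still has to be parsed to learn its frequency, so reading a range in two parts repeats most of that prefix. The catalogue is synthetic (one line every 0.01 cm-1), not HITRAN data.

```python
def records_parsed(freqs, fmin, fmax):
    """Simulate a sequential reader collecting lines with fmin <= f < fmax.

    Illustrative assumptions: records are sorted by frequency, the reader
    starts at the top of the file and stops at the first record with f >= fmax.
    Returns (records parsed, records kept).
    """
    parsed = kept = 0
    for f in freqs:
        parsed += 1          # each record must be parsed to learn its frequency
        if f >= fmax:
            break
        if f >= fmin:
            kept += 1
    return parsed, kept


# Synthetic catalogue: one line every 0.01 cm-1 from 0 to 2000 cm-1,
# stored as integer hundredths of a cm-1 to avoid float round-off.
freqs = list(range(200_000))

one_go = records_parsed(freqs, 80_000, 84_000)   # 800-840 cm-1
part_a = records_parsed(freqs, 80_000, 82_000)   # 800-820 cm-1
part_b = records_parsed(freqs, 82_000, 84_000)   # 820-840 cm-1
print("one go:   ", one_go)
print("two parts:", (part_a[0] + part_b[0], part_a[1] + part_b[1]))
```

Reading 800-840 cm-1 in one go parses 84,001 records; reading it as 800-820 plus 820-840 cm-1 parses 166,002, nearly twice as many, while keeping the same 4,000 lines. Under these assumptions, start-up parsing alone should make the split read slower, not faster.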


Bye,

Patrick


Re: ReadHITRAN

2021-09-19 Thread Richard Larsson
Hi,

It's expected to take a somewhat arbitrary time.  It reads ASCII.

The start-up time is going to be large because of having to find the first
frequency, which means you have to parse the text nonetheless.

//Richard

On Sun, Sep 19, 2021, 20:19 Patrick Eriksson wrote:

> Hi all,
>
> I have noticed that the time used by ReadHITRAN is not linear in the
> width of the frequency range. For example, reading all lines (of five
> main species) between 800 and 840 cm-1 took 90 s, while reading 800-820
> and 820-840 cm-1 separately took 57 s in total.
>
> Is this expected?
>
> (The above uses 1-2% of my RAM).
>
> Bye,
>
> Patrick
>
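A back-of-the-envelope reading of these two timings, under a loudly stated assumption: suppose the read time for a range of width w is T(w) = a*w + b*w^2, where the quadratic term stands in for the per-line band search discussed in this thread and any fixed start-up cost is ignored. Two measurements are then just enough to solve for a and b. This is a toy model; two data points cannot validate it.

```python
# Toy cost model (an assumption, not a measurement): T(w) = a*w + b*w**2,
# with w the width of the frequency range in cm-1. Start-up cost is ignored.
t_full = 90.0    # s, one read of 800-840 cm-1 (from the message)
t_split = 57.0   # s, 800-820 plus 820-840 cm-1 (from the message)
w = 40.0         # cm-1

# T(w) = a*w + b*w^2 and 2*T(w/2) = a*w + b*w^2/2, so the difference
# isolates the quadratic term: T(w) - 2*T(w/2) = b*w^2/2.
b = (t_full - t_split) / (w**2 / 2)
a = (t_full - b * w**2) / w

print(f"a = {a:.3f} s per cm-1, b = {b:.5f} s per (cm-1)^2")
print(f"quadratic share of the 90 s read: {b * w**2 / t_full:.0%}")
```

This gives a = 0.6 s per cm-1 and b ≈ 0.041 s per (cm-1)^2, attributing about 73% of the 90 s to the quadratic term. A purely quadratic cost would predict that splitting the range in two halves the time, and a purely linear one would predict no change; the observed 57/90 ≈ 0.63 sits between the two.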