RE: Cutting slices

2023-03-05 Thread avi.e.gross
I am not commenting on the technique or why it was chosen, just the part
where the last search looks for a non-existent period:

s = 'alpha.beta.gamma'
...
s[ 11: s.find( '.', 11 )]

What should "find" do if it hits the end of a string without finding the
period you claim is a divider?

Could that be why gamma got truncated?

Unless you can arrange for a terminal period, maybe you can reconsider the
approach.
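A minimal sketch of the failure mode described above, plus one way to handle the missing terminal period (illustrative code, not from the thread; the helper name `chop` is invented here):

```python
# str.find() returns -1 when the separator is absent, and a slice
# ending at -1 silently drops the last character.
s = 'alpha.beta.gamma'

print(s.find('.', 11))   # -1: no '.' after index 11
print(s[11:-1])          # 'gamm' -- the surprising truncation

# One way out: treat -1 as "end of string".
def chop(s, start, sep='.'):
    end = s.find(sep, start)
    return s[start:] if end == -1 else s[start:end]

print(chop(s, 11))       # 'gamma'
```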


-Original Message-
From: Python-list  On
Behalf Of aapost
Sent: Sunday, March 5, 2023 6:00 PM
To: python-list@python.org
Subject: Re: Cutting slices

On 3/5/23 17:43, Stefan Ram wrote:
>The following behaviour of Python strikes me as being a bit
>"irregular". A user tries to chop of sections from a string,
>but does not use "split" because the separator might become
>more complicated so that a regular expression will be required
>to find it. But for now, let's use a simple "find":
>
> |>>> s = 'alpha.beta.gamma'
> |>>> s[ 0: s.find( '.', 0 )]
> |'alpha'
> |>>> s[ 6: s.find( '.', 6 )]
> |'beta'
> |>>> s[ 11: s.find( '.', 11 )]
> |'gamm'
> |>>>
> 
>. The user always inserted the position of the previous find plus
>one to start the next "find", so he uses "0", "6", and "11".
>But the "a" is missing from the final "gamma"!
>
>And it seems that there is no numerical value at all that
>one can use for "n" in "string[ 0: n ]" to get the whole
>string, isn't it?
> 
> 

I would agree with the 1st part of the comment.

Just noting that string[11:], string[11:None], as well as string[11:16] 
work ... as well as string[11:324242]... lol..
-- 
https://mail.python.org/mailman/listinfo/python-list



RE: Fast full-text searching in Python (job for Whoosh?)

2023-03-05 Thread avi.e.gross
Dino, sending lots of data to an archived forum is not a great idea. I
snipped most of it out below so as not to replicate it.

Your question does not look difficult unless your real question is about
speed. Realistically, much of the time spent generally is in reading in a
file and the actual search can be quite rapid with a wide range of methods.

The data looks boring enough and seems to not have much structure other than
one comma possibly separating two fields. Do you want the data as one wide
field, or perhaps in two parts, which is what a CSV file is normally used to
represent? Do you ever have questions like "tell me all cars whose name
begins with the letter D and has a V6 engine"? If so, you may want more than
a vanilla search.

What exactly do you want to search for? Is it a set of built-in searches or
something the user types in?

The data seems to be sorted by the first field and then by the second and I
did not check if some searches might be ambiguous. Can there be many entries
containing III? Yep. Can the same words like Cruiser or Hybrid appear? 

So is this a one-time search, or multiple searches once loaded, as in a
service that stays resident and fields requests? The latter may be worth
speeding up.

I don't NEED to know any of this but want you to know that the answer may
depend on this and similar factors. We had a long discussion lately on
whether to search using regular expressions or string methods. If your data
is meant to be used once, you may not even need to read the file into
memory, but read something like a line at a time and test it. Or, if you end
up with more data like how many cylinders a car has, it may be time to read
it in not just to a list of lines or such data structures, but get
numpy/pandas involved and use their many search methods in something like a
data.frame.

Of course if you are worried about portability, keep using Get Regular
Expression Print.

Your example was:

 $ grep -i v60 all_cars_unique.csv
 Genesis,GV60
 Volvo,V60

You seem to have wanted case folding and that is NOT a normal search. And
your search is matching anything on any line. If you wanted only a complete
field, such as all text after a comma to the end of the line, you could use
grep specifications to say that.

But once inside python, you would need to make choices depending on what
kind of searches you want to allow but also things like do you want all
matching lines shown if you search for say "a" ...




-Original Message-
From: Python-list  On
Behalf Of Dino
Sent: Saturday, March 4, 2023 10:47 PM
To: python-list@python.org
Subject: Re: Fast full-text searching in Python (job for Whoosh?)


Here's the complete data file should anyone care.

Acura,CL
Acura,ILX
Acura,Integra
Acura,Legend

smart,fortwo electric drive
smart,fortwo electric drive cabrio

-- 
https://mail.python.org/mailman/listinfo/python-list



Re: Cutting slices

2023-03-05 Thread Greg Ewing via Python-list

On 6/03/23 11:43 am, Stefan Ram wrote:

   A user tries to chop off sections from a string,
   but does not use "split" because the separator might become
   more complicated so that a regular expression will be required
   to find it.


What's wrong with re.split() in that case?
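For instance, re.split() already covers the "separator may get more complicated" case (illustrative separators invented here):

```python
import re

s = 'alpha.beta;gamma, delta'
# The separator is a dot, semicolon or comma, with optional
# surrounding whitespace -- one pattern handles all of them.
parts = re.split(r'\s*[.;,]\s*', s)
print(parts)  # ['alpha', 'beta', 'gamma', 'delta']
```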

--
Greg
--
https://mail.python.org/mailman/listinfo/python-list


Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-05 Thread Thomas Passin

On 3/4/2023 11:12 PM, Dino wrote:

On 3/4/2023 10:43 PM, Dino wrote:


I need fast text-search on a large (not huge, let's say 30k records 
totally) list of items. Here's a sample of my raw data (a list of US 
cars: model and make)


I suspect I am really close to answering my own question...

 >>> import time
 >>> lis = [str(a**2+a*3+a) for a in range(0,30000)]
 >>> s = time.process_time_ns(); res = [el for el in lis if "13467" in 
el]; print(time.process_time_ns() -s);

753800
 >>> s = time.process_time_ns(); res = [el for el in lis if "52356" in 
el]; print(time.process_time_ns() -s);

1068300
 >>> s = time.process_time_ns(); res = [el for el in lis if "5256" in 
el]; print(time.process_time_ns() -s);

862000
 >>> s = time.process_time_ns(); res = [el for el in lis if "6" in el]; 
print(time.process_time_ns() -s);

1447300
 >>> s = time.process_time_ns(); res = [el for el in lis if "1" in el]; 
print(time.process_time_ns() -s);

1511100
 >>> s = time.process_time_ns(); res = [el for el in lis if "13467" in 
el]; print(time.process_time_ns() -s); print(len(res), res[:10])

926900
2 ['134676021', '313467021']
 >>>

I can do a substring search in a list of 30k elements in less than 2ms 
with Python. Is my reasoning sound?


I would probably ingest the data at startup into a dictionary - or 
perhaps several depending on your access patterns - and then you will 
only need to do a fast lookup in one or more dictionaries.


If your access pattern would be easier with SQL queries, load the data 
into an SQLite database on startup.


IOW, do the bulk of the work once at startup.
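A minimal sketch of the SQLite route (the in-memory rows stand in for reading all_cars_unique.csv; SQLite's LIKE is case-insensitive for ASCII by default, which matches the grep -i behaviour wanted):

```python
import sqlite3

# Load the data once at startup into an in-memory table,
# then serve each lookup as a query.
con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE cars (make TEXT, model TEXT)')
rows = [('Genesis', 'GV60'), ('Volvo', 'V60'), ('Acura', 'ILX')]  # stand-in for csv.reader(...)
con.executemany('INSERT INTO cars VALUES (?, ?)', rows)

# Case-insensitive substring match on the model column:
hits = con.execute(
    "SELECT make, model FROM cars WHERE model LIKE ?",
    ('%v60%',)).fetchall()
print(hits)  # [('Genesis', 'GV60'), ('Volvo', 'V60')]
```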
--
https://mail.python.org/mailman/listinfo/python-list


Re: Cutting slices

2023-03-05 Thread MRAB

On 2023-03-06 00:28, dn via Python-list wrote:

On 06/03/2023 11.59, aapost wrote:

On 3/5/23 17:43, Stefan Ram wrote:

   The following behaviour of Python strikes me as being a bit
   "irregular". A user tries to chop of sections from a string,
   but does not use "split" because the separator might become
   more complicated so that a regular expression will be required
   to find it. But for now, let's use a simple "find":
|>>> s = 'alpha.beta.gamma'
|>>> s[ 0: s.find( '.', 0 )]
|'alpha'
|>>> s[ 6: s.find( '.', 6 )]
|'beta'
|>>> s[ 11: s.find( '.', 11 )]
|'gamm'
|>>>

   . The user always inserted the position of the previous find plus
   one to start the next "find", so he uses "0", "6", and "11".
   But the "a" is missing from the final "gamma"!
   And it seems that there is no numerical value at all that
   one can use for "n" in "string[ 0: n ]" to get the whole
   string, isn't it?




I would agree with the 1st part of the comment.

Just noting that string[11:], string[11:None], as well as string[11:16] 
work ... as well as string[11:324242]... lol..


To expand on the above, answering the OP's second question: the numeric
value is len( s ).

If the repetitive process is required, try a loop like:

  >>> start_index = 11 #to cure the issue-raised

  >>> try:
... s[ start_index:s.index( '.', start_index ) ]
... except ValueError:
... s[ start_index:len( s ) ]
...
'gamma'


Somewhat off-topic, but...

When there was a discussion about a None-coalescing operator, I thought 
that it would've been nice if .find and .rfind returned None instead of -1.


There have been times when I've wanted to find the next space (or 
whatever) and have it return the length of the string if absent. That 
could've been accomplished with:


s.find(' ', pos) ?? len(s)

Other times I've wanted it to return -1. That could've been accomplished 
with:


s.find(' ', pos) ?? -1

(There's a place in the re module where .rfind returning -1 is just the 
right value.)


In this instance, slicing with None as the end is just what's wanted.

Ah, well...
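The missing coalescing behaviour is easy to emulate with a small helper (the name `find_or` is invented here, not a str method):

```python
# find() that falls back to a chosen default (len(s), -1, None, ...)
# when the substring is absent, instead of always returning -1.
def find_or(s, sub, pos=0, default=None):
    i = s.find(sub, pos)
    return default if i == -1 else i

s = 'alpha beta'
print(find_or(s, ' ', 0, len(s)))   # 5  (space found at index 5)
print(find_or(s, 'x', 0, len(s)))   # 10 (absent -> length of string)
print(s[6:find_or(s, ' ', 6)])      # 'beta' (absent -> None -> slice to end)
```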
--
https://mail.python.org/mailman/listinfo/python-list


Re: Bug 3.11.x behavioral, open file buffers not flushed til file closed.

2023-03-05 Thread Chris Angelico
On Mon, 6 Mar 2023 at 12:41, Greg Ewing via Python-list
 wrote:
>
> On 6/03/23 1:02 pm, Cameron Simpson wrote:
> > Also, fsync() need not expedite the data getting to disc. It is equally
> > valid that it just blocks your programme _until_ the data have gone to
> > disc.
>
> Or until it *thinks* the data has gone to the disk. Some drives
> do buffering of their own, which may impose additional delays
> before the data actually gets written.
>

Sadly true. Usually with SSDs. Unfortunately, at that point, there's
nothing ANYONE can do about it, since the OS is deceived as much as
anyone else.

But Cameron is completely right in that fsync's primary job is "block
until" rather than "do this sooner". Adding fsync calls might possibly
cause a flush when one otherwise wouldn't have happened, but generally
they'll slow things down in the interests of reliability.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Bug 3.11.x behavioral, open file buffers not flushed til file closed.

2023-03-05 Thread Greg Ewing via Python-list

On 6/03/23 1:02 pm, Cameron Simpson wrote:
Also, fsync() need not expedite the data getting to disc. It is equally 
valid that it just blocks your programme _until_ the data have gone to 
disc.


Or until it *thinks* the data has gone to the disk. Some drives
do buffering of their own, which may impose additional delays
before the data actually gets written.

--
Greg
--
https://mail.python.org/mailman/listinfo/python-list


Re: Bug 3.11.x behavioral, open file buffers not flushed til file closed.

2023-03-05 Thread Cameron Simpson

On 05Mar2023 10:38, aapost  wrote:

Additionally (not sure if this still applies):
flush() does not necessarily write the file’s data to disk. Use flush() 
followed by os.fsync() to ensure this behavior.


Yes. You almost _never_ need or want this behaviour. A database tends to 
fsync at the end of a transaction and at other critical points.


However, once you've `flush()`ed the file the data are then in the hands 
of the OS, to get to disc in a timely but efficient fashion. Calling 
fsync(), like calling flush(), affects writing _efficiency_ by depriving 
the OS (or, for flush(), the Python I/O buffering system) of the 
opportunity to bundle further data efficiently. It will degrade the 
overall performance.


Also, fsync() need not expedite the data getting to disc. It is equally 
valid that it just blocks your programme _until_ the data have gone to 
disc. In practice it probably does expedite things slightly, but the real 
world effect is that your programme will gratuitously block anyway, when 
it could just get on with its work, secure in the knowledge that the OS 
has its back.


flush() is for causality - ensuring the data are on their way so that 
some external party _will_ see them rather than waiting forever for data 
which are lurking in the buffer.  If that external party, for you, is an 
end user tailing a log file, then you might want to flush() at the end 
of every line.  Note that there is a presupplied line-buffering mode you 
can choose which will cause a file to flush like that for you 
automatically.
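That line-buffering mode is just the `buffering=1` argument to open() (a small demo; the temp-file path is invented here):

```python
# buffering=1 on a text-mode open() flushes after every newline,
# so a reader tailing the file sees each line as it is written.
import os, tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.log')
f = open(path, 'w', buffering=1)            # line-buffered text file
f.write('first line\n')                     # newline triggers a flush
assert 'first line\n' in open(path).read()  # visible before close()
f.close()
```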


So when you flush is a policy decision which you can make either during 
the programme flow or to a less flexible degree when you open the file.


As an example of choosing-to-flush, here's a little bit of code in a 
module I use for writing packet data to a stream (eg a TCP connection):

https://github.com/cameron-simpson/css/blob/00ab1a8a64453dc8a39578b901cfa8d1c75c3de2/lib/python/cs/packetstream.py#L624

Starting at line 640: `if Q.empty():` it optionally pauses briefly to 
see if more packets are coming on the source queue. If another arrives, 
the flush() is _skipped_, and the decision to flush made again after the 
next packet is transcribed. In this way a busy source of packets can 
write maximally efficient data (full buffers) as long as there's new 
data coming from the queue, but if the queue is empty and stays empty 
for more than `grace` seconds we flush anyway so that the receiver 
_will_ still see the latest packet.
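A hedged sketch of that pattern (names invented here, not the real cs.packetstream API): after each item, if the queue is empty, wait up to `grace` seconds for a successor and flush only if the queue stays quiet.

```python
import io
import queue

def drain(q, out, grace=0.1):
    """Write items from q to out, flushing only when the queue goes quiet."""
    while True:
        item = q.get()
        if item is None:                     # sentinel: end of stream
            out.flush()
            return
        out.write(item)
        if q.empty():
            try:
                nxt = q.get(timeout=grace)   # brief pause for a successor
            except queue.Empty:
                out.flush()                  # queue stayed quiet: flush now
            else:
                q.queue.appendleft(nxt)      # successor arrived: requeue, skip flush
                # (.queue is an undocumented attribute; fine for a sketch)

q = queue.Queue()
for x in ('a', 'b', None):
    q.put(x)
out = io.StringIO()
drain(q, out)
print(out.getvalue())  # ab
```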


Cheers,
Cameron Simpson 
--
https://mail.python.org/mailman/listinfo/python-list


Re: Cutting slices

2023-03-05 Thread Rob Cliffe via Python-list



On 05/03/2023 22:59, aapost wrote:

On 3/5/23 17:43, Stefan Ram wrote:

   The following behaviour of Python strikes me as being a bit
   "irregular". A user tries to chop of sections from a string,
   but does not use "split" because the separator might become
   more complicated so that a regular expression will be required
   to find it. But for now, let's use a simple "find":
   |>>> s = 'alpha.beta.gamma'
|>>> s[ 0: s.find( '.', 0 )]
|'alpha'
|>>> s[ 6: s.find( '.', 6 )]
|'beta'
|>>> s[ 11: s.find( '.', 11 )]
|'gamm'
|>>>

   . The user always inserted the position of the previous find plus
   one to start the next "find", so he uses "0", "6", and "11".
   But the "a" is missing from the final "gamma"!
      And it seems that there is no numerical value at all that
   one can use for "n" in "string[ 0: n ]" to get the whole
   string, isn't it?





The final `find` returns -1 because there is no separator after 'gamma'.
So you are asking for
    s[ 11 : -1]
which correctly returns 'gamm'.
You need to test for this condition.
Alternatively you could ensure that there is a final separator:
    s = 'alpha.beta.gamma.'
but you would still need to test when the string was exhausted.
Best wishes
Rob Cliffe
--
https://mail.python.org/mailman/listinfo/python-list


Re: Bug 3.11.x behavioral, open file buffers not flushed til file closed.

2023-03-05 Thread Eryk Sun
On 3/5/23, aapost  wrote:
>
> If a file is still open, even if all the operations on the file have
> ceased for a time, the tail of the written operation data does not get
> flushed to the file until close is issued and the file closes cleanly.

This is normal behavior for buffered file I/O. There's no timer set to
flush the buffer after operations have "ceased for a time". It
automatically flushes only when the buffer is full or, for line
buffering, when a newline is written.

The default buffer size is based on the raw file object's _blksize
attribute. If st_blksize can't be determined via fstat(), the default
_blksize is 8 KiB.

Here's an example on Linux. In this example, the buffer size is 4 KiB.

>>> f = open('abc', 'w')
>>> os.fstat(f.fileno()).st_blksize
4096
>>> f.buffer.raw._blksize
4096
>>> f.writelines(f'{i}\n' for i in range(50000))
>>> with open('abc') as g: g.readlines()[-1]
...
'49626\n'

>>> pre_flush_size = os.path.getsize('abc')
>>> f.flush()
>>> post_flush_size = os.path.getsize('abc')
>>> post_flush_size - pre_flush_size
2238

Verify that this makes sense, based on what was left in the buffer
prior to flushing:

>>> remaining_lines = 50000 - 49626 - 1
>>> bytes_per_line = 6
>>> remaining_lines * bytes_per_line
2238
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Cutting slices

2023-03-05 Thread dn via Python-list

On 06/03/2023 11.59, aapost wrote:

On 3/5/23 17:43, Stefan Ram wrote:

   The following behaviour of Python strikes me as being a bit
   "irregular". A user tries to chop of sections from a string,
   but does not use "split" because the separator might become
   more complicated so that a regular expression will be required
   to find it. But for now, let's use a simple "find":
|>>> s = 'alpha.beta.gamma'
|>>> s[ 0: s.find( '.', 0 )]
|'alpha'
|>>> s[ 6: s.find( '.', 6 )]
|'beta'
|>>> s[ 11: s.find( '.', 11 )]
|'gamm'
|>>>

   . The user always inserted the position of the previous find plus
   one to start the next "find", so he uses "0", "6", and "11".
   But the "a" is missing from the final "gamma"!
   And it seems that there is no numerical value at all that
   one can use for "n" in "string[ 0: n ]" to get the whole
   string, isn't it?




I would agree with the 1st part of the comment.

Just noting that string[11:], string[11:None], as well as string[11:16] 
work ... as well as string[11:324242]... lol..


To expand on the above, answering the OP's second question: the numeric 
value is len( s ).


If the repetitive process is required, try a loop like:

>>> start_index = 11   #to cure the issue-raised

>>> try:
... s[ start_index:s.index( '.', start_index ) ]
... except ValueError:
... s[ start_index:len( s ) ]
...
'gamma'


However, if the objective is to split, then use the function built for 
the purpose:


>>> s.split( "." )
['alpha', 'beta', 'gamma']

(yes, the OP says this won't work - but doesn't show why)


If life must be more complicated, but the next separator can be 
predicted, then its close-relative is partition().
NB can use both split() and partition() on the sub-strings produced by 
an earlier split() or ... ie there may be no reason to work strictly 
from left to right
- can't really help with this because the information above only shows 
multiple "." characters, and not how multiple separators might be 
interpreted.



A straight-line approach might be to use maketrans() and translate() to 
convert all the separators to a single character, eg white-space, which 
can then be split using any of the previously-mentioned methods.
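The maketrans()/translate() idea in miniature (illustrative separators invented here):

```python
# Map every separator to one character, then split once.
s = 'alpha.beta;gamma,delta'
table = str.maketrans('.;,', '   ')   # each separator becomes a space
print(s.translate(table).split())     # ['alpha', 'beta', 'gamma', 'delta']
```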



If the problem is sufficiently complicated and the OP is prepared to go 
whole-hog, then PSL's tokenize library or various parser libraries may 
be worth consideration...


--
Regards,
=dn
--
https://mail.python.org/mailman/listinfo/python-list


Re: Bug 3.11.x behavioral, open file buffers not flushed til file closed.

2023-03-05 Thread Cameron Simpson

On 05Mar2023 09:35, aapost  wrote:
I have run in to this a few times and finally reproduced it. Whether 
it is as expected I am not sure since it is slightly on the user, but 
I can think of scenarios where this would be undesirable behavior.. 
This occurs on 3.11.1 and 3.11.2 using debian 12 testing, in case the 
reasoning lingers somewhere else.


If a file is still open, even if all the operations on the file have 
ceased for a time, the tail of the written operation data does not get 
flushed to the file until close is issued and the file closes cleanly.


Yes, because files are _buffered_ by default. See the `buffering` 
parameter to the open() function in the docs.



2 methods to recreate - 1st run from interpreter directly:

f = open("abc", "w")
for i in range(50000):
 f.write(str(i) + "\n")

you can cat the file and see it stops at 49626 until you issue an f.close()


Or until you issue an `f.flush()`, which is what flush() is for.
cat out the file and same thing, stops at 49626. a ctrl-c exit closes 
the files cleanly, but if the process exits uncleanly, i.e. a kill command 
or something else catastrophic, the remaining buffer is lost.


Yes, because of buffering. This is normal and IMO correct. You can turn 
it off, or catch-and-flush these circumstances (SIGKILL excepted, 
because SIGKILL's entire purpose is to be uncatchable).
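One way to catch-and-flush (a sketch; the file path and handler name are invented here, and SIGKILL remains uncatchable by design):

```python
import atexit, os, signal, sys, tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.out')
f = open(path, 'w')
atexit.register(f.flush)                 # flush on normal interpreter exit

def on_term(signum, frame):
    f.flush()                            # flush, then exit cleanly
    sys.exit(1)

signal.signal(signal.SIGTERM, on_term)   # covers a plain `kill`
f.write('partial data')                  # sits in the buffer until a flush
```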


Of course one SHOULD manage the closing of their files and this is 
partially on the user, but if by design something is hanging on to a 
file while it is waiting for something, then a crash occurs, they lose 
a portion of what was assumed already complete...


f.flush()

Cheers,
Cameron Simpson 
--
https://mail.python.org/mailman/listinfo/python-list


Re: Cutting slices

2023-03-05 Thread aapost

On 3/5/23 17:43, Stefan Ram wrote:

   The following behaviour of Python strikes me as being a bit
   "irregular". A user tries to chop of sections from a string,
   but does not use "split" because the separator might become
   more complicated so that a regular expression will be required
   to find it. But for now, let's use a simple "find":
   
|>>> s = 'alpha.beta.gamma'

|>>> s[ 0: s.find( '.', 0 )]
|'alpha'
|>>> s[ 6: s.find( '.', 6 )]
|'beta'
|>>> s[ 11: s.find( '.', 11 )]
|'gamm'
|>>>

   . The user always inserted the position of the previous find plus
   one to start the next "find", so he uses "0", "6", and "11".
   But the "a" is missing from the final "gamma"!
   
   And it seems that there is no numerical value at all that

   one can use for "n" in "string[ 0: n ]" to get the whole
   string, isn't it?




I would agree with the 1st part of the comment.

Just noting that string[11:], string[11:None], as well as string[11:16] 
work ... as well as string[11:324242]... lol..

--
https://mail.python.org/mailman/listinfo/python-list


Re: Bug 3.11.x behavioral, open file buffers not flushed til file closed.

2023-03-05 Thread Frank B

Am 05.03.23 um 15:35 schrieb aapost:
I have run in to this a few times and finally reproduced it. Whether it 
is as expected I am not sure since it is slightly on the user, but I can 
think of scenarios where this would be undesirable behavior.. This 
occurs on 3.11.1 and 3.11.2 using debian 12 testing, in case the 
reasoning lingers somewhere else.


If a file is still open, even if all the operations on the file have 
ceased for a time, the tail of the written operation data does not get 
flushed to the file until close is issued and the file closes cleanly.


2 methods to recreate - 1st run from interpreter directly:

f = open("abc", "w")
for i in range(50000):
   f.write(str(i) + "\n")


use

with open("abc", "w") as f:
for i in range(5):
f.write(str(i) + "\n")

and all is well

Frank
--
https://mail.python.org/mailman/listinfo/python-list


Re: Bug 3.11.x behavioral, open file buffers not flushed til file closed.

2023-03-05 Thread aapost

On 3/5/23 09:35, aapost wrote:




Guess it could just be an annoying gotcha thing on me.

calling at least

f.flush()

in any cases where an explicit close is delayed would be the solution.

Additionally (not sure if this still applies):
flush() does not necessarily write the file’s data to disk. Use flush() 
followed by os.fsync() to ensure this behavior.

--
https://mail.python.org/mailman/listinfo/python-list


Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-05 Thread Dino



Here's the complete data file should anyone care.

Acura,CL
Acura,ILX
Acura,Integra
Acura,Legend
Acura,MDX
Acura,MDX Sport Hybrid
Acura,NSX
Acura,RDX
Acura,RL
Acura,RLX
Acura,RLX Sport Hybrid
Acura,RSX
Acura,SLX
Acura,TL
Acura,TLX
Acura,TSX
Acura,Vigor
Acura,ZDX
Alfa Romeo,164
Alfa Romeo,4C
Alfa Romeo,4C Spider
Alfa Romeo,Giulia
Alfa Romeo,Spider
Alfa Romeo,Stelvio
Alfa Romeo,Tonale
Aston Martin,DB11
Aston Martin,DB9
Aston Martin,DB9 GT
Aston Martin,DBS
Aston Martin,DBS Superleggera
Aston Martin,DBX
Aston Martin,Rapide
Aston Martin,Rapide S
Aston Martin,Vanquish
Aston Martin,Vanquish S
Aston Martin,Vantage
Aston Martin,Virage
Audi,100
Audi,80
Audi,90
Audi,A3
Audi,A3 Sportback e-tron
Audi,A4
Audi,A4 (2005.5)
Audi,A4 allroad
Audi,A5
Audi,A5 Sport
Audi,A6
Audi,A6 allroad
Audi,A7
Audi,A8
Audi,Cabriolet
Audi,Q3
Audi,Q4 Sportback e-tron
Audi,Q4 e-tron
Audi,Q5
Audi,Q5 Sportback
Audi,Q7
Audi,Q8
Audi,Quattro
Audi,R8
Audi,RS 3
Audi,RS 4
Audi,RS 5
Audi,RS 6
Audi,RS 7
Audi,RS Q8
Audi,RS e-tron GT
Audi,S3
Audi,S4
Audi,S4 (2005.5)
Audi,S5
Audi,S6
Audi,S7
Audi,S8
Audi,SQ5
Audi,SQ5 Sportback
Audi,SQ7
Audi,SQ8
Audi,TT
Audi,allroad
Audi,e-tron
Audi,e-tron GT
Audi,e-tron S
Audi,e-tron S Sportback
Audi,e-tron Sportback
BMW,1 Series
BMW,2 Series
BMW,3 Series
BMW,4 Series
BMW,5 Series
BMW,6 Series
BMW,7 Series
BMW,8 Series
BMW,Alpina B7
BMW,M
BMW,M2
BMW,M3
BMW,M4
BMW,M5
BMW,M6
BMW,M8
BMW,X1
BMW,X2
BMW,X3
BMW,X3 M
BMW,X4
BMW,X4 M
BMW,X5
BMW,X5 M
BMW,X6
BMW,X6 M
BMW,X7
BMW,Z3
BMW,Z4
BMW,Z4 M
BMW,Z8
BMW,i3
BMW,i4
BMW,i7
BMW,i8
BMW,iX
Bentley,Arnage
Bentley,Azure
Bentley,Azure T
Bentley,Bentayga
Bentley,Brooklands
Bentley,Continental
Bentley,Continental GT
Bentley,Flying Spur
Bentley,Mulsanne
Buick,Cascada
Buick,Century
Buick,Enclave
Buick,Encore
Buick,Encore GX
Buick,Envision
Buick,LaCrosse
Buick,LeSabre
Buick,Lucerne
Buick,Park Avenue
Buick,Rainier
Buick,Regal
Buick,Regal Sportback
Buick,Regal TourX
Buick,Rendezvous
Buick,Riviera
Buick,Roadmaster
Buick,Skylark
Buick,Terraza
Buick,Verano
Cadillac,ATS
Cadillac,ATS-V
Cadillac,Allante
Cadillac,Brougham
Cadillac,CT4
Cadillac,CT5
Cadillac,CT6
Cadillac,CT6-V
Cadillac,CTS
Cadillac,CTS-V
Cadillac,Catera
Cadillac,DTS
Cadillac,DeVille
Cadillac,ELR
Cadillac,Eldorado
Cadillac,Escalade
Cadillac,Escalade ESV
Cadillac,Escalade EXT
Cadillac,Fleetwood
Cadillac,LYRIQ
Cadillac,SRX
Cadillac,STS
Cadillac,Seville
Cadillac,Sixty Special
Cadillac,XLR
Cadillac,XT4
Cadillac,XT5
Cadillac,XT6
Cadillac,XTS
Chevrolet,1500 Extended Cab
Chevrolet,1500 Regular Cab
Chevrolet,2500 Crew Cab
Chevrolet,2500 Extended Cab
Chevrolet,2500 HD Extended Cab
Chevrolet,2500 HD Regular Cab
Chevrolet,2500 Regular Cab
Chevrolet,3500 Crew Cab
Chevrolet,3500 Extended Cab
Chevrolet,3500 HD Extended Cab
Chevrolet,3500 HD Regular Cab
Chevrolet,3500 Regular Cab
Chevrolet,APV Cargo
Chevrolet,Astro Cargo
Chevrolet,Astro Passenger
Chevrolet,Avalanche
Chevrolet,Avalanche 1500
Chevrolet,Avalanche 2500
Chevrolet,Aveo
Chevrolet,Beretta
Chevrolet,Blazer
Chevrolet,Blazer EV
Chevrolet,Bolt EUV
Chevrolet,Bolt EV
Chevrolet,Camaro
Chevrolet,Caprice
Chevrolet,Caprice Classic
Chevrolet,Captiva Sport
Chevrolet,Cavalier
Chevrolet,City Express
Chevrolet,Classic
Chevrolet,Cobalt
Chevrolet,Colorado Crew Cab
Chevrolet,Colorado Extended Cab
Chevrolet,Colorado Regular Cab
Chevrolet,Corsica
Chevrolet,Corvette
Chevrolet,Cruze
Chevrolet,Cruze Limited
Chevrolet,Equinox
Chevrolet,Equinox EV
Chevrolet,Express 1500 Cargo
Chevrolet,Express 1500 Passenger
Chevrolet,Express 2500 Cargo
Chevrolet,Express 2500 Passenger
Chevrolet,Express 3500 Cargo
Chevrolet,Express 3500 Passenger
Chevrolet,G-Series 1500
Chevrolet,G-Series 2500
Chevrolet,G-Series 3500
Chevrolet,G-Series G10
Chevrolet,G-Series G20
Chevrolet,G-Series G30
Chevrolet,HHR
Chevrolet,Impala
Chevrolet,Impala Limited
Chevrolet,Lumina
Chevrolet,Lumina APV
Chevrolet,Lumina Cargo
Chevrolet,Lumina Passenger
Chevrolet,Malibu
Chevrolet,Malibu (Classic)
Chevrolet,Malibu Limited
Chevrolet,Metro
Chevrolet,Monte Carlo
Chevrolet,Prizm
Chevrolet,S10 Blazer
Chevrolet,S10 Crew Cab
Chevrolet,S10 Extended Cab
Chevrolet,S10 Regular Cab
Chevrolet,SS
Chevrolet,SSR
Chevrolet,Silverado (Classic) 1500 Crew Cab
Chevrolet,Silverado (Classic) 1500 Extended Cab
Chevrolet,Silverado (Classic) 1500 HD Crew Cab
Chevrolet,Silverado (Classic) 1500 Regular Cab
Chevrolet,Silverado (Classic) 2500 HD Crew Cab
Chevrolet,Silverado (Classic) 2500 HD Extended Cab
Chevrolet,Silverado (Classic) 2500 HD Regular Cab
Chevrolet,Silverado (Classic) 3500 Crew Cab
Chevrolet,Silverado (Classic) 3500 Extended Cab
Chevrolet,Silverado (Classic) 3500 Regular Cab
Chevrolet,Silverado 1500 Crew Cab
Chevrolet,Silverado 1500 Double Cab
Chevrolet,Silverado 1500 Extended Cab
Chevrolet,Silverado 1500 HD Crew Cab
Chevrolet,Silverado 1500 LD Double Cab
Chevrolet,Silverado 1500 Limited Crew Cab
Chevrolet,Silverado 1500 Limited Double Cab
Chevrolet,Silverado 1500 Limited Regular Cab
Chevrolet,Silverado 1500 Regular Cab
Chevrolet,Silverado 2500 Crew Cab
Chevrolet,Silverado 

Bug 3.11.x behavioral, open file buffers not flushed til file closed.

2023-03-05 Thread aapost
I have run in to this a few times and finally reproduced it. Whether it 
is as expected I am not sure since it is slightly on the user, but I can 
think of scenarios where this would be undesirable behavior. This 
occurs on 3.11.1 and 3.11.2 using debian 12 testing, in case the 
reasoning lingers somewhere else.


If a file is still open, even if all the operations on the file have 
ceased for a time, the tail of the written operation data does not get 
flushed to the file until close is issued and the file closes cleanly.


2 methods to recreate - 1st run from interpreter directly:

f = open("abc", "w")
for i in range(50000):
  f.write(str(i) + "\n")

you can cat the file and see it stops at 49626 until you issue an f.close()

a script to recreate:

f = open("abc", "w")
for i in range(50000):
  f.write(str(i) + "\n")
while(1):
  pass

cat out the file and same thing, stops at 49626. a ctrl-c exit closes 
the files cleanly, but if the process exits uncleanly, i.e. a kill command 
or something else catastrophic, the remaining buffer is lost.


Of course one SHOULD manage the closing of their files and this is 
partially on the user, but if by design something is hanging on to a 
file while it is waiting for something, then a crash occurs, they lose a 
portion of what was assumed already complete...

--
https://mail.python.org/mailman/listinfo/python-list


Fast full-text searching in Python (job for Whoosh?)

2023-03-05 Thread Dino



I need fast text-search on a large (not huge, let's say 30k records 
totally) list of items. Here's a sample of my raw data (a list of US 
cars: model and make)


$ head all_cars_unique.csv
Acura,CL
Acura,ILX
Acura,Integra
Acura,Legend
Acura,MDX
Acura,MDX Sport Hybrid
Acura,NSX
Acura,RDX
Acura,RL
Acura,RLX
$ wc -l all_cars_unique.csv
1415 all_cars_unique.csv
$ grep -i v60 all_cars_unique.csv
Genesis,GV60
Volvo,V60
$

Essentially, I want my input field to suggest autofill options with data 
from this file/list. The user types "v60" and a REST endpoint will offer:


[
 {"model":"GV60", "manufacturer":"Genesis"},
 {"model":"V60", "manufacturer":"Volvo"}
]

i.e. a JSON response that I can use to generate the autofill with 
JavaScript. My Back-End is Python (Flask).


How can I implement this? A library called Whoosh seems very promising 
(albeit it's so feature-rich that it's almost like shooting a fly with a 
bazooka in my case), but I see two problems:


 1) Whoosh is either abandoned or the project is a mess in terms of 
community and support (https://groups.google.com/g/whoosh/c/QM_P8cGi4v4 
) and


 2) Whoosh seems to be a Python only thing, which is great for now, but 
I wouldn't want this to become an obstacle should I need to port it to a 
different language at some point.


are there other options that are fast out there? Can I "grep" through a 
data structure in python... but faster?
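A sketch of the lookup such an endpoint would do (plain functions, no Flask wiring; the three rows stand in for the CSV and the function name is invented here):

```python
import json

ROWS = [('Genesis', 'GV60'), ('Volvo', 'V60'), ('Acura', 'ILX')]  # stand-in for the CSV

def suggest(query):
    # Case-insensitive substring match over the model column,
    # returning the JSON shape shown above.
    q = query.lower()
    return [{'model': model, 'manufacturer': make}
            for make, model in ROWS if q in model.lower()]

print(json.dumps(suggest('v60')))
```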


Thanks

Dino

--
https://mail.python.org/mailman/listinfo/python-list


Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-05 Thread Dino

On 3/4/2023 10:43 PM, Dino wrote:


I need fast text-search on a large (not huge, let's say 30k records 
totally) list of items. Here's a sample of my raw data (a list of US 
cars: model and make)


I suspect I am really close to answering my own question...

>>> import time
>>> lis = [str(a**2+a*3+a) for a in range(0,30000)]
>>> s = time.process_time_ns(); res = [el for el in lis if "13467" in 
el]; print(time.process_time_ns() -s);

753800
>>> s = time.process_time_ns(); res = [el for el in lis if "52356" in 
el]; print(time.process_time_ns() -s);

1068300
>>> s = time.process_time_ns(); res = [el for el in lis if "5256" in 
el]; print(time.process_time_ns() -s);

862000
>>> s = time.process_time_ns(); res = [el for el in lis if "6" in el]; 
print(time.process_time_ns() -s);

1447300
>>> s = time.process_time_ns(); res = [el for el in lis if "1" in el]; 
print(time.process_time_ns() -s);

1511100
>>> s = time.process_time_ns(); res = [el for el in lis if "13467" in 
el]; print(time.process_time_ns() -s); print(len(res), res[:10])

926900
2 ['134676021', '313467021']
>>>

I can do a substring search in a list of 30k elements in less than 2ms 
with Python. Is my reasoning sound?
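A self-contained version of that measurement, assuming the same 30k-string list as above (timings are machine-dependent, so no specific number is claimed here):

```python
import time

# ~30k strings, generated the same way as in the interactive session above.
lis = [str(a**2 + a*3 + a) for a in range(30_000)]

def scan(needle):
    """Linear substring scan over the list; returns matches and elapsed ms."""
    start = time.perf_counter()
    res = [el for el in lis if needle in el]
    elapsed_ms = (time.perf_counter() - start) * 1000
    return res, elapsed_ms

res, ms = scan("13467")
print(f"{len(res)} matches in {ms:.2f} ms")
```

On typical hardware a plain linear scan over 30k short strings does finish in the low single-digit milliseconds, which supports the reasoning.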


Dino


--
https://mail.python.org/mailman/listinfo/python-list


[Python-announce] SCons 4.5.0 Released

2023-03-05 Thread Bill Deegan
A new SCons release, 4.5.0, is now available on the SCons download page:

https://scons.org/pages/download.html


Here is a summary of the changes since 4.4.0:

NOTE: If you build with Python 3.10.0 and then rebuild with 3.10.1 (or
higher), you may see unexpected rebuilds. This is due to Python internals
changing, which changed the signature of a Python Action Function.

NOTE: If you use a dictionary to specify your CPPDEFINES, you may see an
unexpected rebuild. The insertion order of dictionary keys is now preserved
when generating the command line; previously these were sorted
alphabetically. Even with an identical set of CPPDEFINES, this can change
the order of command line arguments and cause a rebuild.
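The ordering change behind that note can be seen in plain Python (an illustration, not SCons code): Python 3.7+ dicts preserve insertion order, which SCons 4.5.0 now relies on when turning a CPPDEFINES dictionary into command-line flags.

```python
# Hypothetical CPPDEFINES-style dictionary, for illustration only.
cppdefines = {"ZLIB": None, "APP_VERSION": 2, "DEBUG": 1}

insertion_order = list(cppdefines)   # new behavior: keys in insertion order
sorted_order = sorted(cppdefines)    # pre-4.5.0 behavior: keys sorted

print(insertion_order)  # ['ZLIB', 'APP_VERSION', 'DEBUG']
print(sorted_order)     # ['APP_VERSION', 'DEBUG', 'ZLIB']
```

The set of macros is identical either way; only their order on the command line differs, which is what can trigger the one-time rebuild.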


NEW FUNCTIONALITY
-----------------

- Added ValidateOptions(), which checks that all command line options are
  either those specified by SCons itself or those added via AddOption() in
  SConstruct/SConscript. It should not be called until all AddOption()
  calls are completed. Resolves Issue #4187
- Added --experimental=tm_v2, which enables Andrew Morrow's NewParallel Job
implementation.
  This should scale much better for highly parallel builds.  You can also
enable this via SetOption().
- Added FILE_ENCODING, to allow explicitly setting the text encoding for
files
  written by the Textfile() and Substfile() builders. If not specified,
Textfile() and Substfile() builders
  will write files as UTF-8.


DEPRECATED FUNCTIONALITY
------------------------

- The qt tool has been renamed qt3.

CHANGED/ENHANCED EXISTING FUNCTIONALITY
---------------------------------------

- Added -fsanitize support to ParseFlags().  This will propagate to CCFLAGS
and LINKFLAGS.
- Calling EnsureSConsVersion() and EnsurePythonVersion() won't initialize
  DefaultEnvironment anymore.
- The console message from the Chmod() action function now displays
  octal modes using the modern Python syntax (0o755 rather than 0755).
- Migrated logging logic for --taskmastertrace to use Python's logging
module. Added logging
  to NewParallel Job class (Andrew Morrow's new parallel job implementation)
- Preliminary support for Python 3.12.
- Run LaTeX after biber/bibtex only if necessary
- Configure context methods CheckLib and CheckLibWithHeader now expose
  two additional keyword arguments: 'append', which controls whether to
append
  (the default) or prepend discovered libraries to $LIBS, and 'unique',
  which controls whether to add the library if it is already in the $LIBS
  list. This brings the library-adding functionality in Configure in line
  with the regular Append, AppendUnique, Prepend and PrependUnique methods.
- CPPDEFINES values added via a dictionary type are no longer sorted by
  key. This used to be required to maintain a consistent order of
  commandline arguments between SCons runs, but meant macros were not
  always emitted in the order entered. Sorting is no longer required
  after Python interpreter improvements.  There might be a one-time
  rebuild of targets that involved such sorted keys in their actions.
- Renamed the 'qt' tools to 'qt3' since the logic in that tool is only for
QT version 3.
  Renamed all env vars which affect qt3 from QT_ to QT3_.  If you are still
using SCons
  to build QT 3 code, you'll need to update your SConscripts.  Note that
using 'qt' tool
  has been deprecated for some time.
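The new CheckLib keyword arguments mentioned above might be used as follows in an SConstruct. This is an untested configuration fragment, assuming SCons 4.5.0+, shown only to illustrate the keywords:

```python
# SConstruct fragment (illustrative only; requires SCons 4.5.0+).
env = Environment()
conf = Configure(env)

# Prepend the discovered library to $LIBS instead of appending (the
# default), and skip adding it if it is already present in $LIBS.
if not conf.CheckLib('m', append=False, unique=True):
    print('math library not found')

env = conf.Finish()
```

This mirrors the behavior of the regular Prepend/PrependUnique environment methods, as the release note describes.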


FIXES
-----

- Added missing newline to generated compilation database
(compile_commands.json)
- A list argument as the source to the Copy() action function is now
handled.
  Both the implementation and the strfunction which prints the progress
  message were adjusted.
- The Java Scanner processing of JAVACLASSPATH for dependencies (behavior
  that was introduced in SCons 4.4.0) is adjusted to split on the system's
  search path separator instead of on a space. The previous behavior meant
  that a path containing spaces (e.g. r"C:\somepath\My Classes") would
  lead to unexpected errors. If the split-on-space behavior is desired,
  pre-split the value: instead of: env["JAVACLASSPATH"] = "foo bar baz"
  use: env["JAVACLASSPATH"] = env.Split("foo bar baz")
  There is no change in how JAVACLASSPATH gets turned into the -classpath
  argument passed to the JDK tools.
- Ninja: Fix execution environment sanitization for launching ninja.
  Previously, if an execution environment variable was set to a Python
  list, it would crash. Now it will create a string joining the list with
  os.pathsep
- Fixed command line argument --diskcheck: previously a value of 'none' was
ignored.
  SetOption('diskcheck','none') is unaffected, as it did not have the
problem.
- Fixed Issue #4275 - when outputting a compilation db while TEMPFILE was
  in use, the compilation db contained command lines that referenced the
  generated tempfile for long command lines, instead of the full command
  line for the compilation step of the source/target pair.
- A refactor in the caching logic for version 4.4.0 left Java inner classes
  failing 

Re: Testing list sequence question -- thanks for the info

2023-03-05 Thread Grant Edwards
On 2023-03-05, Gabor Urban  wrote:

> Upgrading our Python to 3.7 seems to be out of question at the moment.

Using an OrderedDict doesn't work for you?
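For reference, a minimal sketch of that suggestion (collections.OrderedDict works the same on Python 2.7 and every 3.x):

```python
from collections import OrderedDict

# OrderedDict guarantees insertion order even on Python < 3.7, where
# plain-dict ordering was an implementation detail (CPython 3.6) or
# absent entirely (3.5 and earlier).
d = OrderedDict()
d["gamma"] = 3
d["alpha"] = 1
d["beta"] = 2

print(list(d.keys()))  # ['gamma', 'alpha', 'beta'] -- insertion order
```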

--
Grant
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Cryptic software announcements (was: ANN: DIPY 1.6.0)

2023-03-05 Thread Mats Wichmann

On 3/1/23 04:57, Rob Cliffe via Python-list wrote:


I think it would be a good idea if software announcements would include
a single paragraph (or maybe just a single sentence) summarizing what
the software is and does.

 hp



+1
Rob Cliffe


Excellent advice - and many of the announcements on the separate 
python-announce list do actually follow this model (and that is probably 
the right place to send announcements).


I'd even extend the suggestion further - it's surprising how many 
newcomers ask questions about a particular package that many people have 
not heard of. It all seems so obvious when you have an assignment, but 
with over 400,000 packages on PyPI it should not be surprising that a 
few of us have not heard of all of them :-) Maybe give a bit more 
context...


--
https://mail.python.org/mailman/listinfo/python-list


Re: Cryptic software announcements (was: ANN: DIPY 1.6.0)

2023-03-05 Thread Rob Cliffe via Python-list




On 01/03/2023 00:13, Peter J. Holzer wrote:

[This isn't specifically about DIPY, I've noticed the same thing in
other announcements]

On 2023-02-28 13:48:56 -0500, Eleftherios Garyfallidis wrote:

Hello all,


We are excited to announce a new release of DIPY: DIPY 1.6.0 is out from
the oven!

That's nice, but what is DIPY?



In addition, registration for the oceanic DIPY workshop 2023 (April 24-28)
is now open! Our comprehensive program is designed to equip you with the
skills and knowledge needed to master the latest techniques and tools in
structural and diffusion imaging.

Ok, so since the workshop is about ".., tools in structural and
diffusion imaging", DIPY is probably such a tool.

However, without this incidental announcement I wouldn't have any idea
what it is or if it would be worth my time clicking at any of the links.


I think it would be a good idea if software announcements would include
a single paragraph (or maybe just a single sentence) summarizing what
the software is and does.

 hp



+1
Rob Cliffe
--
https://mail.python.org/mailman/listinfo/python-list


[Python-announce] Leo 6.7.2 released

2023-03-05 Thread Edward K. Ream
Leo 6.7.2 (https://leo-editor.github.io/leo-editor/) is now available on
GitHub and PyPI.

Leo is an IDE, outliner and PIM.

*The highlights of Leo 6.7.2*

   - PR #3019: Leo's website has moved to GitHub Pages:
   https://leo-editor.github.io/leo-editor/

Commands:

   - PR #3031: Add check-nodes command. It helps keep @clean files in sync.
   - PR #3056: Leo's beautify command is now PEP8 compliant.
   - PR #3140: Run pylint on node.
   - PR #3166: Add the execute-external-file command.

Settings and features:

   - PR #2979: Add @bool run-flake8-on-write setting.
   - PR #2983: Add --black-sentinels command-line option.
   - PR #3038: Add @string rst3-action setting.
   - PR #3053: Add @string gnx-kind setting: Support gnxs formatted as
   UUIDs.
   - PR #3132: Add @bool rst3-remove-leo-directives setting.

Other changes:

   - 80+ issues and 100+ pull requests.

*Links*

   - Download Leo
   - Install Leo
   - 6.7.2 Issues
   - 6.7.2 Pull Requests
   - Documentation
   - Tutorials
   - Video tutorials
   - Forum
   - Download
   - Leo on GitHub
   - LeoVue
   - What people are saying about Leo
   - A web page that displays .leo files
   - More links

-
Edward K. Ream: edream...@gmail.com
Leo Editor: https://leo-editor.github.io/leo-editor/ 
-
___
Python-announce-list mailing list -- python-announce-list@python.org
To unsubscribe send an email to python-announce-list-le...@python.org
https://mail.python.org/mailman3/lists/python-announce-list.python.org/
Member address: arch...@mail-archive.com


Testing list sequence question -- thanks for the info

2023-03-05 Thread Gabor Urban
Hi guys,

Thank you very much for the accurate answer.

Upgrading our Python to 3.7 seems to be out of the question at the moment. I
will check the requirements specification to see whether the order of the
keys is important at all. It could be just a wish.

-- 
Urbán Gábor

Linux is like a wigwam: no Gates, no Windows and an Apache inside.
-- 
https://mail.python.org/mailman/listinfo/python-list