[issue25898] Check for subsequence inside a sequence

2019-08-22 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

> If I were trying to channel Raymond I'd suggest posting the 
> python as a recipe and see if there is uptake.  However, I
>  could be wrong and he might be interested.  (I can't say 
> that I've ever needed this check myself.)

Yes, exactly :-)

Sebastian, thanks for the suggestion.

I am going to decline this for collections and itertools based on the relative 
infrequency of need (for us, it usually only comes up in the context of 
strings).

FWIW, there may be algorithmically more advanced techniques that might be worth 
looking into: knuth-morris-pratt, rabin-karp, etc.

Please do post this to the Python Package Index or someother place where people 
can get to it if the need arises.

--
resolution:  -> rejected
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25898] Check for subsequence inside a sequence

2015-12-20 Thread Sebastian Linke

Sebastian Linke added the comment:

Slightly modified the implementation to return the index rather than a boolean.

--
Added file: http://bugs.python.org/file41374/subseq_index.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25898] Check for subsequence inside a sequence

2015-12-18 Thread Ethan Furman

Ethan Furman added the comment:

+1 for sequences
+1 for subsequence_index instead of has_subsequence
+1 for returning None when not found  ;)

--
nosy: +ethan.furman

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25898] Check for subsequence inside a sequence

2015-12-18 Thread R. David Murray

R. David Murray added the comment:

If I were trying to channel Raymond I'd suggest posting the python as a recipe 
and see if there is uptake.  However, I could be wrong and he might be 
interested.  (I can't say that I've ever needed this check myself.)

I'd be inclined to agree that it should only work on sequences and be in 
collections.  If you can come up with a real use case where you wouldn't need 
the realized haystack afterwards I'd be interested to hear it :)  Regardless, I 
don't see how it could be composed with other itertools, so it doesn't seem to 
belong there.

As long as one is finding the needle, one might as well return the index, 
though.  So subsequence_index instead of has_subsequence.  And then you'd want 
an optional start index, and maybe an rsubsequence_index :)  So, it could be a 
method of Sequence, but is it useful enough to be worth adding?  Do any other 
languages have such a function as part of their built in toolkit?

(Aside: you can avoid creating a subsequence by using a loop.  You'd have to 
test with a bunch of variant sizes to see which is more efficient, but I'm 
guessing it would be the loop.)

--
nosy: +r.david.murray

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25898] Check for subsequence inside a sequence

2015-12-18 Thread Sebastian Linke

Sebastian Linke added the comment:

@Josh
What is the point of using iterators here? If the user wants an iterator to be 
checked then he can throw that iterator right away because it was exhausted by 
the function. I think that in most cases the user wants to do some 
postprocessing on a hairstack after being checked. This is why I restricted the 
arguments to be sequences.

Keeping that restriction in mind a pure Python implementation could look like 
this:

def has_subsequence(seq, subseq):
if not subseq:
return True
if len(subseq) > len(seq):
return False
start = 0
while True:
try:
start = seq.index(subseq[0], start)
except ValueError:
# seq.index() did not match
return False
stop = start + len(subseq)
if seq[stop - 1] == subseq[-1] and seq[start:stop] == subseq:
return True
start += 1

Unfortunately, this version potentially creates lots of subseqences for 
checking equality with the `subseq` argument. I tried to minimise the 
occurrence of those cases by pre-checking the first and the last element of a 
candidate. Anyway, this seems to be faster for small needles compared to your 
`deque`-based suggestion.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25898] Check for subsequence inside a sequence

2015-12-17 Thread Josh Rosenberg

Josh Rosenberg added the comment:

Aho Corasick doesn't seem likely to be useful here; it's good if the haystack 
is huge (or you have many haystacks to search) and you have many needles to 
look for (and the needles never change), but it pays a fairly steep setup cost; 
for a utility that searches for a single subsequence once, with no history, 
Aho-Corasick wouldn't help much.

A variant on Boyer-Moore (which involves less preprocessing work on the needle) 
might help, but I'm not sure the feature is compelling enough to warrant 
acceleration in the first place.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25898] Check for subsequence inside a sequence

2015-12-17 Thread Josh Rosenberg

Josh Rosenberg added the comment:

A utility like this seems like it would belong in `itertools`, not 
`collections`. It should also ideally avoid fully realizing the sequence so it 
could work with iterators/generators as well; PySequence_Fast will force 
creation of a `list`/`tuple` of the whole sequence when in practice, a `deque` 
with a maxlen could be used to only maintain the necessary window into the 
"haystack".

It would also help to have a pure Python implementation (and until you have 
one, it's probably overkill to write the C accelerator) for other Python 
distributions, and to serve as a baseline for comparison to see if a C 
accelerator is justified.  Something like this might be a decent point of 
comparison:

def has_subsequence(it, searchseq, *, all=all, map=map, eq=operator.eq):
searchseq = tuple(searchseq)
if not searchseq:
return True  # Empty sequence in everything
window = collections.deque(itertools.islice(it, len(searchseq)-1), 
len(searchseq))
for x in it:
window.append(x)
if all(map(eq, window, searchseq)):
return True
return False

--
nosy: +josh.r

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25898] Check for subsequence inside a sequence

2015-12-17 Thread Serhiy Storchaka

Changes by Serhiy Storchaka :


--
assignee:  -> rhettinger

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25898] Check for subsequence inside a sequence

2015-12-17 Thread Марк Коренберг

Марк Коренберг added the comment:

Really optimal search algorithm should be something like that:

https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm

--
nosy: +mmarkk

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25898] Check for subsequence inside a sequence

2015-12-17 Thread Emanuel Barry

Emanuel Barry added the comment:

Reviewed the Python code (unlike what my email said, it doesn't LGTM; I'm tired 
and just forgot). I checked the C code for the obvious pitfalls (didn't spot 
any), but it will require someone else better than me to look at it.

--
nosy: +ebarry
stage:  -> patch review

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25898] Check for subsequence inside a sequence

2015-12-17 Thread Ethan Furman

Changes by Ethan Furman :


--
nosy: +rhettinger

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue25898] Check for subsequence inside a sequence

2015-12-17 Thread Sebastian Linke

New submission from Sebastian Linke:

With the attached patch I propose to add a new function to the "collections" 
module. It is meant to be used for determining whether a given subsequence is 
part of a particular sequence (with respect to the ordering of the 
subsequence). 

Doing this in pure Python (using the CPython interpreter) is relatively slow. 
So I did the implementation in C.

--
components: Library (Lib)
files: has_subsequence.patch
keywords: patch
messages: 256633
nosy: seblin
priority: normal
severity: normal
status: open
title: Check for subsequence inside a sequence
type: enhancement
versions: Python 3.6
Added file: http://bugs.python.org/file41342/has_subsequence.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com