Yes, opening a PR would be a good idea. It will be easier to discuss these 
ideas on a PR.

Aaron Meurer

On Sunday, March 9, 2025 at 12:56:46 AM UTC-7 [email protected] wrote:

> So my next steps should be:
> - Testing other aspects of factorint(), such as the ones mentioned above.
> - Learning and using strategies for generating "interesting" integers for 
> factorint().
> - Running hypothesis in verbose mode for more information on the generated 
> values.
>
> Should I open a PR for hypothesis testing of factorint()? That way, we 
> can track progress.
> I also discovered another function that can be tested: digits() 
> <https://github.com/sympy/sympy/blob/b836671fe8459e9301c620117b660c6c8ca20264/sympy/ntheory/digits.py#L7>.
>  
> A simple example: digits(2345, 34) == [34, 2, 0, 33] can be easily tested 
> by generating N and n (N is the number and n is the base), then doing the 
> base conversion ourselves to check the assertion. This can also benefit 
> from hypothesis IMO. Let me know what you think. 
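
A minimal sketch of how the digits() check could be written as a property. It assumes digits(N, n) returns the base followed by the digits, most significant first, as in the example above, and it rebuilds N from them. The test name and the bounds on N and n are arbitrary choices for illustration.

```python
from hypothesis import given, strategies as st
from sympy.ntheory.digits import digits


@given(N=st.integers(min_value=0, max_value=10**30),
       n=st.integers(min_value=2, max_value=1000))
def test_digits_roundtrip(N, n):
    result = digits(N, n)
    base, ds = result[0], result[1:]
    # The first entry is the base (for non-negative N).
    assert base == n
    # Every digit must be a valid digit in base n.
    assert all(0 <= d < n for d in ds)
    # Rebuild N from its digits with Horner's rule.
    value = 0
    for d in ds:
        value = value * n + d
    assert value == N
```

Calling test_digits_roundtrip() directly, or collecting it with pytest, runs the property on many generated (N, n) pairs. Negative N is left out here because digits() then reports the base with a negative sign, which would need its own assertion.
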
>
> On Sun, 9 Mar 2025 at 02:50, Aaron Meurer <[email protected]> wrote:
>
>> Yes, factorint is a better example of something that can be tested
>> with hypothesis. It's the example I gave on the issue
>> https://github.com/sympy/sympy/issues/20914.
>>
>> It's also a good example of how we can start with something simple and
>> build out a more rigorous test.
>>
>> There are other properties that could be added to the test as well, for
>> instance
>>
>> assert isprime(prime)
>> assert exp >= 1
>> assert isinstance(prime, int)
>> assert isinstance(exp, int)
>>
>> And we can also test the various flags to factorint.
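
Put together, the extended test might look roughly like this. It is only a sketch: the input range is an arbitrary choice to keep each example fast, the test name is made up, and the multiple=True check is just one example of a flag that could be covered.

```python
from hypothesis import given, settings, strategies as st
from sympy import factorint, isprime


@settings(deadline=None)
@given(n=st.integers(min_value=2, max_value=10**12))
def test_factorint_properties(n):
    factors = factorint(n)
    product = 1
    for prime, exp in factors.items():
        # Keys are primes and values are positive multiplicities.
        assert isinstance(prime, int)
        assert isinstance(exp, int)
        assert isprime(prime)
        assert exp >= 1
        product *= prime ** exp
    # The factorization must multiply back to the input.
    assert product == n
    # multiple=True should list the same primes, repeated by multiplicity.
    expected = sorted(p for p, e in factors.items() for _ in range(e))
    assert factorint(n, multiple=True) == expected
```

The range starts at 2 so that every key really is a prime. Inputs like 0, 1 and negative numbers produce special entries and would need separate assertions.
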
>>
>> As for the existing test, for now, we should generally leave any
>> existing manual tests intact. Hypothesis should be treated as an
>> extension to manual testing, not a complete replacement. For instance,
>> some of the assertions in that test you showed are based on specific
>> inputs that are known to potentially cause issues. Hypothesis might
>> not necessarily generate an example like them. Plus, you'll notice
>> that that test is marked as @slow, meaning some of the numbers being
>> tested are too slow compared to the inputs we might want to generate
>> from hypothesis.
>>
>> This is actually one thing that will need to be considered in this
>> project. Hypothesis tries to always generate "interesting" examples in
>> its strategies, in addition to random ones. But what hypothesis
>> considers "interesting" is based on some heuristics that apply to a
>> broad category of programming. For instance, the "interesting"
>> integers from st.integers() are things like -1, 0, 1, etc. These are
>> important to test, but for factorint, we also want to make sure we
>> test "interesting" integers in terms of their prime factorizations.
>> This might mean numbers that have both small and large prime factors,
>> numbers that have many prime factors, and numbers that have very few
>> prime factors, numbers with factors that are interesting corner cases
>> in terms of the specific algorithms that are implemented, etc. Some
>> of these are not distributed very well on the number line, so we might
>> have to create a custom strategy that generates them with higher
>> likelihood. Otherwise, they would basically never be chosen at random.
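
As a rough sketch of what such a custom strategy could look like, one option is to build the integer from a factorization chosen up front, so the expected answer is known by construction. Everything named here (small_primes, large_primes, known_factorizations, the specific primes and size limits) is an illustrative assumption, not something that exists in SymPy or hypothesis.

```python
from math import prod

from hypothesis import given, settings, strategies as st
from sympy import factorint, prime

# Small primes, picked by index: prime(1) == 2, prime(2) == 3, ...
small_primes = st.integers(min_value=1, max_value=200).map(prime)
# A few known larger primes: the 10,000th prime, the 100,000th prime,
# and the Mersenne prime 2**31 - 1.
large_primes = st.sampled_from([104729, 1299709, 2**31 - 1])


@st.composite
def known_factorizations(draw):
    # Draw a {prime: exponent} dict, then multiply it out.
    factors = draw(st.dictionaries(
        keys=st.one_of(small_primes, large_primes),
        values=st.integers(min_value=1, max_value=3),
        min_size=1,
        max_size=5,
    ))
    n = prod(p ** e for p, e in factors.items())
    return n, factors


@settings(max_examples=50, deadline=None)
@given(known_factorizations())
def test_factorint_recovers_known_factors(case):
    n, expected = case
    assert factorint(n) == expected
```

This mixes small and large factors and repeated factors far more often than st.integers() would. The pool of large primes and the exponent range are the obvious knobs to adjust when targeting a specific algorithm inside factorint.
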
>>
>> Hypothesis also limits the size of the maximum integer generated by
>> integers() (probably to something like 2**64). But factorint can
>> handle numbers much larger than that. Creating custom input strategies
>> is going to be a big part of this project, so it's something you
>> should be thinking about, and learn how to do (it also can be one of
>> the more challenging parts of using hypothesis effectively). As a
>> start, I would learn how to run hypothesis in verbose mode, so that
>> you can see the actual inputs it is generating, then take a look at
>> those inputs and try to see if they actually cover all the important
>> cases for the given function.
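
For reference, a small sketch of turning on verbose output for a single test through the settings decorator. The test body and the seen list are placeholders added only to show which inputs were tried.

```python
from hypothesis import Verbosity, given, settings, strategies as st

# Collected inputs, kept only so they can be inspected after the run.
seen = []


@settings(verbosity=Verbosity.verbose, max_examples=20)
@given(n=st.integers())
def test_show_inputs(n):
    # In verbose mode hypothesis reports each example it tries.
    seen.append(n)
    assert isinstance(n, int)
```

When running under pytest, the same effect should be available for a whole run with --hypothesis-verbosity=verbose (pytest's -s flag may be needed to see the output of passing tests).
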
>>
>> The code for factorint is very complex, and testing it rigorously
>> requires testing a lot of different kinds of corner cases. Hypothesis
>> is very good at this sort of thing, but it wasn't built with these
>> specific types of corner cases in mind, so it will need some help to
>> get there.
>>
>> Aaron Meurer
>>
>> On Sat, Mar 8, 2025 at 1:02 PM Pradyot Ranjan <[email protected]> wrote:
>> >
>> > That makes a lot more sense. Thanks!
>> > This would be a better test then, I guess:
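>> > I tried hypothesis testing of factorint(). This is what my test method
>> > looks like:
>> >
>> > from hypothesis import given, strategies as st
>> > from sympy import factorint
>> >
>> > @given(n=st.integers())
>> > def test_factorint(n):
>> >     factors = factorint(n)
>> >     # Multiplying the prime powers back together should give n.
>> >     product = 1
>> >     for prime, exp in factors.items():
>> >         product *= prime ** exp
>> >     assert product == n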
>> >
>> >
>> > The test runs for both positive and negative integers. I can extend this
>> > to test the kwargs as well. This will eliminate a lot of assert statements
>> > here. This test also doesn't take any significant amount of time.
>> >
>> > On Sat, 8 Mar 2025 at 23:04, Aaron Meurer <[email protected]> wrote:
>> >>
>> >> On Sat, Mar 8, 2025 at 2:33 AM Pradyot Ranjan <[email protected]> wrote:
>> >> >
>> >> > I tried using hypothesis to test prime(). The function returns the nth
>> >> > prime number, so I generated the nth prime myself and compared the two
>> >> > (n here is given by hypothesis). The test passes, but the only problem is
>> >> > that it takes painfully long to run. I tried limiting n to 100,000 and it
>> >> > still takes around 40s. We can test composite() and other related
>> >> > functions similarly. We can mark these tests as "slow" and run them
>> >> > separately if this is the approach we are looking for.
>> >>
>> >> This isn't really the right way to use hypothesis in this context. I'm
>> >> assuming this is slow because your prime generating test function is
>> >> slow. But what's to say that function is even correct? At best you
>> >> could have an obviously correct function that is very slow. Or you'll
>> >> just be reimplementing the function that's already in sympy, which is
>> >> pointless for a test.
>> >>
>> >> For hypothesis, you should think about properties that a function
>> >> should have and test those. For prime generation, you can check that
>> >> the output is prime using isprime(). Testing that the nth prime is
>> >> actually the nth prime is difficult without actually generating all n
>> >> primes. prime() basically already does this itself internally, so
>> >> there's not really a point to doing this in a test. You could test some
>> >> mathematical bounds. Personally, though, I would focus on some other
>> >> functions which have properties that are easier to test. Not every function
>> >> in SymPy is easy to property test, because not every function has
>> >> straightforward properties that can be tested. Instead of trying to
>> >> come up with properties for various functions, it would be better to
>> >> try to find functions that have a fairly obvious set of properties
>> >> that can be tested.
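
To make the property idea concrete for prime(), here is a sketch that avoids recomputing the nth prime. It checks primality, monotonicity, and the classical bounds n*ln(n) < p_n < n*(ln(n) + ln(ln(n))), which hold for n >= 6. The test name and the upper limit on n are arbitrary choices for illustration.

```python
from math import log

from hypothesis import given, settings, strategies as st
from sympy import isprime, prime


@settings(max_examples=50, deadline=None)
@given(n=st.integers(min_value=6, max_value=5000))
def test_prime_properties(n):
    p = prime(n)
    # The output must itself be prime.
    assert isprime(p)
    # The sequence of primes is strictly increasing.
    assert prime(n + 1) > p
    # Classical bounds on the nth prime, valid for n >= 6.
    assert n * log(n) < p < n * (log(n) + log(log(n)))
```

None of these checks proves that p is exactly the nth prime, but each one is cheap and would catch an off-by-one in the index for most n.
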
>> >>
>> >> Aaron Meurer
>> >>
>> >> >
>> >> > On Wed, 5 Mar 2025 at 00:16, Aaron Meurer <[email protected]> wrote:
>> >> >>
>> >> >> Pretty much any function in SymPy that can have mathematical
>> >> >> properties written about it could potentially benefit from property
>> >> >> testing. However, a big challenge with this project is the input data
>> >> >> generation (the strategies in hypothesis terminology). Generating
>> >> >> arbitrary SymPy expressions is a difficult problem. There was some
>> >> >> initial work on this at https://github.com/sympy/sympy/pull/17190. But
>> >> >> the problem is that just generating expressions itself can be buggy.
>> >> >> Consider the expression I posted about in another mailing list thread.
>> >> >> It takes 8 seconds just to construct, essentially because the
>> >> >> expression constructor itself is buggy.
>> >> >> https://groups.google.com/g/sympy/c/XSJuvibPOro/m/Q3TTETm7AwAJ
>> >> >>
>> >> >> So for now, it's better to actually focus on those functions that take
>> >> >> relatively simple inputs. The simplest possible input is an integer.
>> >> >> For instance, several functions in the ntheory module basically just
>> >> >> take an integer as input. The next simplest is polynomials. The
>> >> >> initial work that has been done on hypothesis testing has been in
>> >> >> these modules, but the work hasn't gone very far and there is still
>> >> >> more that can be done there. So I would suggest starting where there
>> >> >> are existing hypothesis tests and expanding the tests in those parts
>> >> >> of SymPy. We'll want to expand beyond that, but building strategies is
>> >> >> going to be one of the harder parts of this project.
>> >> >>
>> >> >> By the way, if you didn't notice on the idea page, this issue has a
>> >> >> lot more details on hypothesis testing in SymPy
>> >> >> https://github.com/sympy/sympy/issues/20914.
>> >> >>
>> >> >> Aaron Meurer
>> >> >>
>> >> >> On Tue, Mar 4, 2025 at 2:19 AM Pradyot Ranjan <[email protected]> wrote:
>> >> >> >
>> >> >> > What are the components that can benefit most from hypothesis
>> >> >> > testing? I can try to implement them before I start writing a
>> >> >> > proposal, if that's okay.
>> >> >> >
>> >> >> > On Tue, 4 Mar, 2025, 4:29 am Pradyot Ranjan, <[email protected]> wrote:
>> >> >> >>
>> >> >> >> Last year I worked as a GSoC student for PyBaMM. We had a stretch
>> >> >> >> goal regarding the implementation of hypothesis testing, which can
>> >> >> >> be tracked here:
>> >> >> >> - https://github.com/pybamm-team/PyBaMM/issues/4703
>> >> >> >> I also reviewed some PRs regarding this:
>> >> >> >> - https://github.com/pybamm-team/PyBaMM/pull/4724
>> >> >> >>
>> >> >> >> Other than this, I also worked as an LFX mentee last year, where I
>> >> >> >> implemented fuzz testing (which is similar to Hypothesis's
>> >> >> >> property-based testing in some ways).
>> >> >> >>
>> >> >> >>
>> >> >> >> On Tue, 4 Mar, 2025, 3:23 am Aaron Meurer, <[email protected]> wrote:
>> >> >> >>>
>> >> >> >>> Yes, that project is still very relevant. If you search the codebase
>> >> >> >>> for hypothesis you'll see that it is currently only used in a few
>> >> >> >>> tests, but we want that to increase by a lot.
>> >> >> >>>
>> >> >> >>> What sort of experience do you have with hypothesis?
>> >> >> >>>
>> >> >> >>> Aaron Meurer
>> >> >> >>>
>> >> >> >>> On Mon, Mar 3, 2025 at 1:53 PM Pradyot Ranjan <[email protected]> wrote:
>> >> >> >>> >
>> >> >> >>> > Hi,
>> >> >> >>> > Just wanted to know if this project is still relevant for GSoC?
>> >> >> >>> > If it is, who is the mentor?
>> >> >> >>> > I have some experience with hypothesis testing and would love to
>> >> >> >>> > work on it.
>> >> >> >>> >
>> >> >> >>> > Thanks,
>> >> >> >>> > Pradyot Ranjan
>> >> >> >>> >
>> >> >> >>> > --
>> >> >> >>> > You received this message because you are subscribed to the Google Groups "sympy" group.
>> >> >> >>> > To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>> >> >> >>> > To view this discussion visit https://groups.google.com/d/msgid/sympy/afa7d863-666f-475f-ae4c-1ccb8a5d3752n%40googlegroups.com.
>> >> >> >>>
>> >> >> >
>> >> >>
>> >> >
>> >>
>> >
>>
>>
>

