I'll try to get the example data and code out, but there's a lot of internal
logic in it at the moment.

The commit we're using is e402f61aceb64f659845bdc5f03cf4f29797277b

- Andy


On Mon, Jun 29, 2020 at 9:29 AM Jon Malkin <jon.mal...@gmail.com> wrote:

> You mean you were calling the Java library from Python? Our testing
> has generally shown C++ to be faster.
>
> This is still too vague for me to be able to say much. There's no specific
> git version (tag or hash), no code, and no data.
>
>   jon
>
> On Mon, Jun 29, 2020 at 9:08 AM Andy Dang <nam...@gmail.com> wrote:
>
>> I was using the Git version and was running with various sketches. I
>> thought the slowness was from Python, but I was able to scan through the
>> same data calculating the same statistics with the Java library in roughly
>> 3 minutes.
>>
>> Any idea why there's such a big difference between the two languages?
>>
>> - Andy
>>
>> On Fri, Jun 26, 2020, 21:02 Jon Malkin <jmal...@apache.org> wrote:
>>
>>> I haven't done long-running Python tests recently but I haven't seen
>>> that.
>>>
>>> Are you using a release version of the library or did you check out
>>> from git? And which sketch or sketches are you using?
>>>
>>> I've compiled the library in debug mode (gotta modify setup.py to force
>>> that) and run python via gdb but that's not gonna work nicely on 1.6gb of
>>> data. It's sloooooooowwwwwww.
>>>
>>>   jon
>>>
>>>
>>> On Fri, Jun 26, 2020, 4:39 PM Andy Dang <nam...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I've been trying to integrate Datasketches into our ecosystem - really
>>>> great work!
>>>>
>>>> However, I tried to run various sketches in Python on the raw CSV of the
>>>> Lending Club data from Kaggle (1.6 GB in size), and after a while the
>>>> process crashes with a mysterious segfault on my Mac (macOS Catalina).
>>>> My Clang version:
>>>>
>>>> ➜  Workspace c++ --version
>>>> Apple clang version 11.0.0 (clang-1100.0.33.17)
>>>> Target: x86_64-apple-darwin19.5.0
>>>> Thread model: posix
>>>> InstalledDir: /Library/Developer/CommandLineTools/usr/bin
>>>>
>>>> ➜  Workspace gcc --version
>>>> Configured with: --prefix=/Library/Developer/CommandLineTools/usr
>>>> --with-gxx-include-dir=/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/c++/4.2.1
>>>> Apple clang version 11.0.0 (clang-1100.0.33.17)
>>>> Target: x86_64-apple-darwin19.5.0
>>>> Thread model: posix
>>>> InstalledDir: /Library/Developer/CommandLineTools/usr/bin
>>>>
>>>> Replacing this with the Miniconda cxx toolchain solves the problem.
>>>>
>>>> I'll get a script along with the data for reproducibility, but before
>>>> that I wonder if anyone has come across this issue before?
>>>>
>>>> Cheers!
>>>> - Andy
>>>>
>>>
