Re: Compare word to word files

2021-03-10 Thread jak

On 11/03/2021 05:28, CLYMATIC GAMING wrote:

> Hello,
> I want to compare word to word files
> please help me!


copy and paste this string into the
Google search box:

how to find difference between 2 files in Python

...and press the "Google Search" button.
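For reference, the standard library's difflib module covers this. A minimal sketch, assuming "word to word" means comparing the two files' sequences of whitespace-separated words; the function names and the UTF-8 default are illustrative choices, not part of the original question:

```python
import difflib
from pathlib import Path
from typing import List


def word_diff(text_a: str, text_b: str) -> List[str]:
    """Compare two texts word by word.

    Returns only the changed words: '- word' appears only in text_a,
    '+ word' appears only in text_b. ndiff's unchanged ('  ') and
    hint ('? ') lines are filtered out.
    """
    return [
        line
        for line in difflib.ndiff(text_a.split(), text_b.split())
        if line.startswith(("- ", "+ "))
    ]


def word_diff_files(path_a: str, path_b: str, encoding: str = "utf-8") -> List[str]:
    """Same comparison, reading both texts from files."""
    return word_diff(
        Path(path_a).read_text(encoding=encoding),
        Path(path_b).read_text(encoding=encoding),
    )


print(word_diff("the quick brown fox", "the quick red fox"))
```

Because the texts are split on whitespace first, differences in line breaks or spacing are ignored; only the words themselves are compared.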
--
https://mail.python.org/mailman/listinfo/python-list


Digest emails stopped?

2021-03-10 Thread pjfarley3
Is anyone who is set to receive digest emails still receiving them after
Sunday 3/7/2021?  Sunday was the last day I received one.

I was under the impression that any activity at all on the list would result
in a digest email being sent at the end of that day, and according to the
archive there has been some activity since 3/7/2021.

Peter



Re: Application problems

2021-03-10 Thread Thomas Jollans

On 10/03/2021 21:50, Mats Wichmann wrote:


> For the first one, don't feel too bad, this ("opening the normal
> python") seems to be biting a lot of people recently



I wonder why. Python's installation process isn't any different from
that of most other Windows software released in the past 25-ish years.
Is it possible that Windows 10's search feature sometimes makes poor
choices, and typing "python" just brings up the wrong thing?


(I just tested it on a clean VM and that's not what happens, but maybe 
for some people? I dunno)




Re: How to create both a c extension and a pure python package

2021-03-10 Thread Thomas Jollans

On 10/03/2021 20:57, Mats Wichmann wrote:

> On 3/10/21 11:56 AM, Thomas Jollans wrote:
>> On 10/03/2021 18:42, Marco Sulla wrote:
>>> On Wed, 10 Mar 2021 at 16:45, Thomas Jollans  wrote:
>>>> Why are you doing this?
>>>>
>>>> If all you want is for it to be possible to install the package from
>>>> source on a system that can't use the C part, you could just declare
>>>> your extension modules optional
>>> Because I want to provide (at least) two wheels: a wheel for Linux
>>> users with the C extension compiled and a generic wheel in pure Python
>>> as a fallback for any other architecture.
>>
>> What's wrong with sdist as a fallback rather than a wheel?
>>
>> That has the added benefit of giving people on other architectures the
>> opportunity to use the extension module if they have a compiler and
>> the necessary libraries and headers installed...
>
> Doesn't that mean nasty failures for those who don't have the correct
> build setup (like almost every Windows user on the planet)?   This
> isn't a snide question, I'm actually interested in solving roughly the
> same problem as the OP.


I believe that this is pretty much exactly the problem that the 
"optional" flag for extensions solves.


docs: "specifies that a build failure in the extension should not abort 
the build process, but simply skip the extension."


I would assume this refers to *any* build failure (including a missing
compiler), so people without a proper build system might get error
messages, but should still get a working package (assuming, of course, the
C extension is *actually* optional, like an accelerator module).


But I don't have experience with this myself so take what I say with a 
grain of salt.
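For an extension to be "actually optional", the pure-Python code has to keep working when the compiled module is missing. A minimal sketch of the usual accelerator fallback (the same idea the standard library's heapq module uses with its C accelerator _heapq), assuming a hypothetical package mypkg whose optional C extension is mypkg._speedups and a hypothetical function word_count:

```python
# Hypothetical layout: mypkg/_speedups is the optional C extension. If it was
# skipped at build time (or never built), the import fails and the
# pure-Python definition below is used instead.
try:
    from mypkg._speedups import word_count  # compiled accelerator
    HAVE_SPEEDUPS = True
except ImportError:
    HAVE_SPEEDUPS = False

    def word_count(text: str) -> int:
        """Pure-Python fallback: count whitespace-separated words."""
        return len(text.split())


print(HAVE_SPEEDUPS, word_count("an optional  accelerator"))
```

Callers always import word_count from the same place, so whether the build skipped the extension only affects speed, not behaviour.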



- Thomas





Re: How to create both a c extension and a pure python package

2021-03-10 Thread Michał Jaworski
I was dealing with a very similar problem a long time ago. I wanted to
provide built wheels of my Cython extension for various OSes (Windows,
macOS, Linux) and various Python versions (from 2.7 up to 3.9) but I also
wanted to have a sdist package for all the variations that I didn't cover
at the time.

What I ended up doing was automating the whole process on two different CI
systems (one of them didn't have Windows support and the other was
Windows-only), but I also provided the sdist distribution with both the Cython
sources and the Cython-generated C++ sources. Users can opt in to installing
Cython using the setuptools extras feature. For instance:

pip install my-package[with-cython]

Additionally, whether to compile from the Cython or the C++ sources is
controlled with an environment variable, something like:

CYTHONIZE=1 pip install my-package

Using env vars instead of an extra command-line argument for setup.py makes it
easy to use with pip, poetry or whatever installer your users decide to use.
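A minimal sketch of such an environment-variable switch. The variable name CYTHONIZE comes from the message above; the function, the "src/mypkg/core" layout and the .cpp/.pyx naming are hypothetical. In a real setup.py the returned list would be passed as sources= to setuptools.Extension, and when Cython is selected the extensions would go through Cython.Build.cythonize():

```python
import os
from typing import List, Mapping, Optional


def extension_sources(
    module_path: str, environ: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Pick the source file for one extension module (hypothetical layout).

    CYTHONIZE=1: compile from the .pyx file (needs Cython, and the resulting
    Extension objects must go through Cython.Build.cythonize()).
    Anything else: compile the pre-generated .cpp file shipped in the sdist.
    """
    env = os.environ if environ is None else environ
    suffix = ".pyx" if env.get("CYTHONIZE") == "1" else ".cpp"
    return [module_path + suffix]


# The result would be passed as sources=... to setuptools.Extension.
print(extension_sources("src/mypkg/core"))
```

Comparing against the exact string "1" is a deliberate choice: an unset variable, or any other value, falls back to the shipped C++ sources, so a plain `pip install` never needs Cython.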

The whole setup required a bit of work, but in the end it has been working
for a few years already. Definitely the hardest things to maintain are those
ever-changing CI systems. Adding support for optional building from
sources was actually the simplest part. I haven't spent a lot of time on the
project lately, but if you are interested in how the whole thing was set up,
here's the URL: https://github.com/swistakm/pyimgui


On Wed, 10 Mar 2021 at 20:57, Mats Wichmann  wrote:

> On 3/10/21 11:56 AM, Thomas Jollans wrote:
> > On 10/03/2021 18:42, Marco Sulla wrote:
> >> On Wed, 10 Mar 2021 at 16:45, Thomas Jollans  wrote:
> >>> Why are you doing this?
> >>>
> >>> If all you want is for it to be possible to install the package from
> >>> source on a system that can't use the C part, you could just declare
> >>> your extension modules optional
> >> Because I want to provide (at least) two wheels: a wheel for Linux
> >> users with the C extension compiled and a generic wheel in pure Python
> >> as a fallback for any other architecture.
> >
> > What's wrong with sdist as a fallback rather than a wheel?
> >
> > That has the added benefit of giving people on other architectures the
> > opportunity to use the extension module if they have a compiler and the
> > necessary libraries and headers installed...
>
> Doesn't that mean nasty failures for those who don't have the correct build
> setup (like almost every Windows user on the planet)?   This isn't a
> snide question, I'm actually interested in solving roughly the same
> problem as the OP.


Re: Application problems

2021-03-10 Thread Igor Korot
Hi,

On Wed, Mar 10, 2021 at 2:37 PM Yoosuf Oluwatosin via Python-list
 wrote:
>
>
> I have downloaded Python 3.9.2 on my HP laptop with Windows 10 and tried
> opening both the normal Python and IDLE on my PC, but the normal one keeps
> opening the modify, repair and uninstall page while IDLE keeps
> giving a startup error. I have uninstalled, deleted and reinstalled several
> times but it is still the same thing. What could be the problem?

What kind of error do you get from IDLE?
Can you open the Command Prompt, type "python" (without quotes) and
press Enter?

Thank you.



Re: Application problems

2021-03-10 Thread Mats Wichmann



On 3/10/21 12:25 PM, Yoosuf Oluwatosin via Python-list wrote:


> I have downloaded Python 3.9.2 on my HP laptop with Windows 10 and tried
> opening both the normal Python and IDLE on my PC, but the normal one keeps
> opening the modify, repair and uninstall page while IDLE keeps giving a
> startup error. I have uninstalled, deleted and reinstalled several times but it
> is still the same thing. What could be the problem?


For the first one, don't feel too bad, this ("opening the normal
python") seems to be biting a lot of people recently - the downloaded
file is the installer, not Python itself, and you can remove it after
installing to avoid accidentally launching it (there's some discussion
of renaming it to make it clearer that it's not Python itself).
You can always still get at the modify/repair functionality the normal
Windows way, from the apps menu.


Try launching Python from the start menu. You can either navigate to the 
start menu item, or start typing Python and it should give you a match 
that looks like


Python 3.8 (64-bit)   App

Or from a shell (cmd or Powershell) window, type

py


More information here:

https://docs.python.org/3/using/windows.html

there are actually a fair number of permutations to Python on Windows, 
the page does a good job of covering them.
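Once something does start, it can help to confirm it is the interpreter you meant. A small sketch that reports the version, bitness and location of the running Python; saving it as, say, check_python.py and running `py check_python.py` is just one way to use it (the file name is hypothetical):

```python
import platform
import struct
import sys


def describe_interpreter() -> str:
    """One-line summary: version, pointer size and location of this Python."""
    bits = struct.calcsize("P") * 8  # 64 on a 64-bit build, 32 on a 32-bit build
    return f"Python {platform.python_version()} ({bits}-bit) at {sys.executable}"


print(describe_interpreter())
```

If the printed path points into the installer's download folder rather than an installation directory, something other than the installed interpreter is being launched.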



Application problems

2021-03-10 Thread Yoosuf Oluwatosin via Python-list


I have downloaded Python 3.9.2 on my HP laptop with Windows 10 and tried
opening both the normal Python and IDLE on my PC, but the normal one keeps
opening the modify, repair and uninstall page while IDLE keeps giving a
startup error. I have uninstalled, deleted and reinstalled several times but it
is still the same thing. What could be the problem?



Re: How to create both a c extension and a pure python package

2021-03-10 Thread Mats Wichmann

On 3/10/21 11:56 AM, Thomas Jollans wrote:

> On 10/03/2021 18:42, Marco Sulla wrote:
>> On Wed, 10 Mar 2021 at 16:45, Thomas Jollans  wrote:
>>> Why are you doing this?
>>>
>>> If all you want is for it to be possible to install the package from
>>> source on a system that can't use the C part, you could just declare
>>> your extension modules optional
>> Because I want to provide (at least) two wheels: a wheel for Linux
>> users with the C extension compiled and a generic wheel in pure Python
>> as a fallback for any other architecture.
>
> What's wrong with sdist as a fallback rather than a wheel?
>
> That has the added benefit of giving people on other architectures the
> opportunity to use the extension module if they have a compiler and the
> necessary libraries and headers installed...


Doesn't that mean nasty failures for those who don't have the correct build
setup (like almost every Windows user on the planet)?   This isn't a
snide question, I'm actually interested in solving roughly the same
problem as the OP.






Re: How to create both a c extension and a pure python package

2021-03-10 Thread Thomas Jollans

On 10/03/2021 18:42, Marco Sulla wrote:

> On Wed, 10 Mar 2021 at 16:45, Thomas Jollans  wrote:
>> Why are you doing this?
>>
>> If all you want is for it to be possible to install the package from
>> source on a system that can't use the C part, you could just declare
>> your extension modules optional
> Because I want to provide (at least) two wheels: a wheel for Linux
> users with the C extension compiled and a generic wheel in pure Python
> as a fallback for any other architecture.


What's wrong with sdist as a fallback rather than a wheel?

That has the added benefit of giving people on other architectures the
opportunity to use the extension module if they have a compiler and the
necessary libraries and headers installed...





> If I make the extension optional, as far as I know, only one wheel is
> produced: the wheel with the extension if all is successful, or the
> pure py wheel.





Re: How to create both a c extension and a pure python package

2021-03-10 Thread Marco Sulla
On Wed, 10 Mar 2021 at 16:45, Thomas Jollans  wrote:
> Why are you doing this?
>
> If all you want is for it to be possible to install the package from
> source on a system that can't use the C part, you could just declare
> your extension modules optional

Because I want to provide (at least) two wheels: a wheel for Linux
users with the C extension compiled and a generic wheel in pure Python
as a fallback for any other architecture.

If I make the extension optional, as far as I know, only one wheel is
produced: the wheel with the extension if all is successful, or the
pure py wheel.


Re: How to create both a c extension and a pure python package

2021-03-10 Thread Thomas Jollans

On 09/03/2021 23:42, Marco Sulla wrote:

> As per the title. Currently I ended up using this trick in my setup.py:
>
>
> if len(sys.argv) > 1 and sys.argv[1] == "c":
>     # drop the custom "c" flag so setuptools doesn't see it
>     sys.argv = [sys.argv[0]] + sys.argv[2:]
>     setuptools.setup(ext_modules = ext_modules, **common_setup_args)
> else:
>     setuptools.setup(**common_setup_args)
>
>
> So if I pass "c" as the first argument of ./setup.py, the C extension
> is built, otherwise the py version is packaged.
>
> Is there not a better way to do this?



Why are you doing this?

If all you want is for it to be possible to install the package from 
source on a system that can't use the C part, you could just declare 
your extension modules optional, with the "optional" argument to
setuptools.Extension. See
https://docs.python.org/3/distutils/apiref.html#distutils.core.Extension
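A minimal setup.py fragment showing that argument, assuming a hypothetical package mypkg with one C accelerator at src/mypkg/_speedups.c (both names are placeholders). It needs setuptools installed, as any setup.py does:

```python
# setup.py fragment for a hypothetical package "mypkg" with an optional
# C accelerator; names and paths are placeholders.
from setuptools import Extension

ext_modules = [
    Extension(
        "mypkg._speedups",
        sources=["src/mypkg/_speedups.c"],
        # If compiling this extension fails, build_ext prints a warning and
        # skips it instead of aborting the whole build.
        optional=True,
    )
]

# The list is then handed to setup() as usual, e.g.
# setuptools.setup(name="mypkg", ext_modules=ext_modules, ...)
```

Note that this alone still yields one wheel per build (with or without the extension, depending on whether compilation succeeded); it keeps an sdist install from failing, it does not produce separate pure-Python and compiled wheels.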



- Thomas




Re: How to loop over a text file (to remove tags and normalize) using Python

2021-03-10 Thread Dan Stromberg
If you want text without tags, sometimes it's easier to use a text-based
web browser, e.g.:

#!/bin/sh

# html2txt: for mutt to view HTML e-mails. This shell script performs
# the conversion, e.g. by calling

links -html-numbered-links 1 -html-images 1 -dump "file://$@"

# or
#
#lynx -force_html -dump "$@"
#
# or
#
#w3m -T text/html -F -dump "$@"


On Tue, Mar 9, 2021 at 1:26 PM S Monzur  wrote:

> Dear List,
>
> Newbie here. I am trying to loop over a text file to remove html tags,
> punctuation marks, stopwords. I have already used Beautiful Soup (Python v
> 3.8.3) to scrape the text (newspaper articles) from the site. It returns a
> list that I saved as a file. However, I am not sure how to use a loop in
> order to process all the items in the text file.
>
> In the code below I have used listfilereduced.txt (containing data from one
> news article, link to listfilereduced.txt here
> <https://drive.google.com/file/d/1ojwN4u8cmh_nUoMJpdZ5ObaGW5URYYj3/view?usp=sharing>).
> However, I would like to run this code on listfile.text (containing data from
> multiple articles, link to listfile.text
> <https://drive.google.com/file/d/1V3s8w8a3NQvex91EdOhdC9rQtCAOElpm/view?usp=sharing>).
>
>
> Any help would be greatly appreciated!
>
> P.S. The text is in a non-English script, but the tags are all in English.
>
>
> # The code below is for a text file containing just one item. I am not sure
> # how to tweak this to make it run for listfile.text (which contains raw data
> # from multiple articles)
> with open('listfilereduced.txt', 'r', encoding='utf8') as my_file:
>     rawData = my_file.read()
> print(rawData)
> # Separating body text from other data
> articleStart = rawData.find(" class=\"story-element story-element-text\">")
> articleData = rawData[:articleStart]
> articleBody = rawData[articleStart:]
> print(articleData)
> print("***")
> print(articleBody)
> print("***")
> # First, I define a function to strip tags from the body text
> def stripTags(pageContents):
>     insideTag = 0
>     text = ''
>     for char in pageContents:
>         if char == '<':
>             insideTag = 1
>         elif (insideTag == 1 and char == '>'):
>             insideTag = 0
>         elif insideTag == 1:
>             continue
>         else:
>             text += char
>     return text
> # Calling the function
> articleBodyText = stripTags(articleBody)
> print(articleBodyText)
> ## Isolating article title and publication date
> TitleEndLoc = articleData.find("")
> dateStartLoc = articleData.find(" class=\"storyPageMetaData-m__publish-time__19bdV\">")
> dateEndLoc = articleData.find(" storyPageMetaDataIcons-m__icons__3E4Xg\">")
> titleString = articleData[:TitleEndLoc]
> dateString = articleData[dateStartLoc:dateEndLoc]
> ## Call stripTags to clean
> articleTitle = stripTags(titleString)
> articleDate = stripTags(dateString)
> print(articleTitle)
> print(articleDate)
> # Cleaning the date a bit more
> startLocDate = articleDate.find(":")
> endLocDate = articleDate.find(",")
> articleDateClean = articleDate[startLocDate+2:endLocDate]
> print(articleDateClean)
> # Save all this data to a dictionary that saves the title, date and the body text
> PAloTextDict = {"Title": articleTitle, "Date": articleDateClean, "Text": articleBodyText}
> print(PAloTextDict)
> # Normalize text by: 1. splitting paragraphs of text into lists of words
> articleBodyWordList = articleBodyText.split()
> print(articleBodyWordList)
> # 2. Removing punctuation and stopwords
> from bnlp.corpus import stopwords, punctuations
> # A. Remove punctuation first
> listNoPunct = []
> for word in articleBodyWordList:
>     for mark in punctuations:
>         word = word.replace(mark, '')
>     listNoPunct.append(word)
> print(listNoPunct)
> # B. Removing stopwords
> banglastopwords = stopwords()
> print(banglastopwords)
> cleanList = []
> for word in listNoPunct:
>     if word in banglastopwords:
>         continue
>     else:
>         cleanList.append(word)
> print(cleanList)


Re: How to loop over a text file (to remove tags and normalize) using Python

2021-03-10 Thread Peter Otten

On 10/03/2021 13:19, S Monzur wrote:

> I initially scraped the links using beautiful soup, and from those links
> downloaded the specific content of the articles I was interested in
> (titles, dates, names of contributor, main texts) and stored that
> information in a list. I then saved the list to a text file.
> https://pastebin.com/8BMi9qjW . I am now trying to remove the html tags
> from this text file, and running into issues as mentioned in the previous
> post.


As I said in my previous post, when you process the list entries
separately you will probably avoid the problem.

Unfortunately with the format you chose to store your intermediate data
you cannot reconstruct it reliably.

I recommend that you either

(1) avoid the text file and extract the interesting parts from PASoup
directly or

(2) pick a different file format to store the result sets. For
short-term storage, pickle should work.
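A minimal sketch of option (2), assuming the scraper's result is a Python list with one HTML string per article (the article strings below are placeholders). pickle keeps the list structure, so each entry is still one whole article when it is loaded back. Only unpickle files you created yourself: loading pickle data from an untrusted source can execute arbitrary code.

```python
import pickle
import tempfile
from pathlib import Path
from typing import List


def save_articles(articles: List[str], path: Path) -> None:
    """Store the list of per-article HTML strings as-is."""
    with open(path, "wb") as f:
        pickle.dump(articles, f)


def load_articles(path: Path) -> List[str]:
    """Load the list back; only do this for files you created yourself."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Usage with placeholder data: two "articles" survive the round trip intact.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "articles.pickle"
    save_articles(["<p>first article</p>", "<p>second article</p>"], path)
    restored = load_articles(path)

for number, article in enumerate(restored, start=1):
    print(number, article)
```

Since the list only contains strings, json.dump and json.load would work the same way if a human-readable file is preferred.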



Re: How to loop over a text file (to remove tags and normalize) using Python

2021-03-10 Thread S Monzur
I initially scraped the links using beautiful soup, and from those links
downloaded the specific content of the articles I was interested in
(titles, dates, names of contributor, main texts) and stored that
information in a list. I then saved the list to a text file.
https://pastebin.com/8BMi9qjW . I am now trying to remove the html tags
from this text file, and running into issues as mentioned in the previous
post.



On Wed, Mar 10, 2021 at 3:46 PM Peter Otten <__pete...@web.de> wrote:

> On 10/03/2021 04:35, S Monzur wrote:
> > Thanks! I ended up using beautiful soup to remove the html tags and create
> > three lists (titles of article, publications dates, main body) but am still
> > facing a problem where the list is not properly storing the main body.
> > There is something wrong with my code for that section, and any comment
> > would be really helpful!
> >
> >   ListFile Text
> > <https://drive.google.com/file/d/1V3s8w8a3NQvex91EdOhdC9rQtCAOElpm/view?usp=sharing>
>
> How did you create that file?
>
>  > BeautifulSoup code for removing tags
>
> > print(bodytext[0])  # so here, I'm only getting the first paragraph of
> >                     # the body of the first article, not all of the first article
> >
> > print(bodytext[1])  # here, I'm getting the second paragraph of the first
> >                     # article, and not the second article
>
> It may help if you process the individual articles with beautiful soup,
> not the whole list at once.


Re: How to loop over a text file (to remove tags and normalize) using Python

2021-03-10 Thread Peter Otten

On 10/03/2021 04:35, S Monzur wrote:

> Thanks! I ended up using beautiful soup to remove the html tags and create
> three lists (titles of article, publications dates, main body) but am still
> facing a problem where the list is not properly storing the main body.
> There is something wrong with my code for that section, and any comment
> would be really helpful!
>
>   ListFile Text



How did you create that file?

> BeautifulSoup code for removing tags 


> print(bodytext[0])  # so here, I'm only getting the first paragraph of the
>                     # body of the first article, not all of the first article
>
> print(bodytext[1])  # here, I'm getting the second paragraph of the first
>                     # article, and not the second article


It may help if you process the individual articles with beautiful soup, 
not the whole list at once.
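A minimal sketch of that advice, assuming the scraped data is kept as a list with one HTML string per article (placeholder strings below). It uses the standard library's html.parser instead of Beautiful Soup so it runs without extra packages; with Beautiful Soup the per-article call would be `BeautifulSoup(article, "html.parser").get_text()`. Because each article is cleaned on its own, `bodytext[i]` is the whole i-th article rather than the i-th paragraph.

```python
from html.parser import HTMLParser
from typing import List


class TextExtractor(HTMLParser):
    """Collect the text between tags of one HTML fragment."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        # Called for text between tags (note: <script>/<style> contents too).
        self.chunks.append(data)


def strip_tags(html: str) -> str:
    """Return the text of an HTML fragment with tags removed and whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Chunks from different elements are joined with a space.
    return " ".join(" ".join(parser.chunks).split())


def body_texts(articles: List[str]) -> List[str]:
    """Clean each article on its own, so result[i] is the whole i-th article."""
    return [strip_tags(article) for article in articles]


articles = [
    "<p>First paragraph.</p><p>Second paragraph.</p>",  # placeholder article 1
    "<div><p>Another article.</p></div>",                # placeholder article 2
]
bodytext = body_texts(articles)
print(bodytext[0])
print(bodytext[1])
```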




Re: Best practices regarding PYTHONPATH

2021-03-10 Thread Thomas Jollans

On 09/03/2021 22:52, Cameron Simpson wrote:

> On 09Mar2021 05:00, Larry Martell  wrote:
>> Which is considered better? Having a long import path or setting PYTHONPATH?
>>
>> For example, in a project where 50% of the imports come from the same
>> top-level directory, is it better to add that dir to the path or reference
>> it in the import statements?
>
> All the below is personal opinion.
>
> I'd be leaving the $PYTHONPATH alone - I tweak it to access the required
> libraries, but not to change their dotted module paths.
>
> For example, I include ~/lib/python in my personal environment to access
> my personal modules, but I don't include
> ~/lib/python/cs/app/someapp/subpackage in order to shorten
> "cs.app.someapp.subpackage.foo" to just "foo".
>
> This is largely to avoid accidental shadowing of other modules. For
> example, supposing "foo" above were the subpath "os.path". Yes,
> contrived, but that's the flavour of problem I'm avoiding.
>
> I think I'd be ok with it provided I didn't go too far down. E.g., if all
> my "someapp"s were distinctively named I could be persuaded to use
> ~/lib/python/cs/app in the $PYTHONPATH, allowing
> "someapp.subpackage.foo". But I'd still be reluctant.


If you have a bunch of different packages you want to think of as "top 
level" but that live in different places for organizational reasons 
(e.g. someapp in ~/lib/python/cs/app/someapp and thatlib in 
~/lib/python/cs/lib/util/thatlib), I'd advocate treating them as proper 
Python packages with their own setup.py and everything, and installing 
them properly.


Editable installs (setup.py develop or pip install -e .) are great for this.

Personally when this is impractical for some reason I prefer creating 
*.pth files in site-packages to messing with PYTHONPATH as I find this 
easier to maintain, especially if you ever end up using different Python 
versions or different virtual environments.
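A minimal sketch of the .pth approach. Each line of a .pth file in a site directory names a directory to append to sys.path (only if that directory exists). The real target would be the environment's site-packages, e.g. `sysconfig.get_paths()["purelib"]`; to keep the example harmless it writes into a temporary directory and uses `site.addsitedir()`, which processes .pth files the same way interpreter startup does. The file name myprojects.pth is hypothetical.

```python
import site
import sys
import tempfile
from pathlib import Path
from typing import Iterable


def write_pth(site_dir: str, name: str, paths: Iterable[str]) -> Path:
    """Write <site_dir>/<name>.pth listing one directory per line."""
    pth_file = Path(site_dir) / f"{name}.pth"
    pth_file.write_text("".join(f"{p}\n" for p in paths), encoding="utf-8")
    return pth_file


# In real use site_dir would be the environment's site-packages, e.g.
# sysconfig.get_paths()["purelib"]; temporary directories stand in here.
with tempfile.TemporaryDirectory() as site_dir, tempfile.TemporaryDirectory() as project_dir:
    write_pth(site_dir, "myprojects", [project_dir])  # hypothetical file name
    site.addsitedir(site_dir)  # reads the .pth file like startup does
    project_on_path = project_dir in sys.path

print(project_on_path)
```

Because the .pth file lives inside one environment's site-packages, each virtual environment or Python version gets its own list, which is what makes it easier to maintain than a global PYTHONPATH.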


Just my 0.02 €.




> If the project modules are tightly bound, relative imports can get you a
> fair way. Certainly within a package I do a lot of:
>
>      from .fixtures import these_things
>
> to grab from the adjacent "fixtures.py" file instead of:
>
>      from project.submodule.fixtures import these_things
>
> And I'm usually happy to go up an additional level:
>
>      from ..package2.fixtures import those_things
>
> Somewhere around 3 dots I start to worry about presuming too much, but
> that is an arbitrary decision based on the discipline (or lack of it) in
> the project naming.
>
> Cheers,
> Cameron Simpson
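To see what the relative imports quoted above resolve to, importlib.util.resolve_name() applies the same rule the import system uses. A small sketch, assuming a hypothetical module project/submodule/foo.py, so its `__package__` is "project.submodule":

```python
import importlib.util

# Hypothetical module project/submodule/foo.py, so __package__ == "project.submodule".
PACKAGE = "project.submodule"

for relative in (".fixtures", "..package2.fixtures"):
    print(relative, "->", importlib.util.resolve_name(relative, PACKAGE))

# Three dots would climb above "project", which the import system rejects.
try:
    importlib.util.resolve_name("...fixtures", PACKAGE)
    went_too_far = False
except ImportError as exc:
    went_too_far = True
    print("...fixtures ->", exc)
```

Each leading dot beyond the first strips one trailing component from the package name, so the number of dots a module can use is bounded by how deep it sits in the package tree.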





Re: How to loop over a text file (to remove tags and normalize) using Python

2021-03-10 Thread Joel Goldstick
On Tue, Mar 9, 2021 at 10:36 PM S Monzur  wrote:
>
> Thanks! I ended up using beautiful soup to remove the html tags and create
> three lists (titles of article, publications dates, main body) but am still
> facing a problem where the list is not properly storing the main body.
> There is something wrong with my code for that section, and any comment
> would be really helpful!
>

Can you use a very small file to test?  I think you could edit your
data file to contain maybe two or three articles.  Then you could post
that file in your email (no attachments).  And you could post your
code which is probably not very long.  In that way, people here will
be better able to help you.