Re: [Imdbpy-help] Generating the HTML parsers

2018-02-28 Thread H. Turgut Uyar

On 02/28/2018 12:13 AM, Davide Alberani wrote:
> On Tue, Feb 27, 2018 at 9:05 AM, H. Turgut Uyar wrote:
>> So I decided to develop a parser generator that will read a
>> specification for a parser and generate the necessary code
> 
> What kind of help do you need, mostly?

Most importantly, I can't really decide if this is worth pursuing. I
feel like the approach has some potential, but I can't be sure. The code
can be written manually, after all. The spec format is more generic, so
it might be easier to refactor the parsers in the future, but I don't
know how likely that is to happen.

And also, would we gain from transitioning to piculet-based parsers?
Possible advantages could be:

- Making py2 support easier if we want that.
- Dropping the hard dependency on lxml, again, if we want that.
- Easier maintenance of the parsers (?).
- More involvement from developers for writing parsers (?).

So my main problem is that I'm undecided whether I should devote more
time to this or not. Any insight into that issue would be much appreciated.

--
Turgut

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Imdbpy-help mailing list
Imdbpy-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-help


Re: [Imdbpy-help] Generating the HTML parsers

2018-02-27 Thread Davide Alberani
On Tue, Feb 27, 2018 at 9:05 AM, H. Turgut Uyar wrote:
>
> So I decided to develop a parser generator that will read a
> specification for a parser and generate the necessary code

That's really cool, I plan to take a look at it as soon as possible.

What kind of help do you need, mostly?


-- 
Davide Alberani   [PGP KeyID: 0x3845A3D4AC9B61AD]
http://www.mimante.net/



[Imdbpy-help] Generating the HTML parsers

2018-02-27 Thread H. Turgut Uyar
Hi,

Over the last few years, I've refactored the basis for the IMDbPY HTML
parsers into a separate package called "piculet" that could be used with
-hopefully- any HTML markup. It has no required external dependencies,
supports py2/py3/pypy, and improves on the current IMDbPY parsers with
some features and a more consistent interface.

The idea was, and still is, that at some point we can reimplement the
IMDbPY parsers using piculet. This shouldn't be too hard since the
syntax is quite similar. I've attempted this a few times already and
managed to make some headway, but trying to fit things into the current
codebase kept distracting me from the actual job of dealing with the
parsers.

So I decided to develop a parser generator that will read a
specification for a parser and generate the necessary code. I hope this
will make the transition easier. My not-so-preliminary work is here:

https://github.com/uyar/piculet_imdb
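
To make the idea of a spec-driven parser more concrete, here is a minimal
sketch. It is not piculet's spec format or API, and it is not taken from
the repository above: the spec layout, MOVIE_SPEC, and make_parser are all
hypothetical names, and it uses only the standard library. It also builds
the parser function at runtime instead of emitting source code, which is a
simpler stand-in for real code generation.

```python
import xml.etree.ElementTree as ET

# Hypothetical spec: each key maps to an ElementTree path and an optional
# converter. This is NOT piculet's spec format, only an illustration.
MOVIE_SPEC = {
    "title": {"path": ".//h1", "convert": str.strip},
    "year": {"path": ".//span[@class='year']", "convert": int},
}


def make_parser(spec):
    """Build an extraction function from a declarative spec."""

    def parse(markup):
        # Assumes well-formed markup; real IMDb pages would need a
        # more forgiving HTML parser in front of this step.
        root = ET.fromstring(markup)
        data = {}
        for key, rule in spec.items():
            element = root.find(rule["path"])
            if element is None or element.text is None:
                continue  # leave out keys whose element is missing
            convert = rule.get("convert", str)
            data[key] = convert(element.text)
        return data

    return parse


parse_movie = make_parser(MOVIE_SPEC)
sample = (
    "<html><body><h1> The Matrix </h1>"
    "<span class='year'>1999</span></body></html>"
)
result = parse_movie(sample)
print(result)
```

The point of the sketch is only that the extraction rules live in data, so
changing a path means editing the spec rather than the parsing code.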

Note that this project is not a full package like IMDbPY. It doesn't
have the Movie/Person/etc classes. It doesn't even have the code to
fetch the IMDb pages (except for the simple retrievers in the tests). If
we decide that this approach makes sense, we could create a template
suitable for IMDbPY.

If anyone's interested I'd be happy to hear thoughts, suggestions, and
of course pull requests.

Have a nice day,

--
Turgut Uyar
