Re: [Imdbpy-help] Search for movie by title, year and runtime?

2018-09-28 Thread H. Turgut Uyar
Hi,

Could it help to use fuzzy string matching on the local tsv files?

https://github.com/seatgeek/fuzzywuzzy
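
A rough sketch of the idea, using the standard library's difflib as a stand-in for fuzzywuzzy so that it runs without extra packages. The column names follow IMDb's title.basics.tsv; the sample rows, the suffix list, the thresholds and the helper names are all made up for illustration.

```python
import csv
import difflib
import io
import re

# Made-up rows in the layout of title.basics.tsv (only the columns used here).
SAMPLE_TSV = (
    "tconst\tprimaryTitle\tstartYear\truntimeMinutes\n"
    "tt-example-1\tThe Example\t2010\t95\n"
    "tt-example-2\tThe Example\t2010\t120\n"
    "tt-example-3\tAn Example Returns\t2012\t101\n"
)

# Assumed suffixes that get appended to the titles being looked up.
SUFFIX_RE = re.compile(r"\s*[\(\[]?\b(4k|3d)\b[\)\]]?\s*$", re.IGNORECASE)


def normalize(title):
    """Lower-case the title and drop a trailing '4k' / '3D' marker."""
    return SUFFIX_RE.sub("", title).strip().lower()


def find_title(rows, title, year, runtime, min_ratio=0.85, runtime_slack=2):
    """Return the best (ratio, tconst) match, or None if nothing qualifies."""
    wanted = normalize(title)
    best = None
    for row in rows:
        # Cheap exact filters first, fuzzy comparison only on the survivors.
        if row["startYear"] != str(year):
            continue
        if not row["runtimeMinutes"].isdigit():  # missing values are skipped
            continue
        if abs(int(row["runtimeMinutes"]) - runtime) > runtime_slack:
            continue
        ratio = difflib.SequenceMatcher(
            None, wanted, normalize(row["primaryTitle"])
        ).ratio()
        if ratio >= min_ratio and (best is None or ratio > best[0]):
            best = (ratio, row["tconst"])
    return best


rows = list(
    csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t", quoting=csv.QUOTE_NONE)
)
match = find_title(rows, "The Example 3D", 2010, 121)
print(match)
```

Filtering on year and runtime before the string comparison keeps the number of fuzzy comparisons small, which is probably what matters most for 16k lookups.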


--
Turgut


On 27-09-2018 18:33, Kesselheim, David DK - NOH wrote:
> Hi,
>
> I need to look up quite a large number of titles (16k). I have the title
> (more or less; sometimes ‘4k’, ‘3D’, etc. is appended to it), the year
> and the runtime. With that info I can get the correct title ID from IMDb,
> but it is very slow because, after searching for the title and
> filtering by year, I need to run ia.update(title) on each of my initial
> matches to compare the runtime.
>
> Is there a way to speed up the process?
>
> I imported the tsv.gz files from IMDb, but since the title isn’t an
> exact match I am not sure how well SQL Server is suited to finding the
> right title.
>
> Thanks for the help!
>
> David
>
> ___
> Imdbpy-help mailing list
> Imdbpy-help@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/imdbpy-help
> 


[Imdbpy-help] Documentation updates

2018-04-04 Thread H. Turgut Uyar
Hi,

A few days ago the documentation was converted to a Sphinx project. This
was mostly a straightforward format conversion that didn't touch the
content or the structure of the documents.

I've been working on a draft for organizing the structure of the
documents. The content is still the same; I've just tried to organize it
into sections.

This draft is in the reorganize-docs branch. It's still not finished but
it's enough to give an idea. Help and feedback would be welcome as always.

Best,

--
Turgut




[Imdbpy-help] Python 2 support

2018-04-02 Thread H. Turgut Uyar
Hi,

I've created a new branch "py2" which supports both py2 and py3 in the
same codebase. As of today, it's up-to-date with the current master and
passes almost all the tests. But it only deals with the http access
system; there are no changes regarding sql or s3.

If you need Py2 support please test it and send us feedback.

Have a nice week,

--
Turgut



Re: [Imdbpy-help] Generating the HTML parsers

2018-02-28 Thread H. Turgut Uyar

On 02/28/2018 12:13 AM, Davide Alberani wrote:
> On Tue, Feb 27, 2018 at 9:05 AM, H. Turgut Uyar  wrote:
>> So I decided to develop a parser generator that will read a
>> specification for a parser and generate the necessary code
> 
> What kind of help do you need, mostly?
> 
> 

Most importantly, I can't really decide if this is worth pursuing. I
feel like the approach has some potential but I can't be sure. The code
can be manually written after all. The spec format is more generic, so
it might be easier to refactor the parsers in the future but I don't
know how likely that is to happen.

And also, would we gain from transitioning to piculet based parsers?
Possible advantages could be:

- Making py2 support easier if we want that.
- Dropping the hard dependency on lxml, again, if we want that.
- Easier maintenance of the parsers (?).
- More involvement from developers for writing parsers (?).

So my main problem is that I'm undecided whether I should devote more
time to this or not. Any insight into that issue would be much appreciated.

--
Turgut



[Imdbpy-help] Generating the HTML parsers

2018-02-27 Thread H. Turgut Uyar
Hi,

Over the last few years, I've refactored the basis for the IMDbPY HTML
parsers into a separate package called "piculet" that could be used with
-hopefully- any HTML markup. It has no required external dependency,
supports py2/py3/pypy and improves on the current IMDbPY parsers with
some features and a more consistent interface.

The idea was, and still is, that at some point we can reimplement the
IMDbPY parsers using piculet. This shouldn't be too hard since the
syntax is quite similar. I've attempted this a few times already and
managed to make some headway but trying to fit things into the current
codebase kept distracting me from the actual job of dealing with the
parsers.

So I decided to develop a parser generator that will read a
specification for a parser and generate the necessary code. I hope this
will make the transition easier. My not-so-preliminary work is here:

https://github.com/uyar/piculet_imdb
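
To give a feel for what "read a specification and generate the code" can mean, here is a toy sketch. It is not piculet's real spec format or API: the spec layout, the function names and the sample page are invented, and it uses the standard library's ElementTree on well-formed markup only.

```python
import xml.etree.ElementTree as ET

# Hypothetical spec: result key -> ElementTree path. Not piculet's real format.
MOVIE_SPEC = {
    "title": ".//h1",
    "year": ".//span[@class='year']",
}


def generate_parser_source(name, spec):
    """Return Python source for a function that applies the spec to a tree."""
    lines = ["def %s(root):" % name, "    data = {}"]
    for key, path in sorted(spec.items()):
        lines.append("    value = root.findtext(%r)" % path)
        lines.append("    if value is not None:")
        lines.append("        data[%r] = value.strip()" % key)
    lines.append("    return data")
    return "\n".join(lines) + "\n"


source = generate_parser_source("parse_movie", MOVIE_SPEC)
namespace = {}
exec(source, namespace)  # a real generator would write this to a module file

page = "<html><body><h1> Example </h1><span class='year'>1999</span></body></html>"
result = namespace["parse_movie"](ET.fromstring(page))
print(source)
print(result)
```

The point of generating source rather than interpreting the spec at run time is that the output can be dropped into an existing codebase and adapted by hand.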

Note that this project is not a full package like IMDbPY. It doesn't
have the Movie/Person/etc classes. It doesn't even have the code to
fetch the IMDb pages (except for the simple retrievers in the tests). If
we decide that this approach makes sense, we could create a template
suitable for IMDbPY.

If anyone's interested I'd be happy to hear thoughts, suggestions, and
of course pull requests.

Have a nice day,

--
Turgut Uyar



Re: [Imdbpy-help] Presentation and windows problem

2017-11-17 Thread H. Turgut Uyar

Hi,

Is it possible that pyinstaller doesn't handle the setup.py file and/or the 
MANIFEST.in file like the sdist command does? Those are the places where 
the cookies.json file is listed.
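
If that is the cause, one thing to try is to tell PyInstaller about the data file explicitly through its --add-data option. A small sketch that only builds the command line; the location of the installed imdb package is a placeholder, and the os.pathsep separator is what the PyInstaller releases of this period expect.

```python
import os


def add_data_arg(source, dest_dir):
    """Build the value for PyInstaller's --add-data option.

    Source and destination are joined with os.pathsep
    (';' on Windows, ':' elsewhere).
    """
    return source + os.pathsep + dest_dir


def build_command(script, package_dir):
    """Return the pyinstaller argument list for a one-file build."""
    cookies = os.path.join(package_dir, "parser", "http", "cookies.json")
    # The destination mirrors the package layout inside the bundle.
    dest = os.path.join("imdb", "parser", "http")
    return ["pyinstaller", "--onefile",
            "--add-data", add_data_arg(cookies, dest), script]


# "path/to/site-packages/imdb" is a placeholder, not a real location.
command = build_command("sample_get_info.py", "path/to/site-packages/imdb")
print(" ".join(command))
```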


Turgut





On November 16, 2017 9:55:34 PM Martín Torre Castro wrote:




I'll try to build it on Windows in the next few days.
Can you tell me how you have set up the environment?
Which version of Python?
Which command do you run to build the .exe?



 Python 3.6.3
pyInstaller --onefile sample_get_info.py




> 2017-11-16 00:44:46,732 WARNING [imdbpy] C:\Program Files
> (x86)\Python36-32\lib\site-packages\imdb\__init__.py:165: Unable to read
> configuration file; complete error: 'ConfigParserWithCase' object has no
> attribute '_boolean_states'

Nice; this seems to be a problem related to the parsing of the
imdbpy.cfg file: remove it (I'll try to reproduce and fix the bug later),
since you don't really need it.


> grParser = GatherRefs(useModule=self._useModule)
> AttributeError: 'DOMHTMLPlotParser' object has no attribute '_useModule'

It seems you're using an old version.
IMDbPY just came out of a huge set of changes, and it still has some bugs
here and there. The above one should already be fixed in the repository
version:
https://github.com/alberanid/imdbpy

IMDbPY==6.0


I'm checking now. It says everything is 'already-up-to-date'.


Finally, I was trying the file from a very little gui in a '.pyw' file. Now
I tried only the text-mode script and it gives this error at the command
line. It's happening at the IMDb() call.

Traceback (most recent call last):
  File "sample_get_info.py", line 56, in <module>
  File "sample_get_info.py", line 24, in get_data
  File "site-packages\imdb\__init__.py", line 186, in IMDb
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "c:\program files (x86)\python36-32\lib\site-packages\PyInstaller\loader\pyimod03_importers.py", line 631, in exec_module
    exec(bytecode, module.__dict__)
  File "site-packages\imdb\parser\http\__init__.py", line 99, in <module>
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\UsersAppData\\Local\\Temp\\_MEI56602\\imdb\\parser\\http\\cookies.json'





[Imdbpy-help] Help with tests

2017-11-07 Thread H. Turgut Uyar
Hi,

Now that the test skeleton is in the repository (the codename-simplify
branch), it would be very helpful if more people could contribute
tests. The testing code is quite simple; here's an example from the
person main details page tests:


def test_name_should_be_canonical(person_main_details):
    page = person_main_details('keanu_reeves')
    data = parser.parse(page)['data']
    assert data['name'] == 'Reeves, Keanu'


The body of every test function is like that. Get the page, run through
the parser, check the result. I have written the functions for getting
the movie combined details page and the person main details page. I will
add the other types of pages soon or I can add one when anyone needs it.

The current tests are here:

https://github.com/alberanid/imdbpy/tree/codename-simplify/tests

What needs to be done is as follows:

1. Select something that hasn't been tested yet.

2. Select a person (or movie, or company etc) page that can be used to
test it.

3. If not already there, add its IMDb id to the relevant dictionary in
the conftest.py file (MOVIES, PEOPLE).

4. Figure out the name of the key in the result and write an assert to
check for the correct value.

Having more tests would increase our confidence in the parsers,
especially after the port to Python3. Any contributions would be greatly
appreciated.

Cheers

--
Turgut



Re: [Imdbpy-help] Setup for Ubuntu

2010-10-06 Thread H. Turgut Uyar
On 10/06/2010 10:13 AM, Davide Alberani wrote:
> On Tue, Oct 5, 2010 at 12:53 AM, Rob Larson  wrote:
>>  Copy the sample file for wsgi.conf to the Apache configuration directory ( 
>> i am assuming /etc/apache2?)
> 
> Probably for apache2 the right place is /etc/apache2/conf.d/
> 

Yes, it should be /etc/apache2/conf.d/

>>  and edit the paths for the imdbpy.wsgi script and the static files. ( I am 
>> assuming the path to my home directory?)
> 

The path for imdbpy.wsgi is your home directory (plus "imdbpy.wsgi"),
the path to the static files is:
  .../site-packages/imdbpykit/web/static
This directory contains a hidden file (.environment.xml), for which the
server also needs write permissions.

Bye,

-- 
H. Turgut Uyar  [GPG KeyID: 0xEAF45FB8]
http://web.itu.edu.tr/uyar/




Re: [Imdbpy-help] imdbpy 4.6 can't fetch person head shot (and who add IMDb to the names ?)

2010-09-29 Thread H. Turgut Uyar
On 09/29/2010 12:45 PM, Davide Alberani wrote:
> 
> I'll have time to check the patch and commit it to Mercurial only
> tomorrow.  By the way, anyone should feel free to fork the IMDbPY
> repository on Bitbucket (specifically
> http://bitbucket.org/alberanid/imdbpy_new_search_parsers/ ),
> commit his changes and ask for a pull.
> 


Is it that fork or the one called imdbpy_parsers2010? I might have
forked the wrong one.

>> tried building it myself, imdb page killed lxml, elementtree and
>> BeautifulSoup,
> 
> Strange: we're based on lxml (falling back to BeautifulSoup if
> lxml is not installed).
> 

At some places we have to fix the HTML before feeding it to lxml or
beautifulsoup (the "preprocessors" in the code). Maybe it was one of
those pages.
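
For readers who haven't looked at that part of the code, the idea is roughly the following. This is a standalone sketch, not the actual IMDbPY code: the two patterns and the function name are invented examples of the kind of fix-up that might be needed.

```python
import re

# Hypothetical (pattern, replacement) pairs; real ones would target
# whatever markup breaks the parser on a given page.
PREPROCESSORS = [
    (re.compile(r"<br\s*>", re.IGNORECASE), "<br/>"),
    (re.compile(r"&(?!#?\w+;)"), "&amp;"),
]


def preprocess(html, preprocessors=PREPROCESSORS):
    """Apply each substitution in order and return the cleaned markup."""
    for pattern, replacement in preprocessors:
        html = pattern.sub(replacement, html)
    return html


cleaned = preprocess("<p>Tom & Jerry<BR>R&amp;D</p>")
print(cleaned)
```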

-- 
H. Turgut Uyar  [GPG KeyID: 0xEAF45FB8]
http://web.itu.edu.tr/uyar/




Re: [Imdbpy-help] IMDbPY featured on Sourceforge's blog

2010-03-10 Thread H. Turgut Uyar
On 03/10/2010 07:19 PM, Davide Alberani wrote:
> today IMDbPY was featured on the Sourceforge's blog:
>   http://sourceforge.net/blog/imdbpy-projects-imdbcom-data-onto-your-screen/
> 

That's very nice :) Cheers!

Let's hope this will attract more developers.

-- 
H. Turgut Uyar  [GPG KeyID: 0xEAF45FB8]
http://www3.itu.edu.tr/~uyar/




Re: [Imdbpy-help] changes to imdb.com

2009-11-14 Thread H. Turgut Uyar
On 11/14/2009 06:16 PM, Davide Alberani wrote:
> Right now I'm not sure about how to proceed: leave the old set
> of parsers and release 4.3 ASAP, or wait until the new pages
> are completely deployed.
> Hints?
> 

I think that for an application which uses IMDbPY, it is important that
the main parsers are always working. So my preference would be to have a
new version of IMDbPY as soon as possible, no matter how small the changes.

> By the way, what kind of pages do you see?  Is the list of
> movies sortable by year/rating?  E.g.:
>   http://akas.imdb.com/name/nm634/maindetails
> 

Yes.

--
Turgut Uyar




Re: [Imdbpy-help] five years of IMDbPY!

2009-04-01 Thread H. Turgut Uyar
Many happy returns :-)

On 04/01/2009 02:17 PM, ori cohen wrote:
> heh awesome :) beer for everyone..
> 
> Ok, sharing the birthday with GMail is not easy, but exactly five
> years ago IMDbPY 1.0 was released, too. :-)
> And I can notice, with a bit of pride, that - _30_ releases later - the
> main API is still the same and the project is alive and kicking. :-)
> 
> So, thanks to anyone who contributed with code, patches, bug reports
> and so on!  Let's celebrate! ;-)

-- 
Turgut Uyar




Re: [Imdbpy-help] BUG: 4.0-dev - Newline Missing after Genres and incorrect plot summary

2009-03-25 Thread H. Turgut Uyar
On 03/25/2009 06:07 PM, H. Turgut Uyar wrote:
>> Is there an easy way to translate the XML
>> generated by the XSL file?
>>
> 
> It should be easy, an XML book I have says to put an entry for every
> language
> 

That proved to be too naive of me. From what I understand, things do not
run automagically as I hoped they would.

Anyway, I committed a patch to do the translations using XSLT. I
arranged it so that the i18n entry in the config file (and therefore the
LANG environment variable) will be the single control point for
activating the translations of both the static HTML code and the labels
that will come from imdbpy. Since I can not find a way of getting the
value of an environment variable in XSLT, I added some code that will
create a file accordingly to be used by XSLT. It looks kinda awkward but
it works.
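
In outline, the workaround amounts to something like the sketch below. The file name, the element names and the helper functions are made up here and are not the ones in the committed patch; the stylesheet would then read the file, for example through XSLT's document() function.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def language_from_env(environ, default="en"):
    """Extract the language code from a LANG value such as 'tr_TR.UTF-8'."""
    lang = environ.get("LANG", "")
    code = lang.split(".")[0].split("_")[0]
    return code if code and code not in ("C", "POSIX") else default


def write_language_file(path, environ):
    """Write <settings><lang>..</lang></settings> for the stylesheet to load."""
    root = ET.Element("settings")
    ET.SubElement(root, "lang").text = language_from_env(environ)
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)


# A temporary directory stands in for wherever the stylesheet lives.
target = os.path.join(tempfile.mkdtemp(), "settings.xml")
write_language_file(target, {"LANG": "tr_TR.UTF-8"})
with open(target, encoding="utf-8") as handle:
    print(handle.read())
```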

-- 
Turgut




Re: [Imdbpy-help] BUG: 4.0-dev - Newline Missing after Genres and incorrect plot summary

2009-03-25 Thread H. Turgut Uyar
On 03/25/2009 05:36 PM, Davide Alberani wrote:
> On Mar 25, "H. Turgut Uyar"  wrote:
>> Maybe we should set the dependency info in the setup.py file so
>> that a release with version greater than 2.1 will be selected.
> 
> For sure it won't hurt. :-)
> 

Greater than 2.1 might be overkill, but I'm too lazy to find out which
exact version of lxml provides all the features we use :-)

> PS: in a matter of days, I hope to add "top 250/bottom 100" links
> to imdbpykit.

Great.

> What about i18n?  I see a file for English and one for Turkish;
> I can add Italian.

English and Turkish are also nearly empty. The problem is that we have
to generate a list of XML tags and that can be quite laborious. gettext
utils can not help us either because the tags are generated in the code,
so we can't mark them anywhere like _(..) for a utility to collect.

> Is there an easy way to translate the XML
> generated by the XSL file?
> 

It should be easy, an XML book I have says to put an entry for every
language as in:

IMDbPY gateway
IMDbPY ag gecidi

And the browser should select the correct one.

-- 
Turgut




Re: [Imdbpy-help] BUG: 4.0-dev - Newline Missing after Genres and incorrect plot summary

2009-03-25 Thread H. Turgut Uyar
On 02/27/2009 08:42 AM, Chris Thompson wrote:
> Regarding the lxml package, I have "python-lxml version 1.3.6-1"
> installed and "libxml2 version 2.6.31.dfsg-2ubuntu1.3". This is detected
> by the imdbpy installer but I still get the warnings. Here is a snippet
> from my console:
> 
> 
> r...@zen-linux:/data/downloads/development/general/python/imdmpy/trunk/imdbpy#
> get_first_movie.py alien
> 
> /usr/lib/python2.5/site-packages/IMDbPY-4.0dev-py2.5.egg/imdb/parser/http/utils.py:363:
> UserWarning: unable to use "lxml": No module named html
>   warnings.warn('unable to use "%s": %s' % (mod, str(e)))
> 
> /usr/lib/python2.5/site-packages/IMDbPY-4.0dev-py2.5.egg/imdb/parser/http/utils.py:354:
> UserWarning: falling back to "beautifulsoup".
>   warnings.warn('falling back to "%s".' % mod)
> 

I was browsing the archives and noticed that this was not answered. The
1.3.6 version of python-lxml does not seem to have an "html" module.
Maybe we should set the dependency info in the setup.py file so that a
release with version greater than 2.1 will be selected.
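
For context, the fallback that produces those two warnings works along these lines. This is a simplified sketch rather than the code in imdb/parser/http/utils.py: the function name is invented, and "bs4" is today's package name for BeautifulSoup, not the module name used at the time.

```python
import importlib
import warnings


def pick_parser_module(candidates=("lxml.html", "bs4")):
    """Return the name of the first importable module, or None."""
    for position, name in enumerate(candidates):
        try:
            importlib.import_module(name)
        except ImportError as e:
            warnings.warn('unable to use "%s": %s' % (name, e))
            if position + 1 < len(candidates):
                warnings.warn('falling back to "%s".' % candidates[position + 1])
            continue
        return name
    return None


# Standard-library names keep the demonstration independent of what is installed.
chosen = pick_parser_module(("no_such_module_xyz", "html.parser"))
print(chosen)
```

An lxml 1.3.6 install fails at exactly this import because it has no "html" submodule, which is why a minimum version in setup.py would catch the problem earlier.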

-- 
Turgut Uyar




Re: [Imdbpy-help] IMDbPY 3.7 released

2008-09-22 Thread H. Turgut Uyar
On 09/22/2008 12:51 PM, Davide Alberani wrote:
> Believe it or not, IMDbPY 3.7 is here! :-)
> As usual, you can download everything from http://imdbpy.sf.net/
> 
> In this release the html parsers were replaced with new DOM/XPath-based
> parsers, mostly based on the work of H. Turgut Uyar, who I personally
> thank for the impressive amount and quality of work.
> 

Thanks a lot. I'm glad to contribute to the project.

Turgut

