On Sun, 25 May 2008 18:42:06 -0700, notnorwegian wrote:
> def scrapeSites(startAddress):
>     site = startAddress
>     sites = set()
>     iterator = iter(sites)
>     pos = 0
>     while pos < 10:  # len(sites):
>         newsites = scrapeSite(site)
>         joinSets(sites, newsites)
You change the size of the set while iterating over it, hence the RuntimeError.
On Sun, 25 May 2008 22:42:06 -0300, <[EMAIL PROTECTED]> wrote:
> def joinSets(set1, set2):
>     for i in set2:
>         set1.add(i)
>     return set1
Use the | operator, or |=
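A minimal sketch of that suggestion (Python 3 syntax; the set contents here are invented for illustration):

```python
# The whole joinSets helper collapses to one operator.
set1 = {"http://a.example", "http://b.example"}
set2 = {"http://b.example", "http://c.example"}

union = set1 | set2   # new set containing every element from both
set1 |= set2          # in-place update: same result as joinSets(set1, set2)

print(sorted(set1))
```

Duplicates are collapsed automatically either way; `|` leaves both inputs unchanged, while `|=` updates the left operand in place.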
> Traceback (most recent call last):
>   File "C:/Python25/Progs/WebCrawler/spider2.py", line 47, in <module>
>     x = scrapeSites("http://www.yahoo.com")
On May 25, 8:02 pm, Rodrigo Lazo <[EMAIL PROTECTED]> wrote:
> what about heapq for sorting?
Heap is the data structure to use for 'fast (nearly) sorted inserts'.
But heapq does not support (as far as I know) deletion of duplicates.
A custom heap class could do that, of course.
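One way such a custom class might look (a sketch, not from the thread; the class name and example URLs are invented): pair `heapq` with a companion set so pushes silently skip anything already seen.

```python
import heapq

class UniqueHeap:
    """Min-heap that silently skips items it has already seen."""
    def __init__(self):
        self._heap = []
        self._seen = set()

    def push(self, item):
        # Duplicate check is O(1) on average via the companion set.
        if item not in self._seen:
            self._seen.add(item)
            heapq.heappush(self._heap, item)

    def pop(self):
        # Smallest remaining item; heapq maintains the heap invariant.
        return heapq.heappop(self._heap)

h = UniqueHeap()
for url in ["b.example", "a.example", "b.example", "c.example"]:
    h.push(url)

print([h.pop() for _ in range(3)])  # → ['a.example', 'b.example', 'c.example']
```

Pops come out in sorted order, and the duplicate "b.example" is dropped at push time rather than needing deletion later.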
--
http://mail.pyt
Traceback (most recent call last):
  File "C:/Python25/Progs/WebCrawler/spider2.py", line 47, in <module>
    x = scrapeSites("http://www.yahoo.com")
  File "C:/Python25/Progs/WebCrawler/spider2.py", line 31, in scrapeSites
    site = iterator.next()
RuntimeError: Set changed size during iteration
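The usual fix, sketched here with a stubbed `scrapeSite` (the stub's link table is invented, and this is not the OP's code), is to keep separate pending and visited sets, so the set being grown is never the one being iterated:

```python
def scrapeSite(url):
    # Stub standing in for the OP's real scraper; the link table is made up.
    links = {
        "http://www.yahoo.com": {"http://a.example", "http://b.example"},
        "http://a.example": {"http://c.example"},
    }
    return links.get(url, set())

def scrapeSites(startAddress, limit=10):
    visited = set()
    pending = {startAddress}
    while pending and len(visited) < limit:
        site = pending.pop()               # set.pop() mutates, but nothing is iterating
        visited.add(site)
        pending |= scrapeSite(site) - visited   # only queue unseen links
    return visited

print(sorted(scrapeSites("http://www.yahoo.com")))
```

No iterator is ever held across a mutation, so the RuntimeError cannot occur, and `- visited` keeps the crawler from revisiting pages.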
On 26 May, 03:04, [EMAIL PROTECTED] wrote:
> On 26 May, 01:30, I V <[EMAIL PROTECTED]> wrote:
>
> > On Sun, 25 May 2008 15:49:16 -0700, notnorwegian wrote:
> > > I meant like set[pos]: not iterate, but access a specific position
> > > in the set.
>
> > If you need to access arbitrary elements, use a list instead of a set
> > (but you'll get slower inserts).
On 26 May, 01:30, I V <[EMAIL PROTECTED]> wrote:
> On Sun, 25 May 2008 15:49:16 -0700, notnorwegian wrote:
> > I meant like set[pos]: not iterate, but access a specific position
> > in the set.
>
> If you need to access arbitrary elements, use a list instead of a set
> (but you'll get slower inserts). OTOH, if you just need to be able to get
> the next item, iterating over the set works fine.
On May 25, 2:37 am, [EMAIL PROTECTED] wrote:
> I'm writing a webcrawler. After visiting a new site I want to store it
> in alphabetical order.
>
> So obviously I want fast inserts, and I want to delete duplicates too.
>
> Which data structure is best for this?
I think you ought to re-examine your requirements.
On Sun, 25 May 2008 15:49:16 -0700, notnorwegian wrote:
> I meant like set[pos]: not iterate, but access a specific position in the
> set.

If you need to access arbitrary elements, use a list instead of a set
(but you'll get slower inserts). OTOH, if you just need to be able to get
the next item, iterating over the set works fine.
On May 25, 9:32 am, Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]> wrote:
> On Sun, 25 May 2008 00:10:45 -0700, notnorwegian wrote:
> > Sets don't seem to be so good because there is no way to iterate them.
>
> Err:
>
> In [82]: for x in set(['a', 'b', 'c']):
>    ....:     print x
>    ....:
> a
> c
> b
On Sun, 25 May 2008 13:05:31 -0300, Gabriel Genellina wrote:
> Use a list, and the bisect module to keep it sorted:
That's worth doing if you need the data to be sorted after each insert.
If the OP just needs the data to be sorted at the end, using a data
structure with fast inserts (like a set) and sorting once when the crawl
finishes will be faster.
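That sort-at-the-end approach is only a couple of lines (the URLs here are invented for illustration):

```python
# Crawl phase: a set gives O(1) average inserts and free de-duplication.
seen = set()
for url in ["b.example", "a.example", "b.example", "c.example"]:
    seen.add(url)

# Report phase: sort exactly once, when crawling is finished.
print(sorted(seen))  # → ['a.example', 'b.example', 'c.example']
```

One O(n log n) sort at the end is cheaper than keeping the collection sorted across thousands of inserts.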
Stefan Behnel <[EMAIL PROTECTED]> writes:
> [EMAIL PROTECTED] wrote:
>> I'm writing a webcrawler. After visiting a new site I want to store it
>> in alphabetical order.
>>
>> So obviously I want fast inserts, and I want to delete duplicates too.
>>
>> Which data structure is best for this?
>
> Keep the data redundantly in two data structures.
[EMAIL PROTECTED] wrote:
> I'm writing a webcrawler. After visiting a new site I want to store it
> in alphabetical order.
>
> So obviously I want fast inserts, and I want to delete duplicates too.
>
> Which data structure is best for this?
Keep the data redundantly in two data structures. Use collections.de
"Rares Vernica" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
| >>> l=list(s)
| >>> l.sort()
This can be condensed to l = sorted(s)
| >>> l
| ['a', 'b', 'c']
On Sun, 25 May 2008 03:37:00 -0300, <[EMAIL PROTECTED]> wrote:
> I'm writing a webcrawler. After visiting a new site I want to store it
> in alphabetical order.
>
> So obviously I want fast inserts, and I want to delete duplicates too.
>
> Which data structure is best for this?
Use a list, and the bisect module to keep it sorted:
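A sketch of that suggestion (the helper name and example URLs are mine, not from the original message): `bisect_left` finds the insertion point in O(log n), and the same index doubles as a duplicate check.

```python
import bisect

def add_site(sites, url):
    """Insert url into the sorted list sites, skipping duplicates."""
    i = bisect.bisect_left(sites, url)    # O(log n) binary search
    if i == len(sites) or sites[i] != url:
        sites.insert(i, url)              # O(n) shift keeps the list sorted

sites = []
for url in ["b.example", "a.example", "b.example", "c.example"]:
    add_site(sites, url)

print(sites)  # → ['a.example', 'b.example', 'c.example']
```

The list is in alphabetical order after every insert, at the cost of the O(n) shift that `list.insert` performs.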
On Sun, May 25, 2008 at 3:10 AM, <[EMAIL PROTECTED]> wrote:
>
> > >>> l=list(s)
> > >>> l.sort()
> > >>> l
> >
> > ['a', 'b', 'c']
> >
> > hth,
> > Rares
>
> Sets don't seem to be so good because there is no way to iterate them.
>
> s.pop() removes and returns an arbitrary element from the set.
On Sun, 25 May 2008 00:10:45 -0700, notnorwegian wrote:
> Sets don't seem to be so good because there is no way to iterate them.
Err:
In [82]: for x in set(['a', 'b', 'c']):
   ....:     print x
   ....:
a
c
b
Ciao,
Marc 'BlackJack' Rintsch
On 25 May, 08:56, Rares Vernica <[EMAIL PROTECTED]> wrote:
> use a set to store them:
>
> >>> s=set()
> >>> s.add('a')
> >>> s.add('b')
> >>> s
> set(['a', 'b'])
> >>> s.add('a')
> >>> s
> set(['a', 'b'])
> >>> s.add('c')
> >>> s
> set(['a', 'c', 'b'])
>
> It does remove duplicates, but it is not ordered.
use a set to store them:
>>> s=set()
>>> s.add('a')
>>> s.add('b')
>>> s
set(['a', 'b'])
>>> s.add('a')
>>> s
set(['a', 'b'])
>>> s.add('c')
>>> s
set(['a', 'c', 'b'])
It does remove duplicates, but it is not ordered. To order it, you can use:
>>> l=list(s)
>>> l.sort()
>>> l
['a', 'b', 'c']
I'm writing a webcrawler. After visiting a new site I want to store it
in alphabetical order.

So obviously I want fast inserts, and I want to delete duplicates too.

Which data structure is best for this?