Oops. I made a mistake in the experiment.  There is actually less  
difference than I thought.

Here we load a web site, optionally using split and join to remove all  
comments.  My regex version seems to be only marginally worse than  
Keith's sequence splitting.


"ON split on regex"      5289  5327
"KH split on sequence"   5165  5160
"no splitting"           2153  2160

So regex splitting seems to be feasible.
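Here is a minimal workspace sketch of the technique being timed: splitting on a separator sequence, joining the pieces back, and using that to strip HTML comments. It is not Keith's Mantis 4874 code or the RubyShards version. The block names splitOn, joinWith and stripComments are hypothetical. It assumes a Squeak/Pharo image that provides indexOfSubCollection:startingAt:, copyFrom:to: and allButFirst. Regex splitting is not shown.

```smalltalk
| splitOn joinWith stripComments |
"Hypothetical helper: split seq at every occurrence of the subsequence sep (Arrays and Strings alike)."
splitOn := [:seq :sep | | result start index |
	result := OrderedCollection new.
	start := 1.
	[(index := seq indexOfSubCollection: sep startingAt: start) > 0]
		whileTrue: [
			result add: (seq copyFrom: start to: index - 1).
			start := index + sep size].
	result add: (seq copyFrom: start to: seq size).
	result].
"Hypothetical helper: join the pieces, inserting sep between neighbours (quadratic, but simple)."
joinWith := [:parts :sep |
	parts isEmpty
		ifTrue: [sep species new: 0]
		ifFalse: [parts allButFirst
			inject: parts first
			into: [:acc :each | acc , sep , each]]].
"Hypothetical helper: split on the comment opener, then keep only the text after each closer."
stripComments := [:html | | pieces kept |
	pieces := splitOn value: html value: '<!--'.
	kept := OrderedCollection with: pieces first.
	pieces allButFirst do: [:each | | close |
		close := each indexOfSubCollection: '-->' startingAt: 1.
		"An unterminated comment drops the rest of the text."
		close > 0 ifTrue: [kept add: (each copyFrom: close + 3 to: each size)]].
	joinWith value: kept value: ''].
(splitOn value: #(1 10 11 2 10 11 3 10 11 4) value: #(10 11)) asArray.
stripComments value: 'a<!-- x -->b<!--y-->c'.
```

The last two expressions should answer #(#(1) #(2) #(3) #(4)) and 'abc'. The same splitOn block handles both Arrays and Strings because it relies only on SequenceableCollection protocol. A regex-based variant would need a regex package in the image.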

I can try to have a closer look and propose a merged solution, but  
right now my plate is rather full.

- on


On Apr 5, 2009, at 18:14, Oscar Nierstrasz wrote:

>
> About performance:
>
> I just did a quick experiment in the Pier migration application, where
> I need split and join.
>
> I use split and join to remove comments from HTML files.  I ran the
> tests without removing comments, and then with comment removal using
> each of the two different split/join implementations.
>
> Keith's sequence splitter is blindingly fast, imposing no discernible
> overhead, whereas my regex version slows all the tests down by 100%!
>
> I would still like to have splitting on regexes, but it should
> probably not be the default for strings.  Maybe we can improve the
> implementation and speed it up ...
>
> - on
>
> On Apr 5, 2009, at 18:03, Oscar Nierstrasz wrote:
>
>>
>> With Keith's version you can do this:
>>
>> #(1 10 11 2 10 11 3 10 11 4) splitOn: #(10 11)
>>
>> I was assuming that the thing we use to split was a regex string.
>>
>> 'hello there' split: '\s'
>>
>> Actually I see that Damien added this possibility in RubyShards as
>> well.  This also works:
>>
>> #(1 10 11 2 10 11 3 10 11 4) split: #(10 11)
>>
>> It seems that RubyShards is more general, but we need to take a
>> closer look at both solutions.  The interfaces are not the same.
>> There may be differences in performance.
>>
>> - on
>>
>>
>> On Apr 5, 2009, at 17:47, Stéphane Ducasse wrote:
>>
>>> I would be in favor of having a nice OO solution :)
>>> I do not know what "uses a sequence to split a sequence" means.
>>>
>>> Stef
>>>
>>>> OK, I had a closer look.
>>>>
>>>> Keith's implementation is completely different from, and pre-dates,
>>>> that of Damien and myself.
>>>>
>>>> Keith's version works for SequenceableCollections, and uses a
>>>> sequence to split a sequence.
>>>>
>>>> Ours is more tailored towards Strings, and uses a regex to split a
>>>> String.
>>>>
>>>> Perhaps we can consider a merge in which sequences can be split
>>>> using sequences, and Strings can additionally be split using regexes.
>>>>
>>>> We should also take efficiency into account.  I have not yet run any
>>>> benchmarks to compare the implementations.
>>>>
>>>> Who is interested in merging these two?
>>>>
>>>> Cheers,
>>>> - on
>>>>
>>>> On Apr 5, 2009, at 16:25, Oscar Nierstrasz wrote:
>>>>
>>>>>
>>>>> Hi Keith,
>>>>>
>>>>> Now I see there are attached files in Mantis.  But they all seem to
>>>>> date from 2006, whereas your latest comments are from Jan 2009.  Are
>>>>> there more recent files from 2009 that I should look at?  If so,
>>>>> where are they?
>>>>>
>>>>> What is the best way to proceed?  Shall I create a Join project on
>>>>> SqueakSource and, if it is updated, post the latest version on
>>>>> Mantis too?
>>>>>
>>>>> Cheers,
>>>>> - on
>>>>>
>>>>> On Apr 5, 2009, at 16:08, Keith Hodges wrote:
>>>>>
>>>>>> Stéphane Ducasse wrote:
>>>>>>>> I wrote the split/join implementation that is available on
>>>>>>>> Mantis:
>>>>>>>>
>>>>>>>> http://bugs.squeak.org/view.php?id=4874
>>>>>>>>
>>>>>>>> I use it all the time. If you would like to improve on what is
>>>>>>>> there, please continue to contribute to the Mantis page
>>>>>>>> discussion, tests and code. That way we will get a polished
>>>>>>>> implementation that can be added to Squeak or to Pharo.
>>>>>>>>
>>>>>>>> The suggestion to use #species would be fine (I never use
>>>>>>>> species myself, because I don't understand what it's really for).
>>>>>>>>
>>>>>>>
>>>>>>> Or #class.
>>>>>>> The point is that you get back a collection of the same kind as
>>>>>>> the receiver.
>>>>>>>
>>>>>>>> When Stef says "I have checked the code and it looks nice" he
>>>>>>>> didn't say which code he checked, so I am confused.
>>>>>>>>
>>>>>>>
>>>>>>> I looked at the latest version in the repository mentioned by
>>>>>>> Oscar: RubyShards.
>>>>>>>
>>>>>>>
>>>>>> Which appears to me to be the opposite of what Oscar suggested.
>>>>>> If I read the email correctly, he asked what the status of Mantis
>>>>>> 4874 was, anticipating that it be integrated. He had "gone back" to
>>>>>> RubyShards in the absence of the integration of 4874.
>>>>>>
>>>>>> Keith
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Pharo-project mailing list
>>>>>> [email protected]
>>>>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>>>>>
>>>>>
>>>>
>>>
>>
>

Kind regards,
Oscar Nierstrasz
---
Prof. Dr. O. Nierstrasz    -- [email protected]
Software Composition Group -- http://www.iam.unibe.ch/~scg
University of Bern         -- Tel/Fax +41 31 631.4618/3355
vcard:  http://www.iam.unibe.ch/~oscar/oscarNierstrasz.vcf

