Re: converting to/from char[]/string

2020-03-05 Thread mark via Digitalmars-d-learn

On Thursday, 5 March 2020 at 13:31:14 UTC, Adam D. Ruppe wrote:

> On Thursday, 5 March 2020 at 11:03:30 UTC, mark wrote:
>
>> I want to use the Porter stemming algorithm.
>> There's a D implementation here:
>> https://tartarus.org/martin/PorterStemmer/d.txt
>
> I think I (or ketmar, and I stole it from him) ported that very
> same file before:
>
> https://github.com/adamdruppe/adrdox/blob/master/stemmer.d
>
> By just adding `const` where appropriate it becomes compatible
> with string, and you can slice to take care of the size thing.
>
> https://github.com/adamdruppe/adrdox/blob/master/stemmer.d#L512
>
> is that stem function, taking a const slice.


I thought the problem was using char[] rather than dchar[], but 
evidently not.


I downloaded yours and it "just works": I didn't have to change 
anything. (dscanner gives a couple of const/immutable hints which 
I'll fix, but still.)


It might be good to ask for yours to be added to
https://tartarus.org/martin/PorterStemmer/, since it works and the
old one doesn't.


Thank you!


Re: converting to/from char[]/string

2020-03-05 Thread Adam D. Ruppe via Digitalmars-d-learn

On Thursday, 5 March 2020 at 11:03:30 UTC, mark wrote:

> I want to use the Porter stemming algorithm.
> There's a D implementation here:
> https://tartarus.org/martin/PorterStemmer/d.txt


I think I (or ketmar, and I stole it from him) ported that very
same file before:


https://github.com/adamdruppe/adrdox/blob/master/stemmer.d

By just adding `const` where appropriate it becomes compatible
with string, and you can slice to take care of the size thing.


https://github.com/adamdruppe/adrdox/blob/master/stemmer.d#L512

is that stem function, taking a const slice.


Re: converting to/from char[]/string

2020-03-05 Thread mark via Digitalmars-d-learn

I suspect the problem is using .length rather than some other 
size property.
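There is no other size property to switch to. In D an array's size is `.length`, and its type is `size_t`. `size_t` is an alias whose width follows the target's pointer size: `uint` on 32-bit targets and `ulong` on 64-bit ones. The compile-time checks below sketch this, assuming an ordinary 32- or 64-bit target. `buffer` is just an illustrative variable, not the stemmer's `m_b`.

```d
// An illustrative mutable buffer, similar in type to the stemmer's m_b.
char[] buffer;

// .length on any array (mutable or string) is size_t.
static assert(is(typeof(buffer.length) == size_t));
static assert(is(typeof("sses".length) == size_t));

// size_t is as wide as a pointer on the compilation target.
static assert(size_t.sizeof == (void*).sizeof);

// D_LP64 is predefined when pointers are 64 bits: size_t is then ulong,
// which is why `int x = s.length;` stops compiling on 64-bit targets.
version (D_LP64)
    static assert(is(size_t == ulong));
else
    static assert(is(size_t == uint));
```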


Re: converting to/from char[]/string

2020-03-05 Thread mark via Digitalmars-d-learn

I changed int to size_t and used const(char[]) etc. as suggested.
It ran but crashed. Each crash was a range violation, so for each 
one I put in a guard so instead of


if ( ... m_b[m_k])

I used

if (m_k < m_b.length && ... m_b[m_k])

I did this kind of fix in three places.

The result is that it does some, but not all, of the stemming!

Anyway, I'll compare it with the Python version and see if I can 
spot the problem(s).
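Below is a minimal sketch of the guard pattern described above. It uses a hypothetical `charIs` helper rather than the stemmer's own code. Because `&&` short-circuits, `b[k]` is only read when `k` is a valid index.

One caveat applies. The guard turns an out-of-range read into a plain `false`. If the bad index comes from an off-by-one elsewhere (for example, an end index passed as the length rather than the last position), the check is silently skipped instead of crashing. That could plausibly contribute to the incomplete stemming.

```d
// Hypothetical helper mirroring the added guard: `&&` short-circuits,
// so b[k] is evaluated only when k < b.length, avoiding a RangeError.
bool charIs(const(char)[] b, size_t k, char c)
{
    return k < b.length && b[k] == c;
}

unittest
{
    const(char)[] b = "hopping";       // length 7, last index 6
    assert(charIs(b, 6, 'g'));         // last valid index
    assert(!charIs(b, 7, 'g'));        // one past the end: false, no crash
}
```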


Thanks.


Re: converting to/from char[]/string

2020-03-05 Thread Dennis via Digitalmars-d-learn

On Thursday, 5 March 2020 at 11:31:43 UTC, mark wrote:

> I've now got Martin Porter's own Java version, so I'll have a
> go at porting that to D myself.


I don't think that's necessary, the errors seem easy to fix.

> src/porterstemmer.d(197,13): Error: cannot implicitly convert expression s.length of type ulong to int
> src/porterstemmer.d(222,9): Error: cannot implicitly convert expression cast(ulong)this.m_j + s.length of type ulong to int


These errors are probably because the code was only compiled on 
32-bit targets where .length is of type `uint`, but you are 
compiling on 64-bit where .length is of type `ulong`.
A quick fix is to simply cast the result like `cast(int) 
s.length` and `cast(int) (this.m_j + s.length)`, though a proper 
fix would be to change the types of variables to `long`, 
`size_t`, `auto` or `const` (depending on which is most 
appropriate).
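Here is a minimal sketch of the quick fix and the proper fix. It uses a hypothetical `Cursor` struct standing in for the two offending assignments, not the actual code in porterstemmer.d.

On 64-bit targets both `s.length` and `int + s.length` are `ulong`, so storing either into an `int` needs a cast. A `size_t` field needs none. `std.conv.to` is shown as a checked middle ground: it throws on overflow instead of silently truncating.

```d
import std.conv : to;

// Hypothetical stand-in for the offending assignments; not the real code.
struct Cursor
{
    int    k;   // original style: 32-bit index
    size_t kz;  // proper fix: same type as .length

    void advance(const(char)[] s)
    {
        // k = k + s.length;          // error on 64-bit: ulong -> int
        k = cast(int)(k + s.length);   // quick fix: explicit, unchecked cast
        kz += s.length;                // proper fix: no conversion needed
    }
}

/// Checked alternative: std.conv.to throws ConvOverflowException
/// instead of truncating when the value does not fit in an int.
int checkedLength(const(char)[] s)
{
    return s.length.to!int;
}

unittest
{
    Cursor c;
    c.advance("ss");
    c.advance("es");
    assert(c.k == 4 && c.kz == 4);
    assert(checkedLength("caresses") == 8);
}
```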


> src/porterstemmer.d(259,12): Error: function porterstemmer.PorterStemmer.ends(char[] s) is not callable using argument types (string)
> src/porterstemmer.d(259,12):        cannot pass argument "sses" of type string to parameter char[] s


These errors are because `string` is `immutable(char)[]`, meaning 
the characters may not be modified, while the function accepts a 
`char[]` which is allowed to mutate the characters.
I don't think the functions actually do that, so you can simply 
change `char[]` into `const(char)[]` so a string can be passed to 
those functions.
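The `const(char)[]` change can be illustrated with a hypothetical, simplified `ends` function. It is a stand-in, not the method from porterstemmer.d. Both `char[]` and `string` (`immutable(char)[]`) implicitly convert to `const(char)[]`. So once the parameter promises not to modify the characters, a literal such as `"sses"` can be passed next to a mutable buffer.

```d
// Hypothetical simplified version of PorterStemmer.ends: does `word`
// end with suffix `s`? const(char)[] parameters accept char[] and string.
bool ends(const(char)[] word, const(char)[] s)
{
    if (s.length > word.length)
        return false;
    return word[$ - s.length .. $] == s;
}

unittest
{
    char[] buf = "caresses".dup;   // mutable buffer, as the stemmer uses
    assert(ends(buf, "sses"));     // string literal -> const(char)[]: OK
    assert(!ends("cat", "sses"));  // a string for both arguments also works
}
```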


Re: converting to/from char[]/string

2020-03-05 Thread mark via Digitalmars-d-learn

On Thursday, 5 March 2020 at 11:12:24 UTC, drug wrote:

> On 3/5/20 2:03 PM, mark wrote:
>
>> [snip]
>
> Your code and the errors seem to be unrelated.


OK, the problem is probably that the D stemmer is 19 years old!

I've now got Martin Porter's own Java version, so I'll have a go 
at porting that to D myself.


Re: converting to/from char[]/string

2020-03-05 Thread drug via Digitalmars-d-learn

On 3/5/20 2:03 PM, mark wrote:

> I want to use the Porter stemming algorithm.
> There's a D implementation here:
> https://tartarus.org/martin/PorterStemmer/d.txt
>
> The main public function's signature is:
>
> char[] stem(char[] p, int i, int j)
>
> But I work entirely in terms of strings (containing individual words),
> so I want to add another function with this signature:
>
> string stem(string word)
>
> I've tried this without success:
>
>     public string stem(string word) {
>         import std.conv: to;
>
>         char[] chars = word.to!char[];
>         int end = chars.length.to!int;
>         return stem(chars, 0, end).to!string;
>     }
>
> Here are just a few of the errors:
>
> src/porterstemmer.d(197,13): Error: cannot implicitly convert expression s.length of type ulong to int
> src/porterstemmer.d(222,9): Error: cannot implicitly convert expression cast(ulong)this.m_j + s.length of type ulong to int
> src/porterstemmer.d(259,12): Error: function porterstemmer.PorterStemmer.ends(char[] s) is not callable using argument types (string)
> src/porterstemmer.d(259,12):        cannot pass argument "sses" of type string to parameter char[] s



Your code and the errors seem to be unrelated.


converting to/from char[]/string

2020-03-05 Thread mark via Digitalmars-d-learn

I want to use the Porter stemming algorithm.
There's a D implementation here: 
https://tartarus.org/martin/PorterStemmer/d.txt


The main public function's signature is:

char[] stem(char[] p, int i, int j)

But I work entirely in terms of strings (containing individual 
words), so I want to add another function with this signature:


string stem(string word)

I've tried this without success:

    public string stem(string word) {
        import std.conv: to;

        char[] chars = word.to!char[];
        int end = chars.length.to!int;
        return stem(chars, 0, end).to!string;
    }

Here are just a few of the errors:

src/porterstemmer.d(197,13): Error: cannot implicitly convert expression s.length of type ulong to int
src/porterstemmer.d(222,9): Error: cannot implicitly convert expression cast(ulong)this.m_j + s.length of type ulong to int
src/porterstemmer.d(259,12): Error: function porterstemmer.PorterStemmer.ends(char[] s) is not callable using argument types (string)
src/porterstemmer.d(259,12):        cannot pass argument "sses" of type string to parameter char[] s
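Here is a sketch of the `string stem(string word)` wrapper. A toy `stemRange` function stands in for the port's `char[] stem(char[] p, int i, int j)`. It is hypothetical and only drops a trailing 's'; it is not the Porter algorithm. The wrapper is named `stemWord` only to keep the sketch self-contained; inside the class it could be an overload of `stem`.

Two details may matter:

- `word.to!char[]` parses as `(word.to!char)[]`. A non-single-token template argument needs parentheses, as in `word.to!(char[])`, or `word.dup` gives the mutable copy directly.
- In Porter's C reference implementation, `j` is the index of the last character (inclusive), not the length. Assuming the D port keeps that convention, passing `chars.length` would point one past the end.

```d
import std.conv : to;

// Toy stand-in for `char[] stem(char[] p, int i, int j)`. ASSUMPTION: as in
// Porter's C reference, p[i] .. p[j] is the word with j INCLUSIVE, and the
// result is a slice of p. It only strips a trailing 's' (not Porter's rules).
char[] stemRange(char[] p, int i, int j)
{
    if (j > i && p[j] == 's')
        --j;
    return p[i .. j + 1];
}

string stemWord(string word)
{
    if (word.length == 0)
        return word;
    // Mutable copy; equivalent to word.to!(char[]) (note the parentheses).
    char[] chars = word.dup;
    int last = (chars.length - 1).to!int;   // index of the last character
    return stemRange(chars, 0, last).idup;  // copy back into an immutable string
}

unittest
{
    assert(stemWord("cats") == "cat");
    assert(stemWord("dog") == "dog");
    assert(stemWord("") == "");
}
```

Returning `.idup` rather than the slice itself means the caller gets an independent `string`, even though the stemmer works in place on its own buffer.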