Re: [PD-dev] strings
On Mon, 2006-12-18 at 12:46 -0500, Mathieu Bouchard wrote:

of course the only real way to vote for this would be to write the code - i think i'll wait for PNPD instead.. :)

pnpd is currently supporting both hashed symbols and full-featured string ;) however, there are no objects for handling strings, yet

Are there any implicit casts between strings and symbols?

i haven't decided, yet, but i guess no ...

--
[EMAIL PROTECTED]
ICQ: 96771783
http://www.mokabar.tk

The only people for me are the mad ones, the ones who are mad to live, mad to talk, mad to be saved, desirous of everything at the same time, the ones who never yawn or say a commonplace thing, but burn, burn, burn, like fabulous yellow roman candles exploding like spiders across the stars and in the middle you see the blue centerlight pop and everybody goes Awww!
Jack Kerouac

___
PD-dev mailing list
PD-dev@iem.at
http://lists.puredata.info/listinfo/pd-dev
Re: [PD-dev] strings
of course the only real way to vote for this would be to write the code - i think i'll wait for PNPD instead.. :)

pnpd is currently supporting both hashed symbols and full-featured string ;) however, there are no objects for handling strings, yet

t

--
[EMAIL PROTECTED]
ICQ: 96771783
http://www.mokabar.tk

All we composers really have to work with is time and sound - and sometimes I'm not even sure about sound.
Morton Feldman
Re: [PD-dev] strings
On Sun, 17 Dec 2006, Martin Peach wrote:

but in the long term it would be best to just use long lengths for when we all have teraflop laptops:

People have been using strings bigger than 64k for many years in almost any other language. It doesn't have anything to do with the teraflops, really.

| Mathieu Bouchard - tél:+1.514.383.3801 - http://artengine.ca/matju
| Freelance Digital Arts Engineer, Montréal QC Canada
Re: [PD-dev] strings
On Mon, 18 Dec 2006, Hans-Christoph Steiner wrote: On Dec 17, 2006, at 1:36 AM, Mathieu Bouchard wrote: That's aiming low. Why shouldn't there be any automatic casts between the two? Automatic type conversion sounds like a really bad idea if the language only partially supports it. If that's the case then pd is a really bad idea. It's not possible to typecast any value of a type to any value of another type, all of the time. That's even true in the most typecast-frenzy languages ever. There's no such requirement that implicit casts have to be impossibly well supported in order to be a good idea. You're just dismissing all forms of implicit casts as being bad ideas. Pd is strongly typed, so what Martin says is definitely appropriate. Non-sequitur, there are languages that have quite strict and elaborate type checking and yet which support implicit casts. For example, C++. And then, in many cases, languages can become less strictly checked without any problems, as long as nothing actually relies on type violations. Because pd doesn't have any error handling (in the sense of pd patches being able to figure out their own problems), if any patch doesn't spit out any error messages, it would run the same if there were more implicit casts made, because the patches would never trigger those implicit casts anyway. Perl is the opposite, everything can be automatically cast, so there it makes sense. No, in Perl there's no way that you can cast something to a pointer. You can use the backslash operator to make a pointer to anything, but that's not a cast-to-pointer, because if you use it on a pointer, you don't get the same pointer, you get a new pointer. So... anyone want to code up some of these ideas? We could try them out in the next Pd-extended. Martin's solution, my solution, or your solution? _ _ __ ___ _ _ _ ... 
| Mathieu Bouchard - tél:+1.514.383.3801 - http://artengine.ca/matju | Freelance Digital Arts Engineer, Montréal QC Canada
Re: [PD-dev] strings
On Dec 18, 2006, at 1:23 AM, carmen wrote: Automatic type conversion sounds like a really bad idea if the language only partially supports it. Pd is strongly typed is it? it mainly has numbers that occasionally look like symbols, and symbols that more than occasionally look like lists and/or strings.. There are set rules which defined what is a float, symbol, or pointer. You cannot change that type, often even with a special method. Ever tried to turn a float into a symbol? Doesn't really work, only partially. .hc , so what Martin says is definitely appropriate. Perl is the opposite, everything can be automatically cast, so there it makes sense. it is definitely a design decision which way to go. could PD flexibly support both at once? or does there need to be an OCaml edition, and a Perl edition? ___ PD-dev mailing list PD-dev@iem.at http://lists.puredata.info/listinfo/pd-dev Mistrust authority - promote decentralization. - the hacker ethic ___ PD-dev mailing list PD-dev@iem.at http://lists.puredata.info/listinfo/pd-dev
Re: Re: [PD-dev] strings
On Mon, 18 Dec 2006, [EMAIL PROTECTED] wrote: Mathieu Bouchard [EMAIL PROTECTED] wrote: I have no clue what you're talking about: how mangled would they be? i don't plan any mangling to happen, except for the presence of \0 characters. Maybe you don't understand what is being proposed. How would you make a symbol containing ASCII NUL and CR LF characters for instance? Do you realise that the quoting problem can be solved independently of the allocation problem? In that case, you would be able to save any symbol and read it back. This would solve the problem about CR LF and spaces; only the problem with \0 (NUL) would remain. Do you also realise that symbols can be made to support NUL while being backwards-compatible? Then what happens when a non-NUL-supporting external tries to read a symbol that contains a NUL, it will appear truncated at that point and that's all. So basically there are three problems that can be dealt with independently. I'd rather not suppose that all three have to come together, monolithically. For this purpose symbols are not usable because they can't contain every possible character and lists have too much overhead since each element of a list is an atom. Symbols could be usable, if the problems that can be fixed in symbol without changing the nature of symbols, are fixed. You don't need strings for that. I'm suggesting that a [string] be like any other object and be deallocated when the patcher is closed. Ok, that's certainly the string feature that I want. It's too much trouble for the benefit. Whatever. Wouldn't you want objects to be able to emit strings in a way as carefree as they are with symbols? I'm talking about not putting the burden of memory management on the emitter of strings. Man, that's not n atom type. No it's not n atoms, It's a typo. It was supposed to be that's not an atom type, but that isn't so more true. 
I would've like to say something more like: it would be easier, if strings are more similar to symbols and floats, than to (g)pointers. Symbols are difficult to work with because their content gets interpreted, You say that in answer to my questions on allocation? (That's not an allocation issue and not even any kind of memory layout issue.) I don't know, did I? It looks to me like an answer to a question about why symbols can't be used to encode arbitrary strings. Maybe I was tired. It was just below two paragraphs that I had written about allocation. for example if I write a comment MP 20061214 it gets converted into MP 2.00612e+007 the contents of a comment box is not a symbol. It's a list of atoms. However, Pd has the same problem you describe when trying to save some I wonder how the list of atoms in a comment box gets by without some of those atoms being symbols... That some of the components are symbols, has nothing to do with the reason 20061214 gets converted to float32. There is never a big symbol containing all of the contents of a comment: it's broken down into atoms (into a t_binbuf) as soon as you click outside of the box, it's just the t_rtext that keeps holding the original string; but the t_savefn only saves the t_binbuf, it doesn't look at the t_rtext, which is why the floats get mangled at save time only. _ _ __ ___ _ _ _ ... | Mathieu Bouchard - tél:+1.514.383.3801 - http://artengine.ca/matju | Freelance Digital Arts Engineer, Montréal QC Canada___ PD-dev mailing list PD-dev@iem.at http://lists.puredata.info/listinfo/pd-dev
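The comment-box example (20061214 coming back as 2.00612e+007) can be reproduced outside Pd. The sketch below assumes the saved text is produced by `%g`-style formatting of a 32-bit float, which is what the reported output suggests; it does not call any Pd code, and `format_as_float_atom` is a made-up name for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse text as a number, narrow it to a 32-bit
 * float, and print it back with %g (6 significant digits).
 * This imitates the round trip described in the thread; it is not
 * Pd's own save routine. */
static void format_as_float_atom(const char *text, char *out, size_t outsize)
{
    float f = (float)strtod(text, NULL);
    snprintf(out, outsize, "%g", f);
}
```

Feeding it "20061214" gives a string that starts with "2.00612e+": only six significant digits survive. The exponent is usually printed as "07"; some older Windows C runtimes print three exponent digits, which matches the "e+007" quoted above. A short number such as "123" passes through unchanged, which is why the mangling only shows up on long digit runs like dates.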
Re: Re: [PD-dev] strings
On Mon, 18 Dec 2006, [EMAIL PROTECTED] wrote:

If ascii values from 0 - 31 can be part of symbols that would be nice. How do you specify a symbol containing ascii values 1 2 and 3? Do they have names?

Do it the way most languages have borrowed from C: use backslash followed by an octal or hex code, like \033 or \x1b. It's easy to make it compatible with C/C++/Java/Perl/Python/Tcl/Ruby/PHP, all at the same time.

Symbols could be usable, if the problems that can be fixed in symbol without changing the nature of symbols, are fixed. You don't need strings for that.

You still have the problem of the symbol table that grows by one each time the symbol changes.

Well, you can solve that problem separately. There's no point to clump all issues into one ball.

Wouldn't you want objects to be able to emit strings in a way as carefree as they are with symbols? I'm talking about not putting the burden of memory management on the emitter of strings.

A string library could have functions similar to getbytes(), resizebytes() and freebytes()

Yes, that's putting the burden of memory management on the emitter. I mean, what I call the burden of memory management isn't about writing your own malloc() from scratch, no, I'm not talking about that, I'm talking about having to decide when to copy and when to deallocate.

| Mathieu Bouchard - tél:+1.514.383.3801 - http://artengine.ca/matju
| Freelance Digital Arts Engineer, Montréal QC Canada
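The C-style quoting suggested here can be sketched as a small decoder. This is only an illustration under stated assumptions: `decode_escapes` and `hexval` are hypothetical names, nothing like this exists in Pd as far as the thread says, hex escapes are limited to two digits for simplicity (C itself accepts more), and only octal, hex and `\\` escapes are handled.

```c
#include <assert.h>
#include <stddef.h>

/* Value of a hex digit, or -1 if c is not one. */
static int hexval(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode backslash escapes in the NUL-terminated text src into raw
 * bytes in out. Returns the number of bytes written, or -1 on an
 * unknown escape, an out-of-range value, or a full output buffer.
 * The result is a byte count, so decoded NUL bytes are preserved. */
static long decode_escapes(const char *src, unsigned char *out, size_t outsize)
{
    size_t n = 0;
    while (*src) {
        unsigned int value = 0;
        int digits = 0;
        if (*src != '\\') {
            value = (unsigned char)*src++;
        } else if (src[1] == 'x') {            /* \xHH, one or two hex digits */
            src += 2;
            while (digits < 2 && hexval((unsigned char)*src) >= 0) {
                value = value * 16 + (unsigned int)hexval((unsigned char)*src);
                src++;
                digits++;
            }
            if (digits == 0) return -1;
        } else if (src[1] >= '0' && src[1] <= '7') {   /* \OOO, up to 3 octal digits */
            src++;
            while (digits < 3 && *src >= '0' && *src <= '7') {
                value = value * 8 + (unsigned int)(*src - '0');
                src++;
                digits++;
            }
            if (value > 255) return -1;
        } else if (src[1] == '\\') {           /* escaped backslash */
            value = '\\';
            src += 2;
        } else {
            return -1;
        }
        if (n >= outsize) return -1;
        out[n++] = (unsigned char)value;
    }
    return (long)n;
}
```

With this, the text `a\0b\015\012` decodes to the five bytes a, NUL, b, CR, LF. Because the function returns a length instead of relying on a terminator, the NUL case discussed in the thread is no different from any other byte.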
Re: Re: [PD-dev] strings
Mathieu Bouchard [EMAIL PROTECTED] wrote: Do you realise that the quoting problem can be solved independently of the allocation problem? In that case, you would be able to save any symbol and read it back. This would solve the problem about CR LF and spaces; only the problem with \0 (NUL) would remain. If ascii values from 0 - 31 can be part of symbols that would be nice. How do you specify a symbol containing ascii values 1 2 and 3? Do they have names? Symbols could be usable, if the problems that can be fixed in symbol without changing the nature of symbols, are fixed. You don't need strings for that. You still have the problem of the symbol table that grows by one each time the symbol changes. If I want to parse a book one word at a time, for example, it would only take one string for the input buffer, but it would take as many symbols as there are different words in the book. Wouldn't you want objects to be able to emit strings in a way as carefree as they are with symbols? I'm talking about not putting the burden of memory management on the emitter of strings. A string library could have functions similar to getbytes(), resizebytes() and freebytes() for changing the length of strings that could be called by any other external in the library. Or pd could have the same functions that could be called by any external. Either way... Martin ___ PD-dev mailing list PD-dev@iem.at http://lists.puredata.info/listinfo/pd-dev
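The symbol-table growth Martin describes can be illustrated with a toy interning table. This is not Pd's symbol table or its hashing; `toy_intern`, `toy_table` and `TOY_TABLE_MAX` are invented for the sketch, which only assumes what the thread states: every distinct value gets a permanent entry and nothing is ever removed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_TABLE_MAX 64            /* arbitrary size for the sketch */

static const char *toy_table[TOY_TABLE_MAX];
static size_t toy_count = 0;

/* Return the unique stored copy of s, adding it if it is new.
 * Entries are never freed, mimicking permanent symbol allocation.
 * Returns NULL if the table is full or memory runs out. */
static const char *toy_intern(const char *s)
{
    size_t i;
    char *copy;
    for (i = 0; i < toy_count; i++)
        if (strcmp(toy_table[i], s) == 0)
            return toy_table[i];
    if (toy_count == TOY_TABLE_MAX)
        return NULL;
    copy = malloc(strlen(s) + 1);
    if (!copy)
        return NULL;
    strcpy(copy, s);
    toy_table[toy_count++] = copy;
    return copy;
}
```

Building "abc" one character at a time through such a table leaves three entries behind ("a", "ab", "abc"), whereas a single mutable buffer would simply be overwritten in place. Looking up a value that is already present adds nothing, which is the case symbols are good at.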
Re: Re: [PD-dev] strings
De: Hans-Christoph Steiner [EMAIL PROTECTED] Date: 2006/12/18 lun. AM 09:45:26 GMT-05:00 À: carmen [EMAIL PROTECTED] Cc: pd-dev@iem.at Objet: Re: [PD-dev] strings On Dec 18, 2006, at 1:23 AM, carmen wrote: Automatic type conversion sounds like a really bad idea if the language only partially supports it. Pd is strongly typed is it? it mainly has numbers that occasionally look like symbols, and symbols that more than occasionally look like lists and/or strings.. There are set rules which defined what is a float, symbol, or pointer. You cannot change that type, often even with a special method. Ever tried to turn a float into a symbol? Doesn't really work, only partially. Along the lines of pd_defaultlist() in m_class.c, which handles list messages for objects that don't have list methods, one could add a pd_defaultstring(), which attempts to convert strings into symbols/floats/lists, instead of calling pd_defaultanything(), which would print no method for string. But it needs to be understood that it might not do it correctly, which is Not A Good Thing, but no worse than comments getting mangled. Maybe a [string unpack] object would be better: it could attempt to unpack a string into specified types, so the user could decide if a string like 123 is meant to represent a float or a symbol. Martin .hc , so what Martin says is definitely appropriate. Perl is the opposite, everything can be automatically cast, so there it makes sense. it is definitely a design decision which way to go. could PD flexibly support both at once? or does there need to be an OCaml edition, and a Perl edition? ___ PD-dev mailing list PD-dev@iem.at http://lists.puredata.info/listinfo/pd-dev Mistrust authority - promote decentralization. - the hacker ethic ___ PD-dev mailing list PD-dev@iem.at http://lists.puredata.info/listinfo/pd-dev ___ PD-dev mailing list PD-dev@iem.at http://lists.puredata.info/listinfo/pd-dev
Re: Re: [PD-dev] strings
De: Mathieu Bouchard [EMAIL PROTECTED] Date: 2006/12/18 lun. PM 12:11:18 GMT-05:00 À: Martin Peach [EMAIL PROTECTED] Cc: pd-dev@iem.at Objet: Re: [PD-dev] strings On Sun, 17 Dec 2006, Martin Peach wrote: You make them work as strings when they can, and You make them work as symbols when they must. There would be two objects, [stringtosymbol] and [symboltostring] that you could put between string and symbol objects. Of course some strings would get impossibly mangled this way but that's because of the way symbols work. I have no clue what you're talking about: how mangled would they be? i don't plan any mangling to happen, except for the presence of \0 characters. Maybe you don't understand what is being proposed. How would you make a symbol containing ASCII NUL and CR LF characters for instance? Yes, there's no reason not to have 0-length strings. And no reason to trash them when they are unused either, since they don't take up more space than any other object. They take the space it takes to tell their size and the pointer to the buffer. That's significant, and nearly as much as in the case of a t_symbol, supposing that those t_strings can live independently of the objects that produce them. Like any other object strings have that overhead, but unlike lists they only have one atom per string. They would be created by string objects and last as long as the string objects. One string per string object. String messages are passed between string manipulator objects. For this purpose symbols are not usable because they can't contain every possible character and lists have too much overhead since each element of a list is an atom. I'm suggesting that a [string] be like any other object and be deallocated when the patcher is closed. Ok, that's certainly the string feature that I want. It's too much trouble for the benefit. Whatever. Man, that's not n atom type. No it's not n atoms, it's a single atom that contains a pointer to a list of bytes. 
That's the main advantage of string over list. Symbols are difficult to work with because their content gets interpreted, You say that in answer to my questions on allocation? (That's not an allocation issue and not even any kind of memory layout issue.) I don't know, did I? It looks to me like an answer to a question about why symbols can't be used to encode arbitrary strings. Maybe I was tired. for example if I write a comment MP 20061214 it gets converted into MP 2.00612e+007 the contents of a comment box is not a symbol. It's a list of atoms. However, Pd has the same problem you describe when trying to save some symbols. e.g. say you have a symbol with a space in it and you pass it to a messagebox set $1 which passes it to an empty messagebox, and then you save the patch: then you have that problem with symbols. But the contents of the comment box has that problem while never storing its contents as a symbol. I wonder how the list of atoms in a comment box gets by without some of those atoms being symbols... Martin ___ PD-dev mailing list PD-dev@iem.at http://lists.puredata.info/listinfo/pd-dev
Re: [PD-dev] strings
On Dec 18, 2006, at 12:42 PM, Mathieu Bouchard wrote:
On Mon, 18 Dec 2006, Hans-Christoph Steiner wrote:
On Dec 17, 2006, at 1:36 AM, Mathieu Bouchard wrote:

That's aiming low. Why shouldn't there be any automatic casts between the two?

Automatic type conversion sounds like a really bad idea if the language only partially supports it.

If that's the case then pd is a really bad idea. It's not possible to typecast any value of a type to any value of another type, all of the time. That's even true in the most typecast-frenzy languages ever. There's no such requirement that implicit casts have to be impossibly well supported in order to be a good idea. You're just dismissing all forms of implicit casts as being bad ideas.

Pd is strongly typed, so what Martin says is definitely appropriate.

Non-sequitur, there are languages that have quite strict and elaborate type checking and yet which support implicit casts. For example, C++. And then, in many cases, languages can become less strictly checked without any problems, as long as nothing actually relies on type violations. Because pd doesn't have any error handling (in the sense of pd patches being able to figure out their own problems), if any patch doesn't spit out any error messages, it would run the same if there were more implicit casts made, because the patches would never trigger those implicit casts anyway.

C/C++ is not very strict. It allows you to just change what you call a chunk of memory without complaint. Try Pascal, that is strict. Or Pd's floats and symbols. There is no way to make a float into symbol or vice versa without trickery. For example, these don't work:

  [123(        [symbol 123(               [symbol 123(
  |            |                          |
  [symbol]     [     \ -- (symbol box)    [float]

.hc

Perl is the opposite, everything can be automatically cast, so there it makes sense.

No, in Perl there's no way that you can cast something to a pointer. You can use the backslash operator to make a pointer to anything, but that's not a cast-to-pointer, because if you use it on a pointer, you don't get the same pointer, you get a new pointer.

So... anyone want to code up some of these ideas? We could try them out in the next Pd-extended.

Martin's solution, my solution, or your solution?

Terrorism is not an enemy. It cannot be defeated. It's a tactic. It's about as sensible to say we declare war on night attacks and expect we're going to win that war. We're not going to win the war on terrorism.
- retired U.S. Army general, William Odom
Re: [PD-dev] strings
On Mon, 18 Dec 2006, Hans-Christoph Steiner wrote:
On Dec 18, 2006, at 12:42 PM, Mathieu Bouchard wrote:

Non-sequitur, there are languages that have quite strict and elaborate type checking and yet which support implicit casts. For example, C++.

C/C++ is not very strict. It allows you to just change what you call a chunk of memory without complaint.

Ok, I spoke too quickly. I didn't want to say strict. I shouldn't have said strict. Instead of strict I wanted to say that the type checking happens all of the time.

I thought up some kind of classification of type systems, avoiding calling them strong/weak or static/dynamic because those words are confusing.

1. Typed expressions: each piece of code that can give a value, has a type that can be figured out at compile-time.
2. Typed variables/parameters: declarations allow runtime checks but not compile-time checks.
3. Typed values: variables don't have types, they can contain any value, but every value has a type.
4. Typed uses: values don't have types, a type is a way of using a value.

Strictness, in the sense of forbidding things to the user, is not on that scale, it's another aspect. A well-balanced strictness allows one to bypass the system whenever needed, but without being too error-prone. However it's difficult to say what it means to bypass the system for all four typing categories at once, or even within one category.

| Mathieu Bouchard - tél:+1.514.383.3801 - http://artengine.ca/matju
| Freelance Digital Arts Engineer, Montréal QC Canada
Re: [PD-dev] strings
On 2006-12-17 03:09:19, Martin Peach [EMAIL PROTECTED] appears to have written:

A string could be considered unused when its length is set to 0. Memory would need to be dynamically allocated in small blocks. The API should return "no method for string" if the external doesn't implement strings.

... which wouldn't get us true strings in the mathematical sense of a free monoid <Alphabet,concat()>, since the empty string is the identity element for concat()...

marmosets,
Bryan

--
Bryan Jurish
[EMAIL PROTECTED]
"There is *always* one more bug."
-Lubarsky's Law of Cybernetic Entomology
Re: [PD-dev] strings
On Sun, 17 Dec 2006, Bryan Jurish wrote:

... which wouldn't get us true strings in the mathematical sense of a free monoid <Alphabet,concat()>, since the empty string is the identity element for concat()...

Right, and it may seem like not much, but if one is going to make a lot of abstractions for basic string processing, i'd rather have them use monoid algorithms rather than semigroup algorithms. The monoid algorithms are often nicer... semigroup algorithms can't start with an empty string, so they start with the first character of a string, and then do a foreach-loop that starts on the second character so that the first character isn't counted twice, so you have to decide a way to skip that character... ugly.

| Mathieu Bouchard - tél:+1.514.383.3801 - http://artengine.ca/matju
| Freelance Digital Arts Engineer, Montréal QC Canada
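The monoid point can be made concrete with a concatenation loop whose accumulator starts as the empty string. The sketch below uses ordinary NUL-terminated C strings for brevity, which is an assumption of the example and not what the proposed string type would look like; `concat_all` is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Concatenate count strings into a newly allocated string.
 * The accumulator begins as the empty string (the identity element),
 * so the loop treats every part the same way and count == 0 needs no
 * special case. The caller frees the result; NULL means out of memory. */
static char *concat_all(const char *const *parts, size_t count)
{
    size_t total = 0, pos = 0, i;
    char *result;
    for (i = 0; i < count; i++)
        total += strlen(parts[i]);
    result = malloc(total + 1);
    if (!result)
        return NULL;
    result[0] = '\0';                  /* empty accumulator */
    for (i = 0; i < count; i++) {
        size_t len = strlen(parts[i]);
        memcpy(result + pos, parts[i], len);
        pos += len;
    }
    result[pos] = '\0';
    return result;
}
```

If empty strings were forbidden, the loop would have to seed the accumulator from the first part and start iterating at the second, and an empty input list would have no valid result at all. That is the "skip the first character" awkwardness described above.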
Re: [PD-dev] strings
Mathieu Bouchard wrote: On Sat, 16 Dec 2006, Martin Peach wrote: What if strings could be automatically cast to symbols for externals that would rather have symbols, and vice-versa? I have written an external asc2sym that takes lists of bytes and splits them into symbols based on the argument(s) which are characters. But it seems important to avoid symbols as much as possible to avoid filling up the symbol table with symbols that are referenced only once.. Yes, but my reason for wanting this, is that all externals currently available understand symbols but not strings. So, what if you want to make strings as widely used as possible, as easily as possible, and working with all externals currently available in Pd? You make them work as strings when they can, and You make them work as symbols when they must. There would be two objects, [stringtosymbol] and [symboltostring] that you could put between string and symbol objects. Of course some strings would get impossibly mangled this way but that's because of the way symbols work. A string could be considered unused when its length is set to 0. If you want to use a string as a mutable buffer, then you want to be able to have 0-length strings, as a boundary condition: you start with nothing and then add to it. You don't want to have to start with something just because setting the length to 0 would delete it. Yes, there's no reason not to have 0-length strings. And no reason to trash them when they are unused either, since they don't take up more space than any other object. It seems that you are suggesting that the deallocation would be user-controlled? Then how do you prevent the user from crashing pd? I'm suggesting that a [string] be like any other object and be deallocated when the patcher is closed. It's basically a variable-length list of bytes. It would contain methods to allocated and deallocate memory via malloc() or pd's getbytes(), which uses calloc(). 
If you use a weak-pointer as an intermediate (like t_gpointer or t_gfxstub), then you still have to manage reference counts. Whatever you do for the user, you have to know more about externals' behaviour than what they tell you now, because right now they don't deallocate atoms explicitly. But if strings are going to be deallocated explicitly and there is not going to be any checks, why not instead make something that will allow users to deallocate symbols. It's about as safe as that and you don't need to introduce a string type. Symbols are difficult to work with because their content gets interpreted, for example if I write a comment MP 20061214 it gets converted into MP 2.00612e+007, or if I want a symbol to have spaces or carriage returns in it, it won't get created, which is very annoying when a lot of serial hardware wants to see a CR before it processes a message. Also every time I change a symbol, it gets added to the global symbol table. So adding one character at a time to a string would result in that many symbols being created. A string as I see it is closer to a list, and could be operated on with objects like the list objects -- append, split, etc. Memory would need to be dynamically allocated in small blocks. What do you mean in small blocks ? Whatever is most efficient. If malloc is better at allocating blocks of 256 bytes than blocks of 1 then it's better to work with multiples of 256. It seems inefficient to allocate 65536 bytes for every string at creation time. The API should return no method for string if the external doesn't implement strings. That's aiming low. Why shouldn't there be any automatic casts between the two? Because it would require rewriting more of the pd core, and because a lot of strings can't be made into symbols (strings can contain any integer on [0...255] but symbols cannot). Having the two converter objects [stringtosymbol] and [symboltostring] is easier. 
The no method for string message would come from pd, not the external, so the external doesn't need to implement any string methods. Martin ___ PD-dev mailing list PD-dev@iem.at http://lists.puredata.info/listinfo/pd-dev
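Martin's idea of allocating in multiples of a block size, while still allowing a zero-length string as the starting point of a mutable buffer, might look roughly like this. It is a sketch under assumptions: the 256-byte block is the figure floated in the message, not a measured optimum, plain `realloc()` stands in for Pd's own allocation functions, and `t_strbuf`, `round_up_to_block`, `strbuf_append` and `strbuf_free` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define STRING_BLOCK 256            /* assumed allocation granularity */

typedef struct _strbuf
{
    size_t length;                  /* bytes in use; 0 is a valid string */
    size_t capacity;                /* bytes allocated, a multiple of STRING_BLOCK */
    unsigned char *data;            /* NULL until the first byte is stored */
} t_strbuf;

/* Round n up to the next multiple of STRING_BLOCK (0 stays 0). */
static size_t round_up_to_block(size_t n)
{
    return ((n + STRING_BLOCK - 1) / STRING_BLOCK) * STRING_BLOCK;
}

/* Append n raw bytes; any byte value, including 0 and CR, is allowed.
 * Returns 0 on success, -1 if memory could not be obtained. */
static int strbuf_append(t_strbuf *b, const unsigned char *bytes, size_t n)
{
    size_t needed = b->length + n;
    if (needed > b->capacity) {
        size_t newcap = round_up_to_block(needed);
        unsigned char *p = realloc(b->data, newcap);
        if (!p)
            return -1;
        b->data = p;
        b->capacity = newcap;
    }
    if (n)
        memcpy(b->data + b->length, bytes, n);
    b->length = needed;
    return 0;
}

/* Release the storage, e.g. when the owning object is destroyed. */
static void strbuf_free(t_strbuf *b)
{
    free(b->data);
    b->data = NULL;
    b->length = b->capacity = 0;
}
```

Appending a single carriage return to an empty buffer allocates one 256-byte block; appending 256 more bytes grows it to 512. Adding one character at a time therefore reallocates only once per block, and nothing is entered into the symbol table along the way.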
Re: [PD-dev] strings
Mathieu Bouchard wrote:
On Sat, 16 Dec 2006, Martin Peach wrote:

Yes, and it's also easier to limit strings to word (16-bit) lengths, while 8-bit is too short. So a t_string would look like:

    typedef struct _string /* pointer to a string */
    {
        unsigned short s_length; /* length of string in bytes */
        unsigned char *s_data;   /* pointer to 1st byte of string */
    } t_string;

If you're not compiling in 16-bit mode, then there will be 2 or 6 bytes between the first and second field, so that the second field can be aligned to a word boundary, supposing that the struct as a whole is itself aligned to a word boundary. (By word, I strictly mean something that is the same size as a pointer.) What I mean is that there's nothing to gain from a length field that is not the same size as the pointer field, if you have only those two fields. If you have more than two fields, then you can put several short fields in the space of a word (2 or 4).

I suppose we could do like Apple or Microsoft and have something like:

    typedef struct _string /* pointer to a string */
    {
        unsigned short s_length;   /* length of string in bytes */
        unsigned short s_reserved; /* filler */
        unsigned char *s_data;     /* pointer to 1st byte of string */
    } t_string;

but in the long term it would be best to just use long lengths for when we all have teraflop laptops:

    typedef struct _string /* pointer to a string */
    {
        unsigned long s_length; /* length of string in bytes */
        unsigned char *s_data;  /* pointer to 1st byte of string */
    } t_string;

...but restrict the maximum string length using a #define MAX_STRING_LENGTH so that pd doesn't bite off more than it can chew...

Martin
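A minimal sketch of how the last variant could be used, with the length cap enforced at the point where bytes are stored. The struct layout is the one proposed in the message; the value given to `MAX_STRING_LENGTH` and the helper `string_set` are assumptions made for illustration, and plain `malloc()` is used where a real implementation inside Pd would more likely go through its own allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_STRING_LENGTH 65536UL   /* illustrative cap, not a value from the thread */

typedef struct _string              /* pointer to a string */
{
    unsigned long s_length;         /* length of string in bytes */
    unsigned char *s_data;          /* pointer to 1st byte of string */
} t_string;

/* Replace the contents of s with a copy of n bytes.
 * The bytes are opaque: embedded NUL, CR and LF are all kept.
 * n == 0 yields a valid empty string with no storage.
 * Returns 0 on success, -1 if n exceeds the cap or memory runs out,
 * in which case s is left unchanged. */
static int string_set(t_string *s, const unsigned char *bytes, unsigned long n)
{
    unsigned char *p;
    if (n > MAX_STRING_LENGTH)
        return -1;
    if (n == 0) {
        free(s->s_data);
        s->s_data = NULL;
        s->s_length = 0;
        return 0;
    }
    p = malloc(n);
    if (!p)
        return -1;
    memcpy(p, bytes, n);
    free(s->s_data);
    s->s_data = p;
    s->s_length = n;
    return 0;
}
```

Since the length is stored explicitly, a three-byte value such as a, NUL, b keeps its full length, which is the property that separates this from a symbol. Checking the cap before allocating means an oversized request is refused without disturbing the existing contents.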
Re: [PD-dev] strings
moin Martin, moin list, On 2006-12-17 21:46:50, Martin Peach [EMAIL PROTECTED] appears to have written: Bryan Jurish wrote: On 2006-12-17 03:09:19, Martin Peach [EMAIL PROTECTED] appears to have written: A string could be considered unused when its length is set to 0. Memory would need to be dynamically allocated in small blocks. The API should return no method for string if the external doesn't implement strings. ... which wouldn't get us true strings in the mathematical sense of a free monoid Alphabet,concat(), since the empty string is the identity element for concat()... Yes, I agree there should be no restriction on empty strings. I also think there is no need to destroy strings except when the patcher is closed, so it's not really an issue. if by destroy you mean de-allocation of the string struct itself (i assume you do; your suggestion looks a lot like a glib GString btw, which is im(ns)ho a good general purpose c string struct), and if a string therefore winds up being just something like a symbol with a volatile value (i.e. doesn't get written to the symbol table), then i agree. what i think we need to avoid with strings (i don't think anyone has suggested otherwise, i'm just stating the obvious) is symbol-style permanent allocation for every string *value*. string variables could/should be handled like any other pd atom: the external which creates them is responsible for (de-)allocation, which would wind up doing what you suggest and freeing any allocated memory when the responsible object is destroyed (provided the object doesn't leak memory, but i think we can assume c programmers are used to keeping track of such things -- ymmv). in fact, this is how [any2string] handles things, in its ugly list-of-floats way... marmosets, Bryan -- Bryan Jurish There is *always* one more bug. [EMAIL PROTECTED] -Lubarsky's Law of Cybernetic Entomology ___ PD-dev mailing list PD-dev@iem.at http://lists.puredata.info/listinfo/pd-dev
Re: [PD-dev] strings
On Dec 17, 2006, at 1:36 AM, Mathieu Bouchard wrote: On Sat, 16 Dec 2006, Martin Peach wrote: What if strings could be automatically cast to symbols for externals that would rather have symbols, and vice-versa? I have written an external asc2sym that takes lists of bytes and splits them into symbols based on the argument(s) which are characters. But it seems important to avoid symbols as much as possible to avoid filling up the symbol table with symbols that are referenced only once.. Yes, but my reason for wanting this, is that all externals currently available understand symbols but not strings. So, what if you want to make strings as widely used as possible, as easily as possible, and working with all externals currently available in Pd? You make them work as strings when they can, and You make them work as symbols when they must. A string could be considered unused when its length is set to 0. If you want to use a string as a mutable buffer, then you want to be able to have 0-length strings, as a boundary condition: you start with nothing and then add to it. You don't want to have to start with something just because setting the length to 0 would delete it. It seems that you are suggesting that the deallocation would be user-controlled? Then how do you prevent the user from crashing pd? If you use a weak-pointer as an intermediate (like t_gpointer or t_gfxstub), then you still have to manage reference counts. Whatever you do for the user, you have to know more about externals' behaviour than what they tell you now, because right now they don't deallocate atoms explicitly. But if strings are going to be deallocated explicitly and there is not going to be any checks, why not instead make something that will allow users to deallocate symbols. It's about as safe as that and you don't need to introduce a string type. Memory would need to be dynamically allocated in small blocks. What do you mean in small blocks ? 
The API should return no method for string if the external doesn't implement strings. That's aiming low. Why shouldn't there be any automatic casts between the two? Automatic type conversion sounds like a really bad idea if the language only partially supports it. Pd is strongly typed, so what Martin says is definitely appropriate. Perl is the opposite, everything can be automatically cast, so there it makes sense. So... anyone want to code up some of these ideas? We could try them out in the next Pd-extended. .hc There is no way to peace, peace is the way. -A.J. Muste ___ PD-dev mailing list PD-dev@iem.at http://lists.puredata.info/listinfo/pd-dev
Re: [PD-dev] strings
Automatic type conversion sounds like a really bad idea if the language only partially supports it. Pd is strongly typed is it? it mainly has numbers that occasionally look like symbols, and symbols that more than occasionally look like lists and/or strings.. , so what Martin says is definitely appropriate. Perl is the opposite, everything can be automatically cast, so there it makes sense. it is definitely a design decision which way to go. could PD flexibly support both at once? or does there need to be an OCaml edition, and a Perl edition? ___ PD-dev mailing list PD-dev@iem.at http://lists.puredata.info/listinfo/pd-dev
Re: [PD-dev] strings
Automatic type conversion sounds like a really bad idea if the language only partially supports it. Pd is strongly typed

do you think the target user base wants to think in terms of casting types? i don't. i have a feeling that was why there are so few types. i think most users want to be able to plug anything into anything and at least get some sort of result, more than "expected bang, got ''" scrolling 1000 times a second in stderr...

my vote would be a nice selection of types, and autocasting (maybe warnings at most for int vs float, string to symbol, etc)

of course the only real way to vote for this would be to write the code - i think i'll wait for PNPD instead.. :)
Re: [PD-dev] strings
On Sat, 16 Dec 2006, Bryan Jurish wrote: On 2006-12-16 01:40:03, Mathieu Bouchard [EMAIL PROTECTED] appears to have written: i count (sizeof(int)+sizeof(float)-1)*strlen(message) wasted bytes per string object, not counting the selector. Oh yeah, sorry, the occupied space is up to 4 times as text but it's 8 times in 32-bit mode and 16 times in 64-bit mode. as i think we've discussed before, using ieee floats, which should be able to losslessly encode a 24 bit integer, if you want something saveable to a file, pd can only losslessly convert 19.93 bits to decimal. ... but then again, what else are ascii 0x1c-0x1f (28-31 = {fs,gs,rs,us}) for? When I was a small kid, my parents bought a CGP-115 plotter, and the code for changing the colour of the stylus was 29. It was in 1983. it's another ugly hack, would reserve some of the ascii range, 0 is enough to do lists-of-strings because in many ASCII-based systems it's only ever used to mean end-of-string. It's faster than my nested-list hack. However, my hack looks more like what the syntax for nested lists could become if it were not a hack. Essentially my hack is a post-parser that reinterprets symbol-atoms depending on their parens-content, and makes it feel like pd has a LISP syntax... sometimes. (It's a GridFlow-only feature though). _ _ __ ___ _ _ _ ... | Mathieu Bouchard - tél:+1.514.383.3801 - http://artengine.ca/matju | Freelance Digital Arts Engineer, Montréal QC Canada___ PD-dev mailing list PD-dev@iem.at http://lists.puredata.info/listinfo/pd-dev
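The two figures quoted above (24 bits in memory, 19.93 bits through a file) can be checked with a short sketch. It assumes floats are written to text with printf-style "%g" at its default six significant digits; the thread does not state the exact format, but log2(10^6) is about 19.93, which matches the number given. The function names are made up for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Returns 1 if the integer n survives conversion to a 32-bit float. */
static int fits_in_float(long n)
{
    volatile float f = (float)n; /* volatile: force rounding to float */
    return (long)f == n;
}

/* Returns 1 if n survives being printed with "%g" and parsed back. */
static int survives_text(long n)
{
    char buf[32];
    assert(n >= 0);
    snprintf(buf, sizeof buf, "%g", (double)n);
    return (long)strtod(buf, NULL) == n;
}
```

Under that assumption, fits_in_float holds up to 2^24 = 16777216 and fails for 16777217, while survives_text already fails for seven-digit values such as 1000001, because "%g" keeps only six significant digits.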
Re: [PD-dev] strings
On Dec 16, 2006, at 4:55 AM, Bryan Jurish wrote: morning, On 2006-12-16 01:40:03, Mathieu Bouchard [EMAIL PROTECTED] appears to have written: On Fri, 15 Dec 2006, Hans-Christoph Steiner wrote: An advantage using the list-of-bytes approach is that because each character can be represented by a rather large integer, it can be extended to work on lists-of-characters meaning quickly, if there is a [utf8decode] and [utf8encode] to turn bytes into characters and back; also it's a method that is available now and reuses the existing list objects; and it's a method that supports \0 (NUL) characters. Disadvantages are that it takes more time to convert to C strings and back, it takes more space in .pd files, it isn't readable as text in .pd files, it takes up to 4 times more space to represent in .pd files, and exactly 4 times more space in RAM (in the case that just iso-latin-1 is used), and also that you can't make lists of strings like that. i count (sizeof(int)+sizeof(float)-1)*strlen(message) wasted bytes per string object, not counting the selector. as i think we've discussed before, using ieee floats, which should be able to losslessly encode a 24 bit integer, that can be tweaked down to (sizeof(int)+sizeof(float)-1)*strlen(message)/3 on average, but on my system (32 bit floats), that still amounts to one wasted byte per character for the representation, and it's hellishly cryptic to boot. (By the time we can have real strings, we can have nested-lists, and the other way around, because they'd use the same mechanisms. whether it's better to make them two types or one type, is a good question.) ... but then again, what else are ascii 0x1c-0x1f (28-31 = {fs,gs,rs,us}) for? it's another ugly hack, would reserve some of the ascii range, and would require additional parsing objects (potentially constructable with [list]), but it's a possibility, should anyone actually need nested lists as strings... 
please don't get me wrong: i'm all in favor of real strings, nested lists, and associative arrays - i wrote [pdstring] because i needed to send some generated text over OSC to someone who could only interpret ascii values: i'm glad if it's helpful to anyone besides myself, and i don't see much difficulty in adding support for low-level c-type string operations ([toupper], [tolower], at some later point maybe even regexes), but i can't bring myself to believe that the list-of-bytes approach is really the right way to do it, although i don't have a better idea at the moment... One advantage of this approach is that many C string functions like toupper, tolower, strcat, strcmp, etc. would be pretty easy to implement in Pd, rather than C. A regexp object in C would be pretty straightforward. How about using a selector string for these lists? I suppose that could cause mayhem since it would make the list into a selector series and run into all the vagaries of handling them. .hc Man has survived hitherto because he was too ignorant to know how to realize his wishes. Now that he can realize them, he must either change them, or perish.-William Carlos Williams ___ PD-dev mailing list PD-dev@iem.at http://lists.puredata.info/listinfo/pd-dev
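The "divide by 3" tweak mentioned above relies on a 32-bit float holding any 24-bit integer exactly, so three bytes can share one float atom. A minimal sketch, with pack3 and unpack3 as hypothetical names that are not part of [pdstring] or any existing external:

```c
#include <assert.h>

/* Pack three bytes into one float. Exact, because the value is below 2^24. */
static float pack3(unsigned char b0, unsigned char b1, unsigned char b2)
{
    unsigned long v = ((unsigned long)b0 << 16)
                    | ((unsigned long)b1 << 8)
                    |  (unsigned long)b2;
    assert(v < (1UL << 24));
    return (float)v;
}

/* Recover the three bytes from a float produced by pack3. */
static void unpack3(float f, unsigned char out[3])
{
    unsigned long v;
    assert(f >= 0.0f && f < 16777216.0f);
    v = (unsigned long)f;
    out[0] = (unsigned char)((v >> 16) & 0xFF);
    out[1] = (unsigned char)((v >> 8) & 0xFF);
    out[2] = (unsigned char)(v & 0xFF);
}
```

This only covers the in-memory case. Packed values usually have more than six decimal digits, so they would be subject to the text-conversion limit raised elsewhere in the thread if saved in a patch.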
Re: [PD-dev] strings
I think it would be most efficient to have a string type be a length followed by that many unsigned chars, similar to a Pascal string but with the length being something like a 32-bit integer. It would not be added to pd's symbol list. The atom whose type was string would have to contain a pointer to the first byte of the string, and a length. Multibyte characters would just be counted as multiple characters when calculating the length, so the length would be the number of bytes in the string, not the number of characters.

It looks too easy to me... In m_pd.h:

- add A_STRING to t_atomtype.
- add t_string * w_string; to t_word.
- add the typedef:

    typedef struct _string /* pointer to a string */
    {
        unsigned long s_length; /* length of string in bytes */
        unsigned char *s_data; /* pointer to 1st byte of string */
    } t_string;

...so a string atom would have a_type = A_STRING and a_w = a_w.w_string, which points to a t_string containing the length and a pointer to the string. If pd is otherwise able to handle atom types it doesn't know about (?), all the string manipulation objects could be built as externals.

Martin
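As a rough sketch of how the three additions fit together, the block below uses simplified stand-ins for the m_pd.h types rather than the real header. The reduced enum, the const char * symbol field, and the helpers sketch_setstring and sketch_getstring are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the definitions in m_pd.h (not the real ones). */
typedef enum
{
    A_NULL,
    A_FLOAT,
    A_SYMBOL,
    A_STRING /* the proposed addition */
} t_atomtype;

typedef struct _string
{
    unsigned long s_length; /* length of string in bytes */
    unsigned char *s_data;  /* pointer to 1st byte of string */
} t_string;

typedef union word
{
    float w_float;
    const char *w_symbol; /* stand-in for Pd's symbol pointer */
    t_string *w_string;   /* the proposed addition */
} t_word;

typedef struct _atom
{
    t_atomtype a_type;
    t_word a_w;
} t_atom;

/* Make an atom refer to a string (hypothetical helper). */
static void sketch_setstring(t_atom *a, t_string *s)
{
    assert(a && s);
    a->a_type = A_STRING;
    a->a_w.w_string = s;
}

/* Return the string an atom refers to, or NULL if it is not a string. */
static t_string *sketch_getstring(const t_atom *a)
{
    assert(a);
    return (a->a_type == A_STRING) ? a->a_w.w_string : NULL;
}
```

The atom only stores a pointer, so the sketch says nothing about who owns the t_string or when it is freed.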
Re: [PD-dev] strings
On Sat, 16 Dec 2006, Martin Peach wrote: ...so a string atom would have a_type = A_STRING and a_w = a_w.w_string, which points to a t_string containing the length and a pointer to the string. If pd is otherwise able to handle atom types it doesn't know about (?), It's not. There are no provisions for adding any extra atom types. There's no table for registering atom types. Out of 12 assigned numbers for atom types, 5 aren't actually atom types, 4 are radioactive types (SEMI,COMMA,DOLLAR,DOLLSYM), the remaining three have reserved selectors and hardcoded entries in t_class. What's the right way to add a fourth one like that? all the string manipulation objects could be built as externals. What if strings could be automatically cast to symbols for externals that would rather have symbols, and vice-versa? It looks too easy to me... It's because you've only thought about the easy part of the problem. How do you know when a string becomes unused? When do you deallocate the memory? What does this mean for the API used by externals? (including the things that are assumed but not written in m_pd.h) _ _ __ ___ _ _ _ ... | Mathieu Bouchard - tél:+1.514.383.3801 - http://artengine.ca/matju | Freelance Digital Arts Engineer, Montréal QC Canada___ PD-dev mailing list PD-dev@iem.at http://lists.puredata.info/listinfo/pd-dev
Re: [PD-dev] strings
Hans-Christoph Steiner wrote: The one problem I can think of here is that you can only have 19 bits of precision in Pd's 32-bit t_float. So having a length of 32 bits would cause problems if trying to deal with string length using t_floats. I could see this happening in a loop in Pd space, for example. Yes, and it's also easier to limit strings to word (16-bit) lengths, while 8-bit is too short. So a t_string would look like: typedef struct _string /* pointer to a string */ { unsigned short s_length; /* length of string in bytes */ unsigned char *s_data; /* pointer to 1st byte of string */ } t_string; Martin .hc On Dec 16, 2006, at 5:12 PM, Martin Peach wrote: I think it would be most efficient to have a string type be a length followed by that many unsigned chars, similar to a Pascal string but with the length being something like a 32-bit integer. It would not be added to pd's symbol list. The atom whose type was string would have to contain a pointer to the first byte of the string, and a length. Multibyte characters would just be counted as multiple characters when calculating the length, so the length would be the number of bytes in the string, not the number of characters. It looks too easy to me...In m_pd.h, add: A_STRING to t_atomtype. Add t_string * w_string; to t_word. Add the typedef: typedef struct _string /* pointer to a string */ { unsigned long s_length; /* length of string in bytes */ unsigned char *s_data; /* pointer to 1st byte of string */ } t_string; ...so a string atom would have a_type = A_STRING and a_w = a_w.w_string, which points to a t_string containing the length and a pointer to the string. If pd is otherwise able to handle atom types it doesn't know about (?), all the string manipulation objects could be built as externals. Martin ___ PD-dev mailing list PD-dev@iem.at http://lists.puredata.info/listinfo/pd-dev News is what people want to keep hidden and everything else is publicity. 
- Bill Moyers
Re: [PD-dev] strings
On Sat, 16 Dec 2006, Martin Peach wrote:

Yes, and it's also easier to limit strings to word (16-bit) lengths, while 8-bit is too short. So a t_string would look like:

    typedef struct _string /* pointer to a string */
    {
        unsigned short s_length; /* length of string in bytes */
        unsigned char *s_data; /* pointer to 1st byte of string */
    } t_string;

If you're not compiling in 16-bit mode, then there will be 2 or 6 bytes of padding between the first and second field, so that the second field can be aligned to a word boundary, supposing that the struct as a whole is itself aligned to a word boundary. (By word, I strictly mean something that is the same size as a pointer.)

What I mean is that it's useless to use a length field that is not the same size as the pointer field, if you have only those two fields. If you have more than two fields, then you can put several short fields in the space of a word (2 or 4).

Mathieu Bouchard - tél:+1.514.383.3801 - http://artengine.ca/matju
Freelance Digital Arts Engineer, Montréal QC Canada
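The padding argument can be checked directly with offsetof and sizeof. In this sketch the two struct names and padding_after_length are made up for the comparison, and size_t is used as a stand-in for "a length the same size as a pointer", which holds on common desktop ABIs but is not guaranteed by the C standard.

```c
#include <assert.h>
#include <stddef.h>

/* The 16-bit-length variant from the message above. */
typedef struct
{
    unsigned short s_length;
    unsigned char *s_data;
} t_string_short;

/* A variant whose length field is as wide as a pointer (assumption:
   size_t and pointers have the same size on the target platform). */
typedef struct
{
    size_t s_length;
    unsigned char *s_data;
} t_string_wide;

/* Bytes the compiler inserts between s_length and s_data. */
static size_t padding_after_length(void)
{
    assert(offsetof(t_string_short, s_length) == 0);
    return offsetof(t_string_short, s_data) - sizeof(unsigned short);
}
```

On typical 32-bit and 64-bit targets padding_after_length() gives 2 and 6 respectively, and sizeof(t_string_short) equals sizeof(t_string_wide), so the short length field saves no memory.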
Re: [PD-dev] strings
On Sat, 16 Dec 2006, Martin Peach wrote: What if strings could be automatically cast to symbols for externals that would rather have symbols, and vice-versa? I have written an external asc2sym that takes lists of bytes and splits them into symbols based on the argument(s) which are characters. But it seems important to avoid symbols as much as possible to avoid filling up the symbol table with symbols that are referenced only once.. Yes, but my reason for wanting this, is that all externals currently available understand symbols but not strings. So, what if you want to make strings as widely used as possible, as easily as possible, and working with all externals currently available in Pd? You make them work as strings when they can, and You make them work as symbols when they must. A string could be considered unused when its length is set to 0. If you want to use a string as a mutable buffer, then you want to be able to have 0-length strings, as a boundary condition: you start with nothing and then add to it. You don't want to have to start with something just because setting the length to 0 would delete it. It seems that you are suggesting that the deallocation would be user-controlled? Then how do you prevent the user from crashing pd? If you use a weak-pointer as an intermediate (like t_gpointer or t_gfxstub), then you still have to manage reference counts. Whatever you do for the user, you have to know more about externals' behaviour than what they tell you now, because right now they don't deallocate atoms explicitly. But if strings are going to be deallocated explicitly and there is not going to be any checks, why not instead make something that will allow users to deallocate symbols. It's about as safe as that and you don't need to introduce a string type. Memory would need to be dynamically allocated in small blocks. What do you mean in small blocks ? The API should return no method for string if the external doesn't implement strings. That's aiming low. 
Why shouldn't there be any automatic casts between the two?

Mathieu Bouchard - tél:+1.514.383.3801 - http://artengine.ca/matju
Freelance Digital Arts Engineer, Montréal QC Canada
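One way to read the reference-counting point above: if the lifetime of a string is tracked by a count rather than by its length, a 0-length string stays a normal value and nothing is freed while someone still holds it. The sketch below is only an illustration of that idea, not a proposal from the thread; t_rcstring and the rcstring_* functions are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct _rcstring
{
    unsigned long s_refcount; /* number of holders */
    unsigned long s_length;   /* length in bytes; 0 is a valid value */
    unsigned char *s_data;    /* pointer to 1st byte */
} t_rcstring;

/* Create a string with one reference. Returns NULL on allocation failure. */
static t_rcstring *rcstring_new(const unsigned char *src, unsigned long len)
{
    t_rcstring *s;
    assert(src != NULL || len == 0);
    s = malloc(sizeof(*s));
    if (!s) return NULL;
    s->s_data = malloc(len ? len : 1);
    if (!s->s_data) { free(s); return NULL; }
    if (len) memcpy(s->s_data, src, len);
    s->s_length = len;
    s->s_refcount = 1;
    return s;
}

/* Take an additional reference. */
static t_rcstring *rcstring_retain(t_rcstring *s)
{
    assert(s && s->s_refcount > 0);
    s->s_refcount++;
    return s;
}

/* Drop one reference. Returns 1 if the string was freed, 0 otherwise. */
static int rcstring_release(t_rcstring *s)
{
    assert(s && s->s_refcount > 0);
    if (--s->s_refcount > 0) return 0;
    free(s->s_data);
    free(s);
    return 1;
}
```

This leaves open the harder question raised in the message: every external that stores or forwards such an atom would have to call retain and release correctly, which the current API does not ask of them.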
Re: [PD-dev] strings
Thanks Hans and IOhan, I think Bryans offering covers most of what is needed, adequate to muddle by until such time when we have real strings. Andy On Fri, 15 Dec 2006 17:41:03 -0500 Hans-Christoph Steiner [EMAIL PROTECTED] wrote: You can do a fair amount of string handling with [list2symbol] and things like that. But yes, it leaves a lot to be desired. Bryan Jurish has taken a different approach, which is to use lists of bytes to represent strings. Might be worth checking out. .hc On Dec 15, 2006, at 2:06 AM, padawan12 wrote: A new and keen developer on the forums has asked - What about text processing in Pd? to which I replied Pd doesn't do strings. I tie myself in knots trying string-like operations sometimes :), so I know its a can of worms, but what are the fundamental limitations surrounding symbol. How do we deal with EOL or NULL and so on, and what about encoding? Did I hear a rumour that better string handling is chalked in for Pd soon? An alphanumeric sort, maybe even a [grep] or [sed]? What would be the best way to introduce the concept of strings to Pd in a consistent and robust way. I see them as lists of symbols without any need for a new type but right now there are pieces of the jigsaw missing. Sorry so many questions, but it's bugging me today. a. ___ PD-dev mailing list PD-dev@iem.at http://lists.puredata.info/listinfo/pd-dev All information should be free. - the hacker ethic ___ PD-dev mailing list PD-dev@iem.at http://lists.puredata.info/listinfo/pd-dev
Re: [PD-dev] strings
Plus you can use that string format directly with Martin Peach's network objects, AFAIK. .hc On Dec 16, 2006, at 7:16 AM, padawan12 wrote: Thanks Hans and IOhan, I think Bryans offering covers most of what is needed, adequate to muddle by until such time when we have real strings. Andy On Fri, 15 Dec 2006 17:41:03 -0500 Hans-Christoph Steiner [EMAIL PROTECTED] wrote: You can do a fair amount of string handling with [list2symbol] and things like that. But yes, it leaves a lot to be desired. Bryan Jurish has taken a different approach, which is to use lists of bytes to represent strings. Might be worth checking out. .hc On Dec 15, 2006, at 2:06 AM, padawan12 wrote: A new and keen developer on the forums has asked - What about text processing in Pd? to which I replied Pd doesn't do strings. I tie myself in knots trying string-like operations sometimes :), so I know its a can of worms, but what are the fundamental limitations surrounding symbol. How do we deal with EOL or NULL and so on, and what about encoding? Did I hear a rumour that better string handling is chalked in for Pd soon? An alphanumeric sort, maybe even a [grep] or [sed]? What would be the best way to introduce the concept of strings to Pd in a consistent and robust way. I see them as lists of symbols without any need for a new type but right now there are pieces of the jigsaw missing. Sorry so many questions, but it's bugging me today. a. ___ PD-dev mailing list PD-dev@iem.at http://lists.puredata.info/listinfo/pd-dev - --- All information should be free. - the hacker ethic All information should be free. - the hacker ethic ___ PD-dev mailing list PD-dev@iem.at http://lists.puredata.info/listinfo/pd-dev
Re: [PD-dev] strings
On Fri, 15 Dec 2006, Hans-Christoph Steiner wrote: But yes, it leaves a lot to be desired. Bryan Jurish has taken a different approach, which is to use lists of bytes to represent strings. Might be worth checking out. An advantage using the list-of-bytes approach is that because each character can be represented by a rather large integer, it can be extended to work on lists-of-characters meaning quickly, if there is a [utf8decode] and [utf8encode] to turn bytes into characters and back; also it's a method that is available now and reuses the existing list objects; and it's a method that supports \0 (NUL) characters. Disadvantages are that it takes more time to convert to C strings and back, it takes more space in .pd files, it isn't readable as text in .pd files, it takes up to 4 times more space to represent in .pd files, and exactly 4 times more space in RAM (in the case that just iso-latin-1 is used), and also that you can't make lists of strings like that. (By the time we can have real strings, we can have nested-lists, and the other way around, because they'd use the same mechanisms. whether it's better to make them two types or one type, is a good question.) _ _ __ ___ _ _ _ ... | Mathieu Bouchard - tél:+1.514.383.3801 - http://artengine.ca/matju | Freelance Digital Arts Engineer, Montréal QC Canada___ PD-dev mailing list PD-dev@iem.at http://lists.puredata.info/listinfo/pd-dev
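What a [utf8decode] object would have to do internally can be sketched in a few lines of C. The function name utf8_decode and its signature are assumptions, and the decoder is deliberately simplified: it checks sequence structure but does not reject overlong encodings or surrogate code points.

```c
#include <assert.h>
#include <stddef.h>

/* Decode n UTF-8 bytes from in into code points stored in out.
   Returns the number of code points written, or -1 if the input is
   malformed or truncated, or if out has room for fewer than needed. */
static long utf8_decode(const unsigned char *in, size_t n,
                        unsigned long *out, size_t outcap)
{
    size_t i = 0, count = 0;
    assert(in != NULL || n == 0);
    while (i < n)
    {
        unsigned long cp;
        size_t extra, k;
        unsigned char b = in[i++];
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else return -1; /* stray continuation byte or invalid lead byte */
        if (extra > n - i) return -1; /* sequence runs past end of input */
        for (k = 0; k < extra; k++)
        {
            unsigned char c = in[i++];
            if ((c & 0xC0) != 0x80) return -1; /* not a continuation byte */
            cp = (cp << 6) | (unsigned long)(c & 0x3F);
        }
        if (count >= outcap) return -1;
        out[count++] = cp;
    }
    return (long)count;
}
```

Because the length is passed explicitly, a \0 byte in the input is decoded like any other character, which matches the NUL-support advantage listed above.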