Re: [PD] http, html and textfiles
Hello,

If we WERE going to use just Pd for longer text processing, what optimization methods would be recommended? Is there a particular font that Pd handles better than others? Would it be wise to strip the text of backslashes, spaces and commas, as I'm assuming from your post? Can Pd grab a random 'line' from a text file, so as not to have to load the whole thing?

Thanks
mark

--- Frank Barknecht [EMAIL PROTECTED] wrote:
> Don't use Pd for text processing. Pd is good at many things, but it's not good at parsing and modifying larger amounts of text. AFAIK there still is no garbage collection for unused symbols (Pd's strings), it's overcomplicated to deal with certain characters (backslashes, spaces, commas, ...) when they should not be interpreted by Pd etc.

mark edward grimm | m.f.a | ed.m
syracuse u. | vpa foundations | timearts adjunct | new media consultant
megrimm.net | socialmediagroup.org .com
[EMAIL PROTECTED] | 315.378.2136
___
PD-list@iem.at mailing list
UNSUBSCRIBE and account-management - http://lists.puredata.info/listinfo/pd-list
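On the question of grabbing a random line without loading the whole file: one way to do this outside Pd is a single streaming pass that keeps only one candidate line in memory (reservoir sampling). The sketch below is Python and is only an illustration; the function name `random_line` is made up here, and it assumes the file is read by an external script rather than by a Pd object.

```python
import random


def random_line(path, rng=random):
    """Return one uniformly chosen line from `path`, or None if the file is empty.

    The file is streamed line by line, so only the current line and the
    current candidate are held in memory at any time.
    """
    chosen = None
    with open(path, encoding="utf-8", errors="replace") as handle:
        for count, line in enumerate(handle, start=1):
            # Replace the candidate with probability 1/count; after the last
            # line, every line has had the same overall chance of surviving.
            if rng.randrange(count) == 0:
                chosen = line.rstrip("\r\n")
    return chosen
```

The file is still read from start to end once, but it is never held in memory as a whole.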
Re: [PD] http, html and textfiles
If you have Pd-extended you can use mrpeach/tcpclient to retrieve the page and the mrpeach/str object to snip the data at any character or position. You have to enter things like spaces and backslashes as their decimal equivalents, though. An important pair is 13 10 for CR LF. It works better if you are accessing simple web pages, especially if you are printing out the characters as they arrive.

Neither HTTP nor HTML knows about fonts; it's all pure ASCII text.

Martin

From: mark edward grimm [EMAIL PROTECTED]
Reply-To: [EMAIL PROTECTED]
To: Frank Barknecht [EMAIL PROTECTED], pd-list@iem.at
Subject: Re: [PD] http, html and textfiles
Date: Fri, 2 May 2008 06:34:18 -0700 (PDT)

> If we WERE going to use just pd for longer text type processing, what optimization methods would be recommended? Is there a particular font that PD handles better than others? Would it be wise to strip text of backslashes, spaces, commas as im assuming from your post? Can PD grab a random 'line' from a text file so as not to have to load the whole thing?
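To make the "decimal equivalent" idea concrete, here is a small Python sketch that turns a minimal HTTP request into the list of byte values one would type into a Pd message box. The helper names are invented for this illustration, and the `send` prefix reflects my understanding of how a list of bytes is passed to tcpclient; check the object's help patch before relying on it.

```python
def http_get_as_decimals(host, path="/"):
    """Return a minimal HTTP/1.0 GET request as a list of decimal byte values."""
    # Each header line ends with CR LF (13 10); an empty line ends the request.
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
    return list(request.encode("ascii"))


def as_pd_send_message(byte_values):
    """Format the byte values as the text of a Pd message box (assumed 'send' prefix)."""
    return "send " + " ".join(str(value) for value in byte_values)


request_bytes = http_get_as_decimals("example.com")
print(request_bytes[-4:])  # [13, 10, 13, 10]
print(as_pd_send_message(request_bytes))
```

The printed message starts with `send 71 69 84 32 47 32`, which is "GET / " spelled out in decimal.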
Re: [PD] http, html and textfiles
Hey, thanks for the tip, I will try that out...

--- Martin Peach [EMAIL PROTECTED] wrote:
> Neither http or html know about fonts, it's all pure ascii text.

Oh yes, I know this. I meant an optimized font for when the text is in a Pd/GEM window... I guess it wouldn't really matter though, as long as the loaded font wasn't too complicated.

Thanks!
mark
[PD] http, html and textfiles
Hello,

I am working on a little project in which websites are going to be parsed. I thought this might be a nice thing to do with the regex object from zexy... The only problem I am facing right now is that I have no idea how I could get an HTML file onto my hard disk using Pd (something like an HTTP browsing object)... Any suggestions?

best
wolfgang
Re: [PD] http, html and textfiles
I'm having a déjà vu. Check this thread in the archive:

Subject: Retrieving Text form a URL | Webpage

On 29.04.2008 at 01:33, wolfgang schwarzenbrunner wrote:
> the only problem i am facing right now is that i have no idea how i could get a html file on my harddisk using pd (something like a http browsing object)... any suggestions?
Re: [PD] http, html and textfiles
Hello,

wolfgang schwarzenbrunner wrote:
> the only problem i am facing right now is that i have no idea how i could get a html file on my harddisk using pd (something like a http browsing object)... any suggestions?

Yep: Don't use Pd for text processing. Pd is good at many things, but it's not good at parsing and modifying larger amounts of text. AFAIK there still is no garbage collection for unused symbols (Pd's strings), and it's overcomplicated to deal with certain characters (backslashes, spaces, commas, ...) when they should not be interpreted by Pd.

What I would recommend is to do your text processing in a different language. Many (scripting) languages that are great with text can be used inside of Pd: Lua, Python, Java, Scheme, etc. Most of these also include, or can easily be extended with, nice web browsing tools (CURL, Socket, system(wget) ...). In the end you can do both the browsing and all the processing in one place, and then only need to feed the results over to Pd in a format that Pd can handle with more elegance than it can handle large amounts of text.

Of course it depends a bit on how complex your project is, so you may get away with pure Pd as well, but IMO it's a better use of Pd to externalize the text processing to a language better suited to it.

Ciao
--
Frank Barknecht
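As a rough illustration of this split (not Frank's own code), the Python sketch below fetches a page, reduces it to plain alphanumeric words, and sends them to Pd as one semicolon-terminated message of the kind [netreceive] parses. The port 3000, the `words` selector and all function names are assumptions made for this example; only standard-library modules are used.

```python
import re
import socket
import urllib.request
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text found outside <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_words(html):
    """Return the page text as a list of plain alphanumeric words."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Dropping everything but letters and digits removes the characters
    # (backslashes, commas, semicolons, ...) that Pd would interpret.
    return re.findall(r"[A-Za-z0-9]+", " ".join(parser.chunks))


def to_fudi(selector, words):
    """Build one Pd message: space-separated atoms ended by a semicolon."""
    return (" ".join([selector, *words]) + ";\n").encode("ascii")


def send_page_to_pd(url, host="127.0.0.1", port=3000):
    """Fetch `url` and send its words to a Pd patch listening on TCP `port`."""
    with urllib.request.urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    with socket.create_connection((host, port)) as conn:
        conn.sendall(to_fudi("words", html_to_words(html)))
```

On the Pd side this assumes something like [netreceive 3000] followed by [route words]. All parsing happens in Python, so Pd only ever sees atoms it can handle without escaping.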