Re: [PHP] optimizing space for array of booleans
> but you're trying to pass stuff to it: > >public function tostring() { > $str = $this->binstr($this->bits[0]); > for ($i=1;$i<8;$i++) >$str .= "," . $this->binstr($this->bits[$i]); > return $str; >} Slap (on my face)! My stupidity... I come from a strongly typed language background and dynamically typed language often makes me less careful. Thanks, good job =). -- View this message in context: http://www.nabble.com/optimizing-space-for-array-of-booleans-tp22159131p22198832.html Sent from the PHP - General mailing list archive at Nabble.com. -- PHP General Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP] optimizing space for array of booleans
leledumbo wrote: Good points, I'll try it. Without testing it (it's late here), your binstr() function doesn't accept parameters, so it would always return the same result each time it's called, regardless of what you pass into it. In case you want to check it tomorrow or later: private function binstr() { $temp_bits = $this->bits; $str = ""; for ($i=0;$i<32;$i++) { $str = strval($temp_bits & 1) . $str; $temp_bits >>= 1; } return $str; } it doesn't accept parameters, but instead use private field $bits assigned to $temp_bits (PHP manual states that it will be copied instead of referenced, and it's exactly what I need). but you're trying to pass stuff to it: public function tostring() { $str = $this->binstr($this->bits[0]); for ($i=1;$i<8;$i++) $str .= "," . $this->binstr($this->bits[$i]); return $str; } -- Postgresql & php tutorials http://www.designmagick.com/ -- PHP General Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP] optimizing space for array of booleans
Good points, I'll try it. > Without testing it (it's late here), your binstr() function doesn't > accept parameters, so it would always return the same result each time > it's called, regardless of what you pass into it. In case you want to check it tomorrow or later: private function binstr() { $temp_bits = $this->bits; $str = ""; for ($i=0;$i<32;$i++) { $str = strval($temp_bits & 1) . $str; $temp_bits >>= 1; } return $str; } it doesn't accept parameters, but instead use private field $bits assigned to $temp_bits (PHP manual states that it will be copied instead of referenced, and it's exactly what I need). -- View this message in context: http://www.nabble.com/optimizing-space-for-array-of-booleans-tp22159131p22196899.html Sent from the PHP - General mailing list archive at Nabble.com. -- PHP General Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP] optimizing space for array of booleans
On Wed, Feb 25, 2009 at 12:12 AM, leledumbo wrote: > >> Generally relationships like the one you describe are stored in three >> separate and related tables: Students, Courses, and Enrollment. The >> latter is a n:m association between the first two. The advantage this >> approach has with regard to storage is that it is a sparse matrix. > > I've done that and I'm playing around with the last one, this is where the > problem arise. It's a table with two columns, user id & the 'set' that > stores the enroll information. Since SQL doesn't have 'nice set data type' > (MySQL has it but it's restricted to 64 entries, and it's difficult to use), > I need to cover the problem from PHP side. That's because AFAIK MySQL implements the internal storage for a SET or ENUM column as a 4-byte unsigned value. A lot of people like MySQL because it offers those two data types, but this is the limitation. If you want to add more options to the list, you have to figure out a way to extend the functionality manually. >> Most students will only enroll in a handful of courses. With this >> approach, you only store a row in the Enrollment table if a student is >> enrolled in a course. Otherwise, you don't need to store anything. > > Do you mean that, for example, if student A enrolls course x,y,z the table > would be: > > user_id | course > ---+--- > A | x > A | y > A | z Exactly. > Students often forget (or are lazy) to unenroll after they've finished > studying. Wouldn't the database be bloated? That would be up to you and how much you need to keep in the table. You can add other columns as needed to help maintain the table. For instance, you could put start and end dates in the relation table. Then you could either leave the rows in there and maintain a history of when students took specific courses, or use those dates to delete associations that are older than you need to store in your application. And believe me - I'd rather deal with the bloat than with the overhead I expect you'll see with this custom bitmask solution you are trying to build. > And what would be the primary key? The primary key is usually a two-column key. (In some cases such as when you allow a user to retake the same course during multiple date periods, you might have to toss some additional columns into the primary key.) In the example above, the primary key would be on (user_id, course). The benefit you get, other than speeding up joins to this table, is that it prevents user_id A from being enrolled in course x more than once. I also usually add another index on just course. This uses a little more storage space, but I can query the table efficiently from either side of the relationship. >> And no, 1.2MB is not that big of a deal. A single floppy high-density >> drive could hold that much data 18 years ago! > > Yeah, but now I'm also considering the speed. So, I decided to build a class > that implements bitset operations and I'm gonna store the contents in > database as string. Here's the code: Try what you will, I doubt you'll beat a well-designed relational database for speed when it comes to problems like this. For example, how do you plan to optimize a query to find all students whose 23rd bit is 1 or whose 2nd bit is 0? Even with an integer column, that type of query doesn't use an index because it can't be answered by finding a value within a range of values. Instead, it has to perform a bitwise operation on every value in the column to see if the row matches. That's a table scan, and they are not very efficient in a database. They are even less efficient if you have to fetch the entire table into PHP and perform a manual table scan there. And now that you're looking at using strings to overcome the limitation of 64 values in a SET column, the problem gets even more expensive to solve. > class bitset { > private $bits; > > private function index($bit) { return $bit / 32; } > private function offset($bit) { return $bit % 32; } > > private function binstr() { > $temp_bits = $this->bits; > $str = ""; > for ($i=0;$i<32;$i++) { > $str = strval($temp_bits & 1) . $str; > $temp_bits >>= 1; > } > return $str; > } > > public function __construct() { > $this->bits = array(0,0,0,0,0,0,0,0); > } > > public function tostring() { > $str = $this->binstr($this->bits[0]); > for ($i=1;$i<8;$i++) > $str .= "," . $this->binstr($this->bits[$i]); > return $str; > } > > public function set($bit) { > $index = $this->index($bit); > $offset = $this->offset($bit); > $this->bits[$index] |= 1 << $offset; > } > > public function clear($bit) { > $index = $this->index($bit); > $offset = $this->offset($bit); > $this->bits[$index] &= ~(1 << $offset); > } > > public function test($bit) { > $index = $this->index($bit); > $offset = $this->offset($bit); > return $this->bits[$index] & (1 << $offset); > } > } > > little problem: tostring() method alw
Re: [PHP] optimizing space for array of booleans
> Generally relationships like the one you describe are stored in three > separate and related tables: Students, Courses, and Enrollment. The > latter is a n:m association between the first two. The advantage this > approach has with regard to storage is that it is a sparse matrix. I've done that and I'm playing around with the last one, this is where the problem arise. It's a table with two columns, user id & the 'set' that stores the enroll information. Since SQL doesn't have 'nice set data type' (MySQL has it but it's restricted to 64 entries, and it's difficult to use), I need to cover the problem from PHP side. > Most students will only enroll in a handful of courses. With this > approach, you only store a row in the Enrollment table if a student is > enrolled in a course. Otherwise, you don't need to store anything. Do you mean that, for example, if student A enrolls course x,y,z the table would be: user_id | course ---+--- A|x A|y A|z Students often forget (or are lazy) to unenroll after they've finished studying. Wouldn't the database be bloated? And what would be the primary key? > And no, 1.2MB is not that big of a deal. A single floppy high-density > drive could hold that much data 18 years ago! Yeah, but now I'm also considering the speed. So, I decided to build a class that implements bitset operations and I'm gonna store the contents in database as string. Here's the code: class bitset { private $bits; private function index($bit) { return $bit / 32; } private function offset($bit) { return $bit % 32; } private function binstr() { $temp_bits = $this->bits; $str = ""; for ($i=0;$i<32;$i++) { $str = strval($temp_bits & 1) . $str; $temp_bits >>= 1; } return $str; } public function __construct() { $this->bits = array(0,0,0,0,0,0,0,0); } public function tostring() { $str = $this->binstr($this->bits[0]); for ($i=1;$i<8;$i++) $str .= "," . $this->binstr($this->bits[$i]); return $str; } public function set($bit) { $index = $this->index($bit); $offset = $this->offset($bit); $this->bits[$index] |= 1 << $offset; } public function clear($bit) { $index = $this->index($bit); $offset = $this->offset($bit); $this->bits[$index] &= ~(1 << $offset); } public function test($bit) { $index = $this->index($bit); $offset = $this->offset($bit); return $this->bits[$index] & (1 << $offset); } } little problem: tostring() method always returns 00..001,00..001,.. regardless $bits contents. binstr() has been tested and works fine, what might be wrong? -- View this message in context: http://www.nabble.com/optimizing-space-for-array-of-booleans-tp22159131p22196477.html Sent from the PHP - General mailing list archive at Nabble.com. -- PHP General Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP] optimizing space for array of booleans
On Tue, Feb 24, 2009 at 3:24 AM, leledumbo wrote: > > Just tried serializing array of 256 booleans and printing the length, it > really shocked me: 2458. This project will be used by about 500 students, so > in the worst case (all students enroll all courses) it will eat 500 * 2458 > (assuming one character eats one byte) = 1229000 Bytes ~= 1.2 MB. Not a big > deal, eh? > That is a very un-normalized way of storing associations between entities in a database. You describe a "worst case scenario" where every student is registered for every course. If you serialize an array, every bit takes the same amount of space so it doesn't matter how many courses each student has. If you convert this to a bitmask as you are describing, your worst case would happen very frequently - at least as often as students are assigned to the course whose place is the most significant bit value. What's more, if you later need to add a 257th course, you could have lots of work to do to adjust your code to deal with the extra value. Generally relationships like the one you describe are stored in three separate and related tables: Students, Courses, and Enrollment. The latter is a n:m association between the first two. The advantage this approach has with regard to storage is that it is a sparse matrix. Most students will only enroll in a handful of courses. With this approach, you only store a row in the Enrollment table if a student is enrolled in a course. Otherwise, you don't need to store anything. Granted, if you use 4-byte integers for the primary keys in both the Students and Courses tables, your data storage would be about 2048 bytes for the data "worst case scenario" that you described, and another 2048 bytes for a unique index that would insure that each student could be enrolled in any course no more than one time. However, since I highly doubt most students are going to be enrolled in every course, it is not likely that you'll approach the 1.2MB storage space that you're worried about. Each row would take 16 bytes to store both the data and the index; if each student is enrolled in an average of 4 courses, it should only take an average of 64 bytes per student. For 500 students, that works out to around 2000 rows at about 32KB overall. For comparison, I have a fairly small table in a system we recently put into production that maintains similar associations. It stores more information in each row than just a two-column association (about 17 columns) and currently has over 7400 rows. The data is 0.734MB and the total size of three indexes is 0.820MB. And no, 1.2MB is not that big of a deal. A single floppy high-density drive could hold that much data 18 years ago! Andrew -- PHP General Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP] optimizing space for array of booleans
Just tried serializing array of 256 booleans and printing the length, it really shocked me: 2458. This project will be used by about 500 students, so in the worst case (all students enroll all courses) it will eat 500 * 2458 (assuming one character eats one byte) = 1229000 Bytes ~= 1.2 MB. Not a big deal, eh? -- View this message in context: http://www.nabble.com/optimizing-space-for-array-of-booleans-tp22159131p22177808.html Sent from the PHP - General mailing list archive at Nabble.com. -- PHP General Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP] optimizing space for array of booleans
> you should not worry about optimizing boolean values unless you would store them in database. I DO need to store them in a database, that's why I'm asking. I won't care if I only use PHP, since it won't keep my data in memory for a long time. > anyways, use bitwise operators I think I've stated that I don't want to use bitwise operations. It's OK for small sets, but not for a set with about 200 elements. Anyway, I found bitset library in PECL, but I found no documentation at all. -- View this message in context: http://www.nabble.com/optimizing-space-for-array-of-booleans-tp22159131p22174206.html Sent from the PHP - General mailing list archive at Nabble.com. -- PHP General Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP] optimizing space for array of booleans
> Some languages allows to bit-pack structures to save spaces. Since PHP > doesn't have native set data type and operations, will array of booleans be > optimized as such? If not, how can I achieve the same result? I need to save > about 200 boolean values and I guess it's a good idea to use bitsets. > > Requirements: > - Easy to use (set, unset, test), so C way to use bitwise operations is > simply rejected. > I need something like Pascal's set. > - Small space (bit level if possible) > > -- > View this message in context: > http://www.nabble.com/optimizing-space-for-array-of-booleans-tp22159131p22159131.html > Sent from the PHP - General mailing list archive at Nabble.com. > hi, you should not worry about optimizing boolean values unless you would store them in database. anyways, use bitwise operators http://ph.php.net/language.operators.bitwise virgil http://www.jampmark.com -- PHP General Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php
[PHP] optimizing space for array of booleans
Some languages allows to bit-pack structures to save spaces. Since PHP doesn't have native set data type and operations, will array of booleans be optimized as such? If not, how can I achieve the same result? I need to save about 200 boolean values and I guess it's a good idea to use bitsets. Requirements: - Easy to use (set, unset, test), so C way to use bitwise operations is simply rejected. I need something like Pascal's set. - Small space (bit level if possible) -- View this message in context: http://www.nabble.com/optimizing-space-for-array-of-booleans-tp22159131p22159131.html Sent from the PHP - General mailing list archive at Nabble.com. -- PHP General Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php