Re: [PHP] optimizing space for array of booleans

2009-02-25 Thread leledumbo

 but you're trying to pass stuff to it:
 
public function tostring() {
  $str = $this-binstr($this-bits[0]);
  for ($i=1;$i8;$i++)
$str .= , . $this-binstr($this-bits[$i]);
  return $str;
}

Slap (on my face)! My stupidity... I come from a strongly typed language
background and dynamically typed language often makes me less careful.
Thanks, good job =).
-- 
View this message in context: 
http://www.nabble.com/optimizing-space-for-array-of-booleans-tp22159131p22198832.html
Sent from the PHP - General mailing list archive at Nabble.com.


-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] optimizing space for array of booleans

2009-02-24 Thread leledumbo

Just tried serializing array of 256 booleans and printing the length, it
really shocked me: 2458. This project will be used by about 500 students, so
in the worst case (all students enroll all courses) it will eat 500 * 2458
(assuming one character eats one byte) = 1229000 Bytes ~= 1.2 MB. Not a big
deal, eh?

-- 
View this message in context: 
http://www.nabble.com/optimizing-space-for-array-of-booleans-tp22159131p22177808.html
Sent from the PHP - General mailing list archive at Nabble.com.


-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] optimizing space for array of booleans

2009-02-24 Thread Andrew Ballard
On Tue, Feb 24, 2009 at 3:24 AM, leledumbo leledumbo_c...@yahoo.co.id wrote:

 Just tried serializing array of 256 booleans and printing the length, it
 really shocked me: 2458. This project will be used by about 500 students, so
 in the worst case (all students enroll all courses) it will eat 500 * 2458
 (assuming one character eats one byte) = 1229000 Bytes ~= 1.2 MB. Not a big
 deal, eh?


That is a very un-normalized way of storing associations between
entities in a database. You describe a worst case scenario where
every student is registered for every course. If you serialize an
array, every bit takes the same amount of space so it doesn't matter
how many courses each student has. If you convert this to a bitmask as
you are describing, your worst case would happen very frequently - at
least as often as students are assigned to the course whose place is
the most significant bit value. What's more, if you later need to add
a 257th course, you could have lots of work to do to adjust your code
to deal with the extra value.

Generally relationships like the one you describe are stored in three
separate and related tables: Students, Courses, and Enrollment. The
latter is a n:m association between the first two. The advantage this
approach has with regard to storage is that it is a sparse matrix.
Most students will only enroll in a handful of courses. With this
approach, you only store a row in the Enrollment table if a student is
enrolled in a course. Otherwise, you don't need to store anything.
Granted, if you use 4-byte integers for the primary keys in both the
Students and Courses tables, your data storage would be about 2048
bytes for the data worst case scenario that you described, and
another 2048 bytes for a unique index that would insure that each
student could be enrolled in any course no more than one time.
However, since I highly doubt most students are going to be enrolled
in every course, it is not likely that you'll approach the 1.2MB
storage space that you're worried about. Each row would take 16 bytes
to store both the data and the index; if each student is enrolled in
an average of 4 courses, it should only take an average of 64 bytes
per student. For 500 students, that works out to around 2000 rows at
about 32KB overall.

For comparison, I have a fairly small table in a system we recently
put into production that maintains similar associations. It stores
more information in each row than just a two-column association (about
17 columns) and currently has over 7400 rows. The data is 0.734MB and
the total size of three indexes is 0.820MB.

And no, 1.2MB is not that big of a deal. A single floppy high-density
drive could hold that much data 18 years ago!

Andrew

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] optimizing space for array of booleans

2009-02-24 Thread leledumbo

 Generally relationships like the one you describe are stored in three
 separate and related tables: Students, Courses, and Enrollment. The
 latter is a n:m association between the first two. The advantage this
 approach has with regard to storage is that it is a sparse matrix.

I've done that and I'm playing around with the last one, this is where the
problem arise. It's a table with two columns, user id  the 'set' that
stores the enroll information. Since SQL doesn't have 'nice set data type'
(MySQL has it but it's restricted to 64 entries, and it's difficult to use),
I need to cover the problem from PHP side.

 Most students will only enroll in a handful of courses. With this
 approach, you only store a row in the Enrollment table if a student is
 enrolled in a course. Otherwise, you don't need to store anything.

Do you mean that, for example, if student A enrolls course x,y,z the table
would be:

user_id | course
---+---
A|x
A|y
A|z

Students often forget (or are lazy) to unenroll after they've finished
studying. Wouldn't the database be bloated? And what would be the primary
key?

 And no, 1.2MB is not that big of a deal. A single floppy high-density
 drive could hold that much data 18 years ago!

Yeah, but now I'm also considering the speed. So, I decided to build a class
that implements bitset operations and I'm gonna store the contents in
database as string. Here's the code: 

class bitset {
  private $bits;

  private function index($bit) { return $bit / 32; }
  private function offset($bit) { return $bit % 32; }

  private function binstr() {
$temp_bits = $this-bits;
$str = ;
for ($i=0;$i32;$i++) {
  $str = strval($temp_bits  1) . $str;
  $temp_bits = 1;
}
return $str;
  }

  public function __construct() {
$this-bits = array(0,0,0,0,0,0,0,0);
  }

  public function tostring() {
$str = $this-binstr($this-bits[0]);
for ($i=1;$i8;$i++)
  $str .= , . $this-binstr($this-bits[$i]);
return $str;
  }

  public function set($bit) {
$index  = $this-index($bit);
$offset = $this-offset($bit);
$this-bits[$index] |= 1  $offset;
  }

  public function clear($bit) {
$index  = $this-index($bit);
$offset = $this-offset($bit);
$this-bits[$index] = ~(1  $offset);
  }

  public function test($bit) {
$index  = $this-index($bit);
$offset = $this-offset($bit);
return $this-bits[$index]  (1  $offset);
  }
}

little problem: tostring() method always returns 00..001,00..001,..
regardless $bits contents. binstr() has been tested and works fine, what
might be wrong?



-- 
View this message in context: 
http://www.nabble.com/optimizing-space-for-array-of-booleans-tp22159131p22196477.html
Sent from the PHP - General mailing list archive at Nabble.com.


-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] optimizing space for array of booleans

2009-02-24 Thread Andrew Ballard
On Wed, Feb 25, 2009 at 12:12 AM, leledumbo leledumbo_c...@yahoo.co.id wrote:

 Generally relationships like the one you describe are stored in three
 separate and related tables: Students, Courses, and Enrollment. The
 latter is a n:m association between the first two. The advantage this
 approach has with regard to storage is that it is a sparse matrix.

 I've done that and I'm playing around with the last one, this is where the
 problem arise. It's a table with two columns, user id  the 'set' that
 stores the enroll information. Since SQL doesn't have 'nice set data type'
 (MySQL has it but it's restricted to 64 entries, and it's difficult to use),
 I need to cover the problem from PHP side.

That's because AFAIK MySQL implements the internal storage for a SET
or ENUM column as a 4-byte unsigned value. A lot of people like MySQL
because it offers those two data types, but this is the limitation. If
you want to add more options to the list, you have to figure out a way
to extend the functionality manually.

 Most students will only enroll in a handful of courses. With this
 approach, you only store a row in the Enrollment table if a student is
 enrolled in a course. Otherwise, you don't need to store anything.

 Do you mean that, for example, if student A enrolls course x,y,z the table
 would be:

 user_id | course
 ---+---
    A    |    x
    A    |    y
    A    |    z

Exactly.


 Students often forget (or are lazy) to unenroll after they've finished
 studying. Wouldn't the database be bloated?

That would be up to you and how much you need to keep in the table.
You can add other columns as needed to help maintain the table. For
instance, you could put start and end dates in the relation table.
Then you could either leave the rows in there and maintain a history
of when students took specific courses, or use those dates to delete
associations that are older than you need to store in your
application.

And believe me - I'd rather deal with the bloat than with the overhead
I expect you'll see with this custom bitmask solution you are trying
to build.

 And what would be the primary key?

The primary key is usually a two-column key. (In some cases such as
when you allow a user to retake the same course during multiple date
periods, you might have to toss some additional columns into the
primary key.) In the example above, the primary key would be on
(user_id, course). The benefit you get, other than speeding up joins
to this table, is that it prevents user_id A from being enrolled in
course x more than once.

I also usually add another index on just course. This uses a little
more storage space, but I can query the table efficiently from either
side of the relationship.


 And no, 1.2MB is not that big of a deal. A single floppy high-density
 drive could hold that much data 18 years ago!

 Yeah, but now I'm also considering the speed. So, I decided to build a class
 that implements bitset operations and I'm gonna store the contents in
 database as string. Here's the code:

Try what you will, I doubt you'll beat a well-designed relational
database for speed when it comes to problems like this. For example,
how do you plan to optimize a query to find all students whose 23rd
bit is 1 or whose 2nd bit is 0? Even with an integer column, that type
of query doesn't use an index because it can't be answered by finding
a value within a range of values. Instead, it has to perform a bitwise
operation on every value in the column to see if the row matches.
That's a table scan, and they are not very efficient in a database.
They are even less efficient if you have to fetch the entire table
into PHP and perform a manual table scan there. And now that you're
looking at using strings to overcome the limitation of 64 values in a
SET column, the problem gets even more expensive to solve.

 class bitset {
  private $bits;

  private function index($bit) { return $bit / 32; }
  private function offset($bit) { return $bit % 32; }

  private function binstr() {
    $temp_bits = $this-bits;
    $str = ;
    for ($i=0;$i32;$i++) {
      $str = strval($temp_bits  1) . $str;
      $temp_bits = 1;
    }
    return $str;
  }

  public function __construct() {
    $this-bits = array(0,0,0,0,0,0,0,0);
  }

  public function tostring() {
    $str = $this-binstr($this-bits[0]);
    for ($i=1;$i8;$i++)
      $str .= , . $this-binstr($this-bits[$i]);
    return $str;
  }

  public function set($bit) {
    $index  = $this-index($bit);
    $offset = $this-offset($bit);
    $this-bits[$index] |= 1  $offset;
  }

  public function clear($bit) {
    $index  = $this-index($bit);
    $offset = $this-offset($bit);
    $this-bits[$index] = ~(1  $offset);
  }

  public function test($bit) {
    $index  = $this-index($bit);
    $offset = $this-offset($bit);
    return $this-bits[$index]  (1  $offset);
  }
 }

 little problem: tostring() method always returns 00..001,00..001,..
 regardless $bits contents. binstr() has been tested and works 

Re: [PHP] optimizing space for array of booleans

2009-02-24 Thread leledumbo

Good points, I'll try it.

 Without testing it (it's late here), your binstr() function doesn't
 accept parameters, so it would always return the same result each time
 it's called, regardless of what you pass into it.

In case you want to check it tomorrow or later:

private function binstr() {
  $temp_bits = $this-bits;
  $str = ;
  for ($i=0;$i32;$i++) {
$str = strval($temp_bits  1) . $str;
$temp_bits = 1;
  }
  return $str;
}

it doesn't accept parameters, but instead use private field $bits assigned
to $temp_bits (PHP manual states that it will be copied instead of
referenced, and it's exactly what I need).
-- 
View this message in context: 
http://www.nabble.com/optimizing-space-for-array-of-booleans-tp22159131p22196899.html
Sent from the PHP - General mailing list archive at Nabble.com.


-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] optimizing space for array of booleans

2009-02-24 Thread Chris

leledumbo wrote:

Good points, I'll try it.


Without testing it (it's late here), your binstr() function doesn't
accept parameters, so it would always return the same result each time
it's called, regardless of what you pass into it.


In case you want to check it tomorrow or later:

private function binstr() {
  $temp_bits = $this-bits;
  $str = ;
  for ($i=0;$i32;$i++) {
$str = strval($temp_bits  1) . $str;
$temp_bits = 1;
  }
  return $str;
}

it doesn't accept parameters, but instead use private field $bits assigned
to $temp_bits (PHP manual states that it will be copied instead of
referenced, and it's exactly what I need).


but you're trying to pass stuff to it:

  public function tostring() {
$str = $this-binstr($this-bits[0]);
for ($i=1;$i8;$i++)
  $str .= , . $this-binstr($this-bits[$i]);
return $str;
  }


--
Postgresql  php tutorials
http://www.designmagick.com/


--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] optimizing space for array of booleans

2009-02-23 Thread Virgilio Quilario
 Some languages allows to bit-pack structures to save spaces. Since PHP
 doesn't have native set data type and operations, will array of booleans be
 optimized as such? If not, how can I achieve the same result? I need to save
 about 200 boolean values and I guess it's a good idea to use bitsets.

 Requirements:
 - Easy to use (set, unset, test), so C way to use bitwise operations is
 simply rejected.
  I need something like Pascal's set.
 - Small space (bit level if possible)

 --
 View this message in context: 
 http://www.nabble.com/optimizing-space-for-array-of-booleans-tp22159131p22159131.html
 Sent from the PHP - General mailing list archive at Nabble.com.


hi,

you should not worry about optimizing boolean values unless you would
store them in database.
anyways, use bitwise operators http://ph.php.net/language.operators.bitwise

virgil
http://www.jampmark.com

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] optimizing space for array of booleans

2009-02-23 Thread leledumbo

 you should not worry about optimizing boolean values unless you would store
them in database.
I DO need to store them in a database, that's why I'm asking. I won't care
if I only use PHP, since it won't keep my data in memory for a long time.

 anyways, use bitwise operators
I think I've stated that I don't want to use bitwise operations. It's OK for
small sets, but not for a set with about 200 elements. Anyway, I found
bitset library in PECL, but I found no documentation at all.
-- 
View this message in context: 
http://www.nabble.com/optimizing-space-for-array-of-booleans-tp22159131p22174206.html
Sent from the PHP - General mailing list archive at Nabble.com.


-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php