As I said before, I'm not very familiar with the API for scans/filters/etc.

If you are not worried about realtime access to that query, you could run 
a MapReduce job that takes in all rows and checks whether "Courses:Maths" 
exists in each one. If it does, the mapper calls "context.write("Maths", 1);" 
and the reducer sums those values into a total.

Even better, since you'd be running a MapReduce job anyway: for each course 
column in the row, call "context.write(course, 1);" and reduce with the course 
name as the key. This gives you the total number of students in each course, 
sorted by course name.
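
To make the per-course count concrete, here is a rough sketch in plain Java. 
It is not HBase or Hadoop code: the in-memory map of rows, the class name 
CourseCountSketch and the methods mapRow and countByCourse are all made up for 
illustration, and a real job would read rows from the table and emit through 
the Hadoop context instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// In-memory stand-in for the MapReduce job; no HBase or Hadoop classes are used.
class CourseCountSketch {

    // "Map" step: emit one course name per column found in a row's Courses family.
    // In the real job this is where context.write(course, 1) would be called.
    static List<String> mapRow(Set<String> courseColumns) {
        return new ArrayList<>(courseColumns);
    }

    // "Reduce" step: add up one count per emitted course name.
    // A TreeMap keeps the course names sorted, like the sorted reducer output.
    static Map<String, Integer> countByCourse(Map<String, Set<String>> rows) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Set<String> courseColumns : rows.values()) {
            for (String course : mapRow(courseColumns)) {
                totals.merge(course, 1, Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: row key -> column names in the Courses family.
        Map<String, Set<String>> rows = new HashMap<>();
        rows.put("student1", Set.of("Maths", "Computer"));
        rows.put("student2", Set.of("Maths", "Science"));
        rows.put("student3", Set.of("Science"));
        System.out.println(countByCourse(rows));
    }
}
```

Note that nothing here needs to know the course names ahead of time; the 
column names found in each row drive the output.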

If you want realtime access, you could explore a secondary table that serves 
as an index, kept up to date either by the inserting application or by a 
scheduled MapReduce job.
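
The index idea could look something like the sketch below, again in plain Java 
with in-memory maps standing in for the two tables. The class name 
CourseIndexSketch and the methods enroll and studentsIn are hypothetical; in 
HBase the index would be a second table keyed by course name, written by the 
same code path that writes the student row.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// In-memory stand-in for a primary table plus a secondary index table.
class CourseIndexSketch {

    // Primary "table": student row key -> (course name -> term).
    private final Map<String, Map<String, String>> students = new HashMap<>();

    // Index "table": course name -> row keys of the students enrolled in it.
    private final Map<String, Set<String>> byCourse = new HashMap<>();

    // The inserting application writes the student row and the index entry together.
    void enroll(String studentId, String course, String term) {
        students.computeIfAbsent(studentId, k -> new HashMap<>()).put(course, term);
        byCourse.computeIfAbsent(course, k -> new TreeSet<>()).add(studentId);
    }

    // Lookup by course reads only the index, never the full student table.
    Set<String> studentsIn(String course) {
        return Collections.unmodifiableSet(
            byCourse.getOrDefault(course, Collections.emptySet()));
    }

    public static void main(String[] args) {
        CourseIndexSketch index = new CourseIndexSketch();
        index.enroll("12345678", "Maths", "2010_Fall");
        index.enroll("12345678", "Computer", "2011_Spring");
        index.enroll("87654321", "Maths", "2011_Spring");
        System.out.println(index.studentsIn("Maths"));
    }
}
```

The trade-off is that every insert now costs two writes, and the two can drift 
apart if one of them fails, which is where a scheduled rebuild job would help.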

Thanks,

Travis Hegner
http://www.travishegner.com/

-----Original Message-----
From: SyedShoaib [mailto:[email protected]]
Sent: Friday, June 25, 2010 5:44 AM
To: [email protected]
Subject: RE: How to search and make indexes in ColumnFamilies with unknown 
columns ?


Thank you very much for your help. If we keep courses as columns, the problem
remains the same. The number of columns is unknown: there can be 1000 subjects
in one row and only two in another. We do not know these subjects while we are
programming against the client API, because the user inserts them at runtime.
So how can a Filter in the client API search for a particular course across
all columns of a ColumnFamily? All the filters I have explored search only a
single column of a ColumnFamily at a time. That's the real problem.

Many thanks for the help again.
regards,



Hegner, Travis wrote:
>
> I'm not an expert by any means, but I wonder if you were to store the
> course name/type as the column name, and some arbitrary but useful value
> as the value, for example:
>
> Student_Courses  // Table Name
> {
>      Student:   // Column Family
>      {
>           ID => 12345678
>           Name => John Smith
>      }
>
>      Courses:   // Column Family with any number of columns:
>      {
>          Maths => 2010_Fall
>          Computer => 2011_Spring
>          .
>          .
>          Science => 2011_Spring
>      }
> }
>
> The API may be better suited to handle filtering by column name, rather
> than value, but as I said, I'm no expert, and I have very little
> experience filtering via the API.
>
> Assuming the filter works correctly, you could simply ignore the value
> retrieved if it wasn't needed. Be careful about putting too large of a
> value in though, as that could affect performance. This is one of the
> beauties of a column oriented schema, you can store useful, valuable
> information as a column name.
>
> I do know that with this type of schema, the columns would be accessed
> like:
>
> get(<row_id>, "Courses:Maths"[, <version>]);
>
> or something to that effect anyway...
>
> Hope This Helps, Good Luck!
>
> Travis Hegner
> http://www.travishegner.com/
>
> -----Original Message-----
> From: SyedShoaib [mailto:[email protected]]
> Sent: Thursday, June 24, 2010 8:26 AM
> To: [email protected]
> Subject: How to search and make indexes in ColumnFamilies with unknown
> columns ?
>
>
> Hi,
>
> I am new to HBase and have just worked on it for a few days. I have two
> questions. Any kind of help is fully appreciated and many thanks in
> advance.
>
> 1) Suppose I have a columnFamily with an unknown number of columns. I want
> to search for a value in this columnFamily. That value can be present in
> any column of this columnFamily. How do I search for a value in the whole
> columnFamily? For further elaboration please consider a simple scenario:
>
> For example: A student can have any number of courses. Schema in HBase
> could be:
>
> Student_Courses  // Table Name
> {
>      Student:   // Column Family
>      {
>           ID:
>           Name:
>      }
>
>      Courses:   // Column Family with any number of columns:
>      {
>          Course_1:  Maths
>          Course_2:  Computer
>          .
>          .
>          Course_n:  Science
>      }
> }
>
> If I want to find all rows with the value “Maths” in any column of the
> columnFamily “Courses:”, what should I do? I can search for a value
> through SingleColumnValueFilter by specifying the ColumnFamily and column,
> e.g. "Student:Name". But how do I search for a value in the "Courses:"
> columnFamily, keeping in mind that I don't know how many columns it has?
>
>
> 2) How do I make an index on this columnFamily (“Courses:”)? I know
> indexes are made on columns, but the number of columns is unknown! I can
> make an index on "Student:Name". But what if I want to make a single
> index on the complete “Courses:” ColumnFamily? Is it possible? It would
> help me a lot during a search like SHOW ME ALL THE STUDENTS REGISTERED IN
> MATHS.
>
> Regards,
>
> --
> View this message in context:
> http://old.nabble.com/How-to-search-and-make-indexes-in-ColumnFamilies-with-unknown-columns---tp28981932p28981932.html
> Sent from the HBase User mailing list archive at Nabble.com.
>
>
> The information contained in this communication is confidential and is
> intended only for the use of the named recipient.  Unauthorized use,
> disclosure, or copying is strictly prohibited and may be unlawful.  If you
> have received this communication in error, you should know that you are
> bound to confidentiality, and should please immediately notify the sender
> or our IT Department at  866.459.4599.
>
>

--
View this message in context: 
http://old.nabble.com/How-to-search-and-make-indexes-in-ColumnFamilies-with-unknown-columns---tp28981932p28990537.html
Sent from the HBase User mailing list archive at Nabble.com.


