[jira] [Commented] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

2012-04-05 Thread Eran Kutner (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247444#comment-13247444
 ] 

Eran Kutner commented on HBASE-3996:


@stack: I believe the only open issue on the review board is your suggestion to 
replace my MultiTableInputCollection with a List<Scan>. I agree it would make 
the patch simpler and remove one class, but I think it would make the feature 
less natural to use. Developers would have to create a Scan, which is a common 
object, and then set a table attribute on it. To me that feels less natural 
than setting the table by adding to a collection, the way I've done it, but I 
guess it's a matter of perspective.
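
For readers following the API discussion, here is a minimal sketch of the two styles being compared. It assumes stand-in classes throughout: the Scan type, the "table.name" attribute key, and the add/asMap methods on MultiTableInputCollection are hypothetical names chosen for illustration, not the HBase API or the contents of the patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: every type here is a stand-in, not HBase code and not the patch.
class InputApiSketch {

    /** Assumed attribute key; the real name would be whatever the patch defines. */
    static final String TABLE_ATTRIBUTE = "table.name";

    /** Stand-in for a scan: a row range plus free-form attributes. */
    static final class Scan {
        final String startRow;
        final String stopRow;
        private final Map<String, byte[]> attributes = new LinkedHashMap<>();

        Scan(String startRow, String stopRow) {
            this.startRow = startRow;
            this.stopRow = stopRow;
        }

        void setAttribute(String name, byte[] value) {
            attributes.put(name, value);
        }

        byte[] getAttribute(String name) {
            return attributes.get(name);
        }
    }

    /** Style B: a dedicated collection whose add method names the table explicitly. */
    static final class MultiTableInputCollection {
        private final Map<String, List<Scan>> scansByTable = new LinkedHashMap<>();

        void add(String table, Scan scan) {
            scansByTable.computeIfAbsent(table, t -> new ArrayList<>()).add(scan);
        }

        Map<String, List<Scan>> asMap() {
            return scansByTable;
        }
    }

    /** Style A: a plain List<Scan>, each scan tagged with its table via an attribute. */
    static List<Scan> listOfScans() {
        List<Scan> scans = new ArrayList<>();
        Scan first = new Scan("a", "m");
        first.setAttribute(TABLE_ATTRIBUTE, "table1".getBytes(StandardCharsets.UTF_8));
        scans.add(first);
        Scan second = new Scan("0", "9");
        second.setAttribute(TABLE_ATTRIBUTE, "table2".getBytes(StandardCharsets.UTF_8));
        scans.add(second);
        return scans;
    }

    /** Reads the table of a style-A scan; fails if the caller forgot the attribute. */
    static String tableOf(Scan scan) {
        byte[] raw = scan.getAttribute(TABLE_ATTRIBUTE);
        if (raw == null) {
            throw new IllegalStateException("scan has no table attribute");
        }
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Style A: the table travels inside each scan.
        for (Scan scan : listOfScans()) {
            System.out.println(tableOf(scan) + " [" + scan.startRow + ", " + scan.stopRow + ")");
        }
        // Style B: the table is an explicit argument of add().
        MultiTableInputCollection inputs = new MultiTableInputCollection();
        inputs.add("table1", new Scan("a", "m"));
        inputs.add("table1", new Scan("m", "z"));
        System.out.println(inputs.asMap().keySet());
    }
}
```

The trade-off shows up in tableOf: with a List<Scan>, a forgotten attribute is only detected when the scan is consumed, whereas the collection style makes the table a required argument at the call site.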


 Support multiple tables and scanners as input to the mapper in map/reduce jobs
 --

 Key: HBASE-3996
 URL: https://issues.apache.org/jira/browse/HBASE-3996
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Eran Kutner
Assignee: Eran Kutner
 Fix For: 0.96.0

 Attachments: 3996-v2.txt, 3996-v3.txt, 3996-v4.txt, 3996-v5.txt, 
 3996-v6.txt, 3996-v7.txt, HBase-3996.patch


 It seems that in many cases feeding data from multiple tables or multiple 
 scanners on a single table can save a lot of time when running map/reduce 
 jobs.
 I propose a new MultiTableInputFormat class that would allow doing this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

2012-04-05 Thread Eran Kutner (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247448#comment-13247448
 ] 

Eran Kutner commented on HBASE-3996:


To explain better why I feel it is unnatural: with my method, someone using 
this functionality for the first time could figure it out just by looking at 
the class names and interface definitions (using IDE auto-completion, for 
example), whereas the only way to learn that the table attribute must be set on 
the Scan is to dig through the documentation.

 Support multiple tables and scanners as input to the mapper in map/reduce jobs
 --

 Key: HBASE-3996
 URL: https://issues.apache.org/jira/browse/HBASE-3996
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Eran Kutner
Assignee: Eran Kutner
 Fix For: 0.96.0

 Attachments: 3996-v2.txt, 3996-v3.txt, 3996-v4.txt, 3996-v5.txt, 
 3996-v6.txt, 3996-v7.txt, HBase-3996.patch







[jira] [Commented] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

2012-03-28 Thread Eran Kutner (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13240335#comment-13240335
 ] 

Eran Kutner commented on HBASE-3996:


There is one pending change I know about: making TableInputConf a static inner 
class. As for versioning, I'll look at it but can't say when.
Other than that, I'm waiting to hear back from @Lars regarding my response to 
his suggestions on reusing TableInputFormatBase.

Sorry for being slow to respond, I'm very busy with other things these days, so 
feel free to make any changes you feel are right.

 Support multiple tables and scanners as input to the mapper in map/reduce jobs
 --

 Key: HBASE-3996
 URL: https://issues.apache.org/jira/browse/HBASE-3996
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Eran Kutner
Assignee: Eran Kutner
 Fix For: 0.96.0

 Attachments: 3996-v2.txt, 3996-v3.txt, 3996-v4.txt, 3996-v5.txt, 
 HBase-3996.patch







[jira] [Commented] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

2012-03-21 Thread Eran Kutner (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235029#comment-13235029
 ] 

Eran Kutner commented on HBASE-3996:


Made some changes following @stack's review. I don't know how to submit them 
for review again.

 Support multiple tables and scanners as input to the mapper in map/reduce jobs
 --

 Key: HBASE-3996
 URL: https://issues.apache.org/jira/browse/HBASE-3996
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Eran Kutner
Assignee: Eran Kutner
 Fix For: 0.96.0

 Attachments: 3996-v2.txt, 3996-v3.txt, 3996-v4.txt, 3996-v5.txt, 
 HBase-3996.patch







[jira] [Commented] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

2012-03-20 Thread Eran Kutner (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233274#comment-13233274
 ] 

Eran Kutner commented on HBASE-3996:


Sorry for missing all the action, I was offline for a couple of days.
Thanks Ted and everyone else for pushing this forward.

 Support multiple tables and scanners as input to the mapper in map/reduce jobs
 --

 Key: HBASE-3996
 URL: https://issues.apache.org/jira/browse/HBASE-3996
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Eran Kutner
Assignee: Eran Kutner
 Fix For: 0.94.0, 0.96.0

 Attachments: 3996-v2.txt, 3996-v3.txt, 3996-v4.txt, HBase-3996.patch







[jira] [Commented] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

2012-02-16 Thread Eran Kutner (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209228#comment-13209228
 ] 

Eran Kutner commented on HBASE-3996:


It was merging fine when I posted it about 7 months ago. I assume a lot has 
changed in TRUNK since.
I'll take a look at it but can't promise an ETA.

 Support multiple tables and scanners as input to the mapper in map/reduce jobs
 --

 Key: HBASE-3996
 URL: https://issues.apache.org/jira/browse/HBASE-3996
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Eran Kutner
 Fix For: 0.94.0

 Attachments: MultiTableInputFormat.patch, 
 TestMultiTableInputFormat.java.patch







[jira] [Commented] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

2012-02-16 Thread Eran Kutner (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209310#comment-13209310
 ] 

Eran Kutner commented on HBASE-3996:


I now remember this was a patch file I tried to edit manually to remove some 
extra stuff that was included and that Stack didn't like. 
I regenerated the patch file from TRUNK, but it still has some unnecessary 
stuff in it.

 Support multiple tables and scanners as input to the mapper in map/reduce jobs
 --

 Key: HBASE-3996
 URL: https://issues.apache.org/jira/browse/HBASE-3996
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Eran Kutner
 Fix For: 0.94.0

 Attachments: HBase-3996.patch, MultiTableInputFormat.patch, 
 TestMultiTableInputFormat.java.patch







[jira] [Commented] (HBASE-4612) Allow ColumnPrefixFilter to support multiple prefixes

2011-10-23 Thread Eran Kutner (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133586#comment-13133586
 ] 

Eran Kutner commented on HBASE-4612:


OK, I uploaded a patch for trunk. Hopefully what I've done with the 
createFilterFromArguments method makes sense.

 Allow ColumnPrefixFilter to support multiple prefixes
 -

 Key: HBASE-4612
 URL: https://issues.apache.org/jira/browse/HBASE-4612
 Project: HBase
  Issue Type: Improvement
  Components: filters
Affects Versions: 0.90.4
Reporter: Eran Kutner
Assignee: Eran Kutner
Priority: Minor
 Fix For: 0.94.0

 Attachments: HBASE-4612-0.90.patch, HBASE-4612.patch


 When there are a lot of columns grouped by name, I've found it would be very 
 useful to be able to scan them using multiple prefixes, which allows fetching 
 specific groups in one scan without fetching the entire row. This is 
 impossible to achieve using a FilterList, so I've added such support to the 
 existing ColumnPrefixFilter while keeping backward compatibility.
 The attached patch is based on 0.90.4. I noticed that the 0.92 branch has a 
 new method to support instantiating filters using Thrift. I'm not sure how 
 the serialization works there, so I didn't implement that, but the rest of my 
 code should work in 0.92 as well.





[jira] [Commented] (HBASE-4612) Allow ColumnPrefixFilter to support multiple prefixes

2011-10-18 Thread Eran Kutner (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13130020#comment-13130020
 ] 

Eran Kutner commented on HBASE-4612:


Hi Jonathan, thanks for the feedback! See answers inline:

{quote}There's no explanation of the behavior anywhere. In the constructors and 
addPrefix() methods, you should document that this creates an OR condition 
across all of the prefixes, correct?{quote} - Good point, added some more 
explanations.
{quote}No need to instantiate a new comparator all the time (use 
Bytes.BYTES_COMPARATOR){quote} - Didn't know it existed. Changed.
{quote}Something seems odd when you keep adding to the end of a List and then 
sort. How about a TreeSet? You can easily ignore dupes that way.{quote} - This 
is intentional. Sorting is done only during initialization, but accessing an 
ArrayList, which is backed by an array, is much more efficient than 
accessing a tree, so I sacrifice the aesthetics of the code for better runtime 
performance.
{quote}There's no input verification so, for example, you could pass a null to 
the constructor or an empty byte[][] and have some strange behavior. Like it 
will instantiate okay but then you'll get server-side NPEs or IOOB.{quote} - 
It's a good point, but I've looked and no other filter validates its input 
either. I can throw an IllegalArgumentException but don't know if it's a good 
idea considering it's not the norm.
{quote}this.prefixes.size() == 0 -> this.prefixes.isEmpty(){quote} - OK, 
changed.
{quote}your comment at the top of filterColumn, i wouldn't exactly call it a 
workaround, but it's a good comment. looking at the logic, it seems like 
correct behavior would be that it can be called with current == size() but it 
would be a bug if current > size(), right? should you add an assert or throw an 
exception?{quote} - Well, it is kind of a workaround, because as an individual 
filter I expect not to be called again after returning NEXT_ROW. However, when 
used with FilterList the filter does get called again, which puts it in an 
illegal state, so it has to handle that case explicitly. That is also why it 
can't throw an exception in that scenario: it seems to happen normally when 
used with FilterList. As for current, it has to be smaller than size() or it 
would be outside the bounds of the array.
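
To make the points above concrete, here is a rough, self-contained sketch of the approach as described in this comment. It is not the patch: the class name, the ReturnCode enum, the reset() method and the bytes() helper are stand-ins invented for illustration, and it assumes Java 9+ (for Arrays.compareUnsigned) and that the qualifiers of a row arrive in ascending unsigned byte order. It sorts the prefixes once into an array-backed list, validates the constructor input, and tolerates being called again with current == size().

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: a stand-in for the idea discussed above, not the patch itself.
class MultiPrefixSketch {

    /** Local stand-in for a filter's return codes; only NEXT_ROW is named in the thread. */
    enum ReturnCode { INCLUDE, SKIP, NEXT_ROW }

    private final List<byte[]> sortedPrefixes;
    private int current = 0; // index of the first prefix that can still match

    MultiPrefixSketch(byte[][] prefixes) {
        // Input validation, as raised in the review.
        if (prefixes == null || prefixes.length == 0) {
            throw new IllegalArgumentException("at least one prefix is required");
        }
        List<byte[]> copy = new ArrayList<>(prefixes.length);
        for (byte[] prefix : prefixes) {
            if (prefix == null) {
                throw new IllegalArgumentException("null prefix");
            }
            copy.add(prefix.clone());
        }
        // Sort once at construction; later lookups are plain indexed reads.
        copy.sort((a, b) -> Arrays.compareUnsigned(a, b));
        this.sortedPrefixes = copy;
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    private static boolean startsWith(byte[] qualifier, byte[] prefix) {
        if (qualifier.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (qualifier[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    /** OR across all prefixes; assumes qualifiers arrive in ascending order within a row. */
    ReturnCode filterColumn(byte[] qualifier) {
        while (current < sortedPrefixes.size()) {
            byte[] prefix = sortedPrefixes.get(current);
            if (startsWith(qualifier, prefix)) {
                return ReturnCode.INCLUDE;
            }
            if (Arrays.compareUnsigned(qualifier, prefix) < 0) {
                // Qualifier sorts before this prefix's range; a later column may still match.
                return ReturnCode.SKIP;
            }
            current++; // qualifier is already past this prefix's range
        }
        // Reached when the prefixes are exhausted, including when the filter is
        // called again with current == size(); no exception is thrown.
        return ReturnCode.NEXT_ROW;
    }

    /** Resets the pointer before the next row. */
    void reset() {
        current = 0;
    }

    public static void main(String[] args) {
        MultiPrefixSketch filter = new MultiPrefixSketch(new byte[][] { bytes("x"), bytes("ab") });
        for (String column : new String[] { "aa", "abc", "b", "xy", "z", "zz" }) {
            System.out.println(column + " -> " + filter.filterColumn(bytes(column)));
        }
    }
}
```

Whether a sorted ArrayList actually beats a TreeSet here would need measuring; the sketch only shows that the pointer-based walk needs nothing more than indexed access.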



 Allow ColumnPrefixFilter to support multiple prefixes
 -

 Key: HBASE-4612
 URL: https://issues.apache.org/jira/browse/HBASE-4612
 Project: HBase
  Issue Type: Improvement
  Components: filters
Affects Versions: 0.90.4
Reporter: Eran Kutner
Assignee: Eran Kutner
Priority: Minor
 Fix For: 0.94.0

 Attachments: HBASE-4612-0.90.patch







[jira] [Commented] (HBASE-4612) Allow ColumnPrefixFilter to support multiple prefixes

2011-10-18 Thread Eran Kutner (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13130023#comment-13130023
 ] 

Eran Kutner commented on HBASE-4612:


@Ted:
{quote}Improvements go to TRUNK.{quote}
I know, but see my initial comment regarding the new Thrift initialization 
method. I'm just not sure how it's supposed to work or what I am supposed to do 
there.

 Allow ColumnPrefixFilter to support multiple prefixes
 -

 Key: HBASE-4612
 URL: https://issues.apache.org/jira/browse/HBASE-4612
 Project: HBase
  Issue Type: Improvement
  Components: filters
Affects Versions: 0.90.4
Reporter: Eran Kutner
Assignee: Eran Kutner
Priority: Minor
 Fix For: 0.94.0

 Attachments: HBASE-4612-0.90.patch


