Re: Phoenix Mapreduce

2019-04-30 Thread Shawn Li
Got it, thanks for the clarification, Josh!

On Tue, Apr 30, 2019, 10:34 AM Josh Elser  wrote:

> No, you will not "lose" data. You will just have mappers that read from
> more than one Region (and thus, more than one RegionServer). The hope in
> this approach is that we can launch Mappers on the same node as the
> RegionServer hosting your Region and avoid reading any data over the
> network.
>
> This is just an optimization.
>
> On 4/30/19 10:12 AM, Shawn Li wrote:
> > Hi,
> >
> > The number of mappers in a Phoenix MapReduce job is determined by the
> > table's region count. My question is: if a region is split by another
> > ingestion process while the Phoenix MapReduce job is running, do we miss
> > reading some data because of the split? We would then have more regions
> > than mappers, and the mappers only have the region information from
> > before the split.
> >
> > Thanks,
> > Shawn
>


Re: Phoenix Mapreduce

2019-04-30 Thread Josh Elser
No, you will not "lose" data. You will just have mappers that read from
more than one Region (and thus, more than one RegionServer). The hope in
this approach is that we can launch Mappers on the same node as the
RegionServer hosting your Region and avoid reading any data over the
network.


This is just an optimization.
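
For context, a minimal read-only job sketch along these lines, assuming the
standard PhoenixMapReduceUtil/PhoenixInputFormat API; the table, columns, and
class names (MY_TABLE, ID, VAL, RowWritable, ScanMapper) are placeholders for
illustration only, not anything from this thread:

import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;
import org.apache.phoenix.mapreduce.util.PhoenixMapReduceUtil;

public class PhoenixScanJob {

  // Value object that Phoenix hydrates from each row of the input query.
  public static class RowWritable implements DBWritable {
    long id;
    String val;
    public void readFields(ResultSet rs) throws SQLException {
      id = rs.getLong("ID");
      val = rs.getString("VAL");
    }
    public void write(PreparedStatement ps) throws SQLException {
      ps.setLong(1, id);
      ps.setString(2, val);
    }
  }

  public static class ScanMapper
      extends Mapper<NullWritable, RowWritable, NullWritable, NullWritable> {
    @Override
    protected void map(NullWritable key, RowWritable row, Context ctx)
        throws IOException, InterruptedException {
      // Each mapper scans the region(s) covered by its split.
      ctx.getCounter("phoenix", "rows").increment(1);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "phoenix-region-scan");
    job.setJarByClass(PhoenixScanJob.class);

    // Splits (and therefore the mapper count) are computed from the table's
    // regions at submission time; a region that splits mid-job is still fully
    // covered by the scan range of the mapper that owned the original region.
    PhoenixMapReduceUtil.setInput(job, RowWritable.class, "MY_TABLE",
        "SELECT ID, VAL FROM MY_TABLE");

    job.setMapperClass(ScanMapper.class);
    job.setOutputKeyClass(NullWritable.class);
    job.setOutputValueClass(NullWritable.class);
    job.setOutputFormatClass(NullOutputFormat.class);
    job.setNumReduceTasks(0);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}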

On 4/30/19 10:12 AM, Shawn Li wrote:

Hi,

The number of mappers in a Phoenix MapReduce job is determined by the
table's region count. My question is: if a region is split by another
ingestion process while the Phoenix MapReduce job is running, do we miss
reading some data because of the split? We would then have more regions
than mappers, and the mappers only have the region information from before
the split.


Thanks,
Shawn


Phoenix Mapreduce

2019-04-30 Thread Shawn Li
Hi,

The number of mappers in a Phoenix MapReduce job is determined by the
table's region count. My question is: if a region is split by another
ingestion process while the Phoenix MapReduce job is running, do we miss
reading some data because of the split? We would then have more regions
than mappers, and the mappers only have the region information from before
the split.

Thanks,
Shawn


Re: Phoenix Mapreduce

2017-12-29 Thread Josh Elser

Hey Anil,

Check out the MultiHfileOutputFormat class.

You can see how AbstractBulkLoadTool invokes it inside the `submitJob` 
method.
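
For a rough picture of the HFile route, here is a hedged single-table sketch
that reads rows through the Phoenix MapReduce input and writes HFiles with the
stock HBase HFileOutputFormat2 (Phoenix's own bulk-load tooling uses
MultiHfileOutputFormat as noted above). All table, column, path, and class
names are placeholders, and the row key and column encoding produced by the
mapper must really match how Phoenix lays out the target table, which is
exactly what the Phoenix bulk-load tools handle for you:

import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.phoenix.mapreduce.util.PhoenixMapReduceUtil;

public class PhoenixToHFilesJob {

  // One source row read through the Phoenix input query.
  public static class SrcRow implements DBWritable {
    String id;
    String val;
    public void readFields(ResultSet rs) throws SQLException {
      id = rs.getString("ID");
      val = rs.getString("VAL");
    }
    public void write(PreparedStatement ps) throws SQLException {
      ps.setString(1, id);
      ps.setString(2, val);
    }
  }

  // Turns each row into an HBase Put. The row key, column family, and
  // qualifier used here are naive placeholders; they must match the Phoenix
  // layout of the target table for the loaded HFiles to be readable.
  public static class ToPutMapper
      extends Mapper<NullWritable, SrcRow, ImmutableBytesWritable, Put> {
    @Override
    protected void map(NullWritable k, SrcRow row, Context ctx)
        throws IOException, InterruptedException {
      byte[] rowKey = Bytes.toBytes(row.id);
      Put put = new Put(rowKey);
      put.addColumn(Bytes.toBytes("0"), Bytes.toBytes("VAL"),
          Bytes.toBytes(row.val));
      ctx.write(new ImmutableBytesWritable(rowKey), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "phoenix-to-hfiles");
    job.setJarByClass(PhoenixToHFilesJob.class);

    PhoenixMapReduceUtil.setInput(job, SrcRow.class, "SOURCE_TABLE",
        "SELECT ID, VAL FROM SOURCE_TABLE");
    job.setMapperClass(ToPutMapper.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(Put.class);

    TableName target = TableName.valueOf("TARGET_TABLE");
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(target);
         RegionLocator locator = conn.getRegionLocator(target)) {
      // Configures total-order partitioning and a sort reducer so the HFiles
      // line up with the target table's current region boundaries.
      HFileOutputFormat2.configureIncrementalLoad(job, table, locator);
    }
    FileOutputFormat.setOutputPath(job, new Path("/tmp/hfiles-out"));

    // After the job succeeds, finish with the HBase bulk-load tool
    // (LoadIncrementalHFiles / BulkLoadHFiles) pointed at /tmp/hfiles-out.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}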


On 12/28/17 5:33 AM, Anil wrote:

Hi Team,

I was looking at PhoenixOutputFormat and PhoenixRecordWriter.java and could
not see where the connection's autocommit is set to false. Did I miss
something here?


Is there any way to read from a Phoenix table and create HFiles for bulk
import instead of committing every record (batch)?


I have written a MapReduce job to create the dataset for my target table,
and the data load into the target table is taking a long time. I want to
reduce the load time by avoiding statement execution or frequent commits.


Any help would be appreciated. Thanks.

Thanks,
Anil




Phoenix Mapreduce

2017-12-28 Thread Anil
Hi Team,

I was looking at PhoenixOutputFormat and PhoenixRecordWriter.java and could
not see where the connection's autocommit is set to false. Did I miss
something here?

Is there any way to read from a Phoenix table and create HFiles for bulk
import instead of committing every record (batch)?

I have written a MapReduce job to create the dataset for my target table,
and the data load into the target table is taking a long time. I want to
reduce the load time by avoiding statement execution or frequent commits.

Any help would be appreciated. Thanks.

Thanks,
Anil


Re: MultipleInput in Phoenix mapreduce job

2017-03-24 Thread Steve Terrell
I have been using https://phoenix.apache.org/pig_integration.html for years
with much success.

Hope this helps,
Steve

On Fri, Mar 24, 2017 at 7:40 AM, Anil  wrote:

> Hi,
>
> I have two tables called PERSON and PERSON_DETAIL. I need to populate the
> person detail info into the PERSON record.
>
> Does Phoenix MapReduce support multiple mappers from multiple tables
> through MultipleInput?
>
> Currently I am populating the consolidated detail information into a
> temporary table and then executing a SQL query for each record to get the
> details to populate into the PERSON record in the PERSON table.
>
> This approach is taking a little more time.
>
> Can you suggest any better approach?
>
> Thanks
>
>


MultipleInput in Phoenix mapreduce job

2017-03-24 Thread Anil
Hi,

I have two tables called PERSON and PERSON_DETAIL. I need to populate the
person detail info into the PERSON record.

Does Phoenix MapReduce support multiple mappers from multiple tables through
MultipleInput?

Currently I am populating the consolidated detail information into a
temporary table and then executing a SQL query for each record to get the
details to populate into the PERSON record in the PERSON table.

This approach is taking a little more time.

Can you suggest any better approach?

Thanks


Phoenix mapreduce

2017-01-31 Thread Anil
Hello,

I have a Phoenix table which has both child and parent records. I have
created a Phoenix MapReduce job to populate a few columns of the parent
record into the child record.

Two ways of populating the parent columns into the child record are:

1.
a. Get the parent column information with a Phoenix query for each child
record in the mapper.
b. Set the number of reducers to zero.

2.
a. Group the records by parent id (which is available in both parent and
child records), i.e. use the parent id as the mapper output key and the
record as the mapper output value.
b. Populate the parent column information into the child record in the
reducer (see the sketch below).

I tried #1 and always see an insufficient container memory error or a GC
overhead error.

What is the recommended approach? Thanks for your help.

Thanks.
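
Approach #2 above is essentially a reduce-side self-join on the parent id. A
minimal sketch of that shape, assuming the standard PhoenixMapReduceUtil
input/output helpers; the table name (MY_TABLE), columns (ID, PARENT_ID,
IS_PARENT, PARENT_COL), and class names are hypothetical and only illustrate
the pattern:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;
import org.apache.phoenix.mapreduce.PhoenixOutputFormat;
import org.apache.phoenix.mapreduce.util.PhoenixMapReduceUtil;

public class ParentChildCopyDownJob {

  // One row of the table; IS_PARENT distinguishes parent from child records.
  public static class Row implements DBWritable, Writable {
    String id; String parentId; boolean isParent; String parentCol;
    public void readFields(ResultSet rs) throws SQLException {
      id = rs.getString("ID"); parentId = rs.getString("PARENT_ID");
      isParent = rs.getBoolean("IS_PARENT"); parentCol = rs.getString("PARENT_COL");
    }
    // Order matches the column list passed to setOutput below: ID, PARENT_COL.
    public void write(PreparedStatement ps) throws SQLException {
      ps.setString(1, id); ps.setString(2, parentCol);
    }
    public void write(DataOutput out) throws IOException {
      out.writeUTF(id == null ? "" : id); out.writeUTF(parentId == null ? "" : parentId);
      out.writeBoolean(isParent); out.writeUTF(parentCol == null ? "" : parentCol);
    }
    public void readFields(DataInput in) throws IOException {
      id = in.readUTF(); parentId = in.readUTF();
      isParent = in.readBoolean(); parentCol = in.readUTF();
    }
  }

  // Parents key on their own id, children on their parent id, so one reduce
  // call sees a parent together with all of its children.
  public static class GroupByParentMapper extends Mapper<NullWritable, Row, Text, Row> {
    @Override
    protected void map(NullWritable k, Row row, Context ctx)
        throws IOException, InterruptedException {
      String key = row.isParent ? row.id : row.parentId;
      if (key != null && !key.isEmpty()) {
        ctx.write(new Text(key), row);
      }
    }
  }

  public static class CopyDownReducer extends Reducer<Text, Row, NullWritable, Row> {
    @Override
    protected void reduce(Text key, Iterable<Row> rows, Context ctx)
        throws IOException, InterruptedException {
      String parentCol = null;
      List<Row> children = new ArrayList<>();
      for (Row r : rows) {
        if (r.isParent) { parentCol = r.parentCol; continue; }
        Row copy = new Row();                 // Hadoop reuses the value instance
        copy.id = r.id;
        children.add(copy);
      }
      for (Row child : children) {
        child.parentCol = parentCol;          // copy the parent column down
        ctx.write(NullWritable.get(), child); // upserted via PhoenixOutputFormat
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "parent-child-copy-down");
    job.setJarByClass(ParentChildCopyDownJob.class);

    PhoenixMapReduceUtil.setInput(job, Row.class, "MY_TABLE",
        "SELECT ID, PARENT_ID, IS_PARENT, PARENT_COL FROM MY_TABLE");
    PhoenixMapReduceUtil.setOutput(job, "MY_TABLE", "ID,PARENT_COL");

    job.setMapperClass(GroupByParentMapper.class);
    job.setReducerClass(CopyDownReducer.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(Row.class);
    job.setOutputKeyClass(NullWritable.class);
    job.setOutputValueClass(Row.class);
    job.setOutputFormatClass(PhoenixOutputFormat.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Compared with approach #1, this trades the per-child Phoenix query in the
mapper for a shuffle, and the reducer only holds one parent's children in
memory at a time, which should ease the container-memory pressure described
above.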