Re: Phoenix Mapreduce
Got it, thanks for the clarification, Josh!

On Tue, Apr 30, 2019, 10:34 AM Josh Elser wrote:
> No, you will not "lose" data. You will just have mappers that read from more than one Region (and thus, more than one RegionServer). The hope in this approach is that we can launch Mappers on the same node as the RegionServer hosting your Region and avoid reading any data over the network.
>
> This is just an optimization.
>
> On 4/30/19 10:12 AM, Shawn Li wrote:
> > Hi,
> >
> > The number of mappers in a Phoenix MapReduce job is determined by the number of table regions. My question is: if a region is split by another ingestion process while the Phoenix MapReduce job is running, do we lose some of the data we read because of the split? We then have more regions than mappers, and the mappers only have the region information from before the split.
> >
> > Thanks,
> > Shawn
Re: Phoenix Mapreduce
No, you will not "lose" data. You will just have mappers that read from more than one Region (and thus, more than one RegionServer). The hope in this approach is that we can launch Mappers on the same node as the RegionServer hosting your Region and avoid reading any data over the network.

This is just an optimization.

On 4/30/19 10:12 AM, Shawn Li wrote:
> Hi,
>
> The number of mappers in a Phoenix MapReduce job is determined by the number of table regions. My question is: if a region is split by another ingestion process while the Phoenix MapReduce job is running, do we lose some of the data we read because of the split? We then have more regions than mappers, and the mappers only have the region information from before the split.
>
> Thanks,
> Shawn
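For context on how those mappers come to be: the input side of a Phoenix MapReduce job is configured through PhoenixMapReduceUtil, and one input split (hence one mapper) is computed per table region at submit time, which is why a later split only means a mapper covers more than one region's key range. Below is a minimal sketch patterned after the example in the Phoenix MapReduce documentation; the STOCK table, its columns, and the StockWritable class are hypothetical stand-ins:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;
import org.apache.phoenix.mapreduce.util.PhoenixMapReduceUtil;

public class RegionPerMapperExample {

  // Phoenix's input format hands each row to the mapper as a DBWritable,
  // which reads its columns from the underlying ResultSet.
  public static class StockWritable implements DBWritable, Writable {
    String stockName;
    int year;

    public void readFields(ResultSet rs) throws SQLException {
      stockName = rs.getString("STOCK_NAME");
      year = rs.getInt("RECORDING_YEAR");
    }

    public void write(PreparedStatement ps) throws SQLException {
      ps.setString(1, stockName);
      ps.setInt(2, year);
    }

    public void write(DataOutput out) throws IOException {
      out.writeUTF(stockName);
      out.writeInt(year);
    }

    public void readFields(DataInput in) throws IOException {
      stockName = in.readUTF();
      year = in.readInt();
    }
  }

  // Trivial map-only job body: just count the rows each mapper sees.
  public static class RowCountMapper
      extends Mapper<NullWritable, StockWritable, NullWritable, NullWritable> {
    protected void map(NullWritable key, StockWritable value, Context ctx) {
      ctx.getCounter("example", "rows").increment(1);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "region-per-mapper-example");
    job.setJarByClass(RegionPerMapperExample.class);

    // One split per region is computed here, at submit time; a region that
    // splits later is still fully covered, because the original split's key
    // range spans both daughter regions.
    PhoenixMapReduceUtil.setInput(job, StockWritable.class, "STOCK",
        "SELECT STOCK_NAME, RECORDING_YEAR FROM STOCK");

    job.setMapperClass(RowCountMapper.class);
    job.setNumReduceTasks(0);
    job.setOutputFormatClass(NullOutputFormat.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```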
Phoenix Mapreduce
Hi,

The number of mappers in a Phoenix MapReduce job is determined by the number of table regions. My question is: if a region is split by another ingestion process while the Phoenix MapReduce job is running, do we lose some of the data we read because of the split? We then have more regions than mappers, and the mappers only have the region information from before the split.

Thanks,
Shawn
Re: Phoenix Mapreduce
Hey Anil,

Check out the MultiHfileOutputFormat class. You can see how AbstractBulkLoadTool invokes it inside the `submitJob` method.

On 12/28/17 5:33 AM, Anil wrote:
> Hi Team,
>
> I was looking at PhoenixOutputFormat and PhoenixRecordWriter.java and could not see where the connection's auto-commit is set to false. Did I miss something here? Is there any way to read from a Phoenix table and create HFiles for bulk import instead of committing every batch of records? I have written a MapReduce job to create datasets for my target table, and the data load into the target table is taking a long time; I want to reduce the load time by avoiding statement execution and frequent commits.
>
> Any help would be appreciated. Thanks.
>
> Thanks,
> Anil
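To give a feel for the shape of it: MultiHfileOutputFormat is Phoenix's multi-table variant of HBase's HFileOutputFormat2, and `submitJob` wires up the same incremental-load machinery. Here is a rough sketch of that generic HBase pattern, not the Phoenix tool itself; the table name and staging path are hypothetical:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "hfile-bulk-load-sketch");
    job.setJarByClass(BulkLoadSketch.class);
    // The mapper (omitted here) must emit ImmutableBytesWritable row keys
    // with Put or KeyValue values; nothing in this pipeline commits per row.
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);

    Path hfileDir = new Path("/tmp/hfiles"); // hypothetical staging path
    FileOutputFormat.setOutputPath(job, hfileDir);

    TableName tableName = TableName.valueOf("MY_TABLE"); // hypothetical
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(tableName);
         RegionLocator locator = conn.getRegionLocator(tableName);
         Admin admin = conn.getAdmin()) {
      // Sets up a TotalOrderPartitioner over the table's region boundaries
      // so each reducer writes HFiles that fit exactly one region.
      HFileOutputFormat2.configureIncrementalLoad(job, table, locator);

      if (job.waitForCompletion(true)) {
        // Hand the finished HFiles to the RegionServers in one step.
        new LoadIncrementalHFiles(conf).doBulkLoad(hfileDir, admin, table, locator);
      }
    }
  }
}
```

Phoenix's AbstractBulkLoadTool does the equivalent for Phoenix-encoded values, covering the data table and its index tables in one job.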
Phoenix Mapreduce
Hi Team,

I was looking at PhoenixOutputFormat and PhoenixRecordWriter.java and could not see where the connection's auto-commit is set to false. Did I miss something here? Is there any way to read from a Phoenix table and create HFiles for bulk import instead of committing every batch of records? I have written a MapReduce job to create datasets for my target table, and the data load into the target table is taking a long time; I want to reduce the load time by avoiding statement execution and frequent commits.

Any help would be appreciated. Thanks.

Thanks,
Anil
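For reference, this is roughly what the standard (non-bulk-load) write path looks like: if I read the code right, PhoenixOutputFormat turns each output record into a bound UPSERT on a connection that is not auto-committing, and the record writer commits in batches of a configurable size rather than per record. A minimal sketch; the STOCK_STATS table and its columns are hypothetical:

```java
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;
import org.apache.phoenix.mapreduce.PhoenixOutputFormat;
import org.apache.phoenix.mapreduce.util.PhoenixMapReduceUtil;

public class PhoenixWriteSketch {

  // Each output record binds itself to the generated UPSERT statement;
  // the record writer executes it and commits in batches, not per record.
  public static class StatsWritable implements DBWritable {
    String stockName;
    double maxRecording;

    public void write(PreparedStatement ps) throws SQLException {
      ps.setString(1, stockName);
      ps.setDouble(2, maxRecording);
    }

    public void readFields(ResultSet rs) throws SQLException {
      stockName = rs.getString(1);
      maxRecording = rs.getDouble(2);
    }
  }

  public static void configureOutput(Job job) {
    // Columns listed here become the UPSERT's bind parameters.
    PhoenixMapReduceUtil.setOutput(job, "STOCK_STATS",
        "STOCK_NAME,MAX_RECORDING");
    job.setOutputFormatClass(PhoenixOutputFormat.class);
    job.setOutputKeyClass(NullWritable.class);
    job.setOutputValueClass(StatsWritable.class);
  }
}
```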
Re: MultipleInput in Phoenix mapreduce job
I have been using https://phoenix.apache.org/pig_integration.html for years with much success.

Hope this helps,
Steve

On Fri, Mar 24, 2017 at 7:40 AM, Anil wrote:
> Hi,
>
> I have two tables called PERSON and PERSON_DETAIL. I need to populate some of the person detail info into the PERSON record.
>
> Does Phoenix MapReduce support multiple mappers reading from multiple tables through MultipleInputs?
>
> Currently I am populating the consolidated detail information into a temporary table and then executing a SQL query for each record to get the detail info to populate into the PERSON record in the PERSON table, and this approach is taking a little more time.
>
> Can you suggest a better approach?
>
> Thanks
MultipleInput in Phoenix mapreduce job
Hi,

I have two tables called PERSON and PERSON_DETAIL. I need to populate some of the person detail info into the PERSON record.

Does Phoenix MapReduce support multiple mappers reading from multiple tables through MultipleInputs?

Currently I am populating the consolidated detail information into a temporary table and then executing a SQL query for each record to get the detail info to populate into the PERSON record in the PERSON table, and this approach is taking a little more time.

Can you suggest a better approach?

Thanks
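For what it's worth, Hadoop's stock MultipleInputs is path-based: each input directory gets its own InputFormat and Mapper, and both mapper outputs shuffle into a single reduce. That makes it awkward to point directly at Phoenix tables, whose input format is table-backed rather than path-backed. The generic pattern, with hypothetical paths and tab-separated records keyed by person id, looks like this:

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class MultipleInputsSketch {

  // Tags each PERSON row with "P"; assumes the first tab-separated field
  // is the person id.
  public static class PersonMapper
      extends Mapper<LongWritable, Text, Text, Text> {
    protected void map(LongWritable key, Text value, Context ctx)
        throws IOException, InterruptedException {
      String[] f = value.toString().split("\t", 2);
      if (f.length == 2) {
        ctx.write(new Text(f[0]), new Text("P\t" + f[1]));
      }
    }
  }

  // Tags each PERSON_DETAIL row with "D", keyed the same way.
  public static class PersonDetailMapper
      extends Mapper<LongWritable, Text, Text, Text> {
    protected void map(LongWritable key, Text value, Context ctx)
        throws IOException, InterruptedException {
      String[] f = value.toString().split("\t", 2);
      if (f.length == 2) {
        ctx.write(new Text(f[0]), new Text("D\t" + f[1]));
      }
    }
  }

  public static void configure(Job job) {
    // Each path carries its own InputFormat/Mapper pair; a reducer keyed
    // on person id then sees both sides of the join together.
    MultipleInputs.addInputPath(job, new Path("/data/person"),
        TextInputFormat.class, PersonMapper.class);
    MultipleInputs.addInputPath(job, new Path("/data/person_detail"),
        TextInputFormat.class, PersonDetailMapper.class);
  }
}
```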
Phoenix mapreduce
Hello,

I have a Phoenix table which has both child and parent records. I have created a Phoenix MapReduce job to populate a few columns of the parent record into the child records. The two ways of populating the parent columns into the child records are:

1. a. Get the parent column information with a Phoenix query for each child record in the mapper.
   b. Set the number of reducers to zero.

2. a. Group the records by parent id (which is available in both parent and child records); that is, use the parent id as the key of the mapper output and the record as the value.
   b. Populate the parent column information into the child record in the reducer.

I tried #1 and always see a container-memory-insufficient error or a GC overhead error. What is the recommended approach?

Thanks for your help.
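Approach #2 is a classic reduce-side join, and it avoids the per-record Phoenix lookups that may be driving the GC pressure in approach #1. A minimal sketch, assuming a hypothetical comma-separated record layout of "parentId,recordType,rest":

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// The mapper keys every record by parent id and tags it; the reducer
// buffers the single parent record per key and decorates each child
// with the parent's columns.
public class ParentChildJoinSketch {

  public static class TagMapper extends Mapper<Object, Text, Text, Text> {
    protected void map(Object key, Text record, Context ctx)
        throws IOException, InterruptedException {
      String[] f = record.toString().split(",", 3);
      if (f.length < 2) {
        return; // malformed line in this hypothetical layout
      }
      String tag = "PARENT".equals(f[1]) ? "P" : "C";
      ctx.write(new Text(f[0]), new Text(tag + "\t" + record));
    }
  }

  public static class JoinReducer extends Reducer<Text, Text, Text, Text> {
    protected void reduce(Text parentId, Iterable<Text> values, Context ctx)
        throws IOException, InterruptedException {
      String parent = null;
      List<String> children = new ArrayList<>();
      for (Text v : values) {
        String[] f = v.toString().split("\t", 2);
        if ("P".equals(f[0])) {
          parent = f[1];      // assumes exactly one parent per key
        } else {
          children.add(f[1]); // children waiting for their parent
        }
      }
      if (parent == null) {
        return; // orphaned children: nothing to copy from
      }
      for (String child : children) {
        // Emit the child enriched with the parent's columns; only one
        // parent row is held in memory per key, no per-record lookups.
        ctx.write(parentId, new Text(child + "\t" + parent));
      }
    }
  }
}
```

One caveat: if a single parent can have a very large number of children, buffering them in the reducer can itself run hot; a secondary sort that guarantees the parent record arrives first would let you stream the children instead of collecting them.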