Hi,

I didn't realize that I had only sent this to Dawid.
Resending to the entire list in case someone else has encountered this
error before:

15/06/10 23:45:16 WARN TaskSetManager: Lost task 34.48 in stage 0.0 (TID
816, iriclusnd20): java.io.IOException: Added a key not lexically larger
than previous 
key=\x00\x17\x083661310846GMP\x00\x00\x00\x01E\xF3jH@\x010GEN\x00\x00\x01M\xDF\xA6!\xFF\x04,
lastkey=\x00\x17\x1E7359530994GMP\x00\x00\x00\x01@
\xD4\xFE\xC0\xC0\x010_0\x00\x00\x01M\xDF\xA6!\xFF\x04
        at
org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.checkKey(AbstractHFileWriter.java:202)
        at
org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:288)
        at
org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:253)
        at
org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:935)
        at
org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2$1.write(HFileOutputFormat2.java:196)
        at
org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2$1.write(HFileOutputFormat2.java:149)
        at
org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:1000)
        at
org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:979)
        at
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:64)
        at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

I get the above error multiple times.
The HDFS path is fine; there is no error related to it.
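
The exception comes from AbstractHFileWriter.checkKey, which requires each appended key to be strictly larger (in unsigned lexicographic byte order) than the previous one. Below is a minimal sketch of a sort that satisfies that invariant, assuming the writer is fed (ImmutableBytesWritable, KeyValue) pairs; the HashPartitioner is a placeholder, since a real bulk load should partition along the table's region start keys:

```scala
// Sketch only: HFileWriterV2.append (via AbstractHFileWriter.checkKey) rejects
// any key that is not strictly larger than the one before it, so the RDD must
// be sorted by key within each partition before the HFiles are written.
import org.apache.hadoop.hbase.KeyValue
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.spark.HashPartitioner
import org.apache.spark.rdd.RDD

def sortForHFiles(kvs: RDD[(ImmutableBytesWritable, KeyValue)],
                  numPartitions: Int): RDD[(ImmutableBytesWritable, KeyValue)] =
  // ImmutableBytesWritable compares in unsigned lexicographic byte order,
  // which is the order HFiles require. HashPartitioner is a placeholder;
  // a real bulk load partitions on the table's region boundaries.
  kvs.repartitionAndSortWithinPartitions(new HashPartitioner(numPartitions))
```

Note that when a single row produces several KeyValues (one per column), those also need to be emitted in full KeyValue order (row, then family, then qualifier), or the writer will still reject them.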

Thanks!

On 11 June 2015 at 17:49, Dawid <[email protected]> wrote:

>  Hi,
> Your code seems OK to me. The only difference from what I do is that I
> explicitly pass an HDFS path to bulkSave; I am not sure how "/bulk" gets
> resolved. I am a complete beginner with Spark, HBase, Phoenix etc., but if
> you'd like to use this code I could try to investigate your problem. I
> would need the full stack trace, though.
>
>
>
> On 11.06.2015 00:53, Yiannis Gkoufas wrote:
>
>  Hi Dawid,
>
>  Yes, I have been using your code. Probably I am invoking the classes in
> the wrong way.
>
> val data = readings.map(e => e.split(","))
>   .map(e => (e(0), e(1).toLong, e(2).toDouble, e(3).toDouble))
> val tableName = "TABLE"
> val columns = Seq("SMID", "DT", "US", "GEN")
> val zkUrl = Some("localhost:2181")
> val functions = new ExtendedProductRDDFunctions(data)
> val hfiles = functions.toHFile(tableName, columns, new Configuration, zkUrl)
> val loader = new BulkPhoenixLoader(hfiles)
> loader.bulkSave(tableName, "/bulk", None)
>
>
> Does the above seem correct to you?
>
>
> Thanks a lot!
>
>
> On 10 June 2015 at 19:13, Dawid <[email protected]> wrote:
>
>> Thanks a lot, James. That was indeed the case.
>>
>>
>>
>> On 10.06.2015 19:50, James Taylor wrote:
>>
>>> Dawid,
>>> It might be timestamp related. Check the timestamp of the rows/cells
>>> you imported from the HBase shell. Are the timestamps later than the
>>> server timestamp? In that case, you wouldn't see that data. If this is
>>> the case, you can try specifying the CURRENT_SCN property at
>>> connection time with a timestamp later than the timestamp of the
>>> rows/cells to verify.
>>> Thanks,
>>> James
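
For reference, a sketch of what setting that property looks like at connection time (the ZooKeeper URL and the one-hour offset are placeholders, not values from this thread):

```scala
// Sketch: opening a Phoenix JDBC connection with CurrentSCN set later than
// the timestamps of the bulk-loaded cells, so a query can see rows whose
// timestamps are "in the future" relative to the server clock.
import java.sql.DriverManager
import java.util.Properties

val props = new Properties()
// One hour past the client clock, as an example upper bound.
props.setProperty("CurrentSCN",
  String.valueOf(System.currentTimeMillis() + 60L * 60 * 1000))
val conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181", props)
```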
>>>
>>> On Wed, Jun 10, 2015 at 10:14 AM, Dawid <[email protected]>
>>> wrote:
>>>
>>>> Yes, that's right. I have generated HFiles that I managed to load so that
>>>> they are visible in HBase. I just can't make them 'visible' to Phoenix.
>>>>
>>>> What I noticed today: having both rows loaded from the generated HFiles
>>>> and rows upserted through sqlline, when I run 'DELETE FROM TABLE' only
>>>> the upserted ones disappear. The rows loaded from the HFiles still
>>>> persist in HBase.
>>>>
>>>> Yiannis, how do you generate the HFiles? You can see my code here:
>>>> https://gist.github.com/dawidwys/3aba8ba618140756da7c
>>>>
>>>>
>>>> On 10.06.2015 17:57, Yiannis Gkoufas wrote:
>>>>
>>>> Hi Dawid,
>>>>
>>>> I am trying to do the same thing, but I hit a wall while writing the
>>>> HFiles, getting the following error:
>>>>
>>>> java.io.IOException: Added a key not lexically larger than previous
>>>>
>>>> key=\x00\x168675230967GMP\x00\x00\x00\x01=\xF4h)\xE0\x010GEN\x00\x00\x01M\xDE.\xB4T\x04,
>>>>
>>>> lastkey=\x00\x168675230967GMP\x00\x00\x00\x01=\xF5\x0C\xF5`\x010_0\x00\x00\x01M\xDE.\xB4T\x04
>>>>
>>>> So you have reached the point where you generate the HFiles and load
>>>> them, but you don't see any rows in the table?
>>>> Is that correct?
>>>>
>>>> Thanks
>>>>
>>>>
>>>> On 8 June 2015 at 18:09, Dawid <[email protected]> wrote:
>>>>
>>>>>
>>>>> Yes, I did. I also tried to execute some upserts using sqlline after
>>>>> importing the HFiles; the rows from the upserts are visible both in
>>>>> sqlline and the HBase shell, but the rows imported from the HFiles
>>>>> appear only in the HBase shell.
>>>>>
>>>>>
>>>>> On 08.06.2015 19:06, James Taylor wrote:
>>>>>
>>>>>> Dawid,
>>>>>> Perhaps a dumb question, but did you execute a CREATE TABLE statement
>>>>>> in sqlline for the tables you're importing into? Phoenix needs to be
>>>>>> told the schema of the table (i.e. it's not enough to just create the
>>>>>> table in HBase).
>>>>>> Thanks,
>>>>>> James
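
For illustration only (the actual schema is not in this thread; the column names below are borrowed from code posted elsewhere in it, and the types and key choice are guesses), declaring the schema would look something like this, run once in sqlline or over JDBC before querying:

```scala
// Hypothetical DDL: Phoenix must be told the schema before it can see the
// table. Column names mirror ones seen elsewhere in this thread; the types
// and primary key are guesses, not the real schema.
import java.sql.DriverManager

val conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181")
conn.createStatement().execute(
  """CREATE TABLE IF NOT EXISTS "TABLE" (
    |  SMID VARCHAR NOT NULL,
    |  DT   BIGINT  NOT NULL,
    |  US   DOUBLE,
    |  GEN  DOUBLE,
    |  CONSTRAINT pk PRIMARY KEY (SMID, DT)
    |)""".stripMargin)
conn.commit()
```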
>>>>>>
>>>>>> On Mon, Jun 8, 2015 at 10:02 AM, Dawid <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Any suggestions? Some clues what to check?
>>>>>>>
>>>>>>>
>>>>>>> On 05.06.2015 23:21, Dawid wrote:
>>>>>>>
>>>>>>> Yes, I can see it in the HBase shell.
>>>>>>>
>>>>>>> Sorry for the bad links; I haven't used private repositories on
>>>>>>> GitHub before, so I moved the files to a gist:
>>>>>>> https://gist.github.com/dawidwys/3aba8ba618140756da7c
>>>>>>> I hope it will work this time.
>>>>>>>
>>>>>>> On 05.06.2015 23:09, Ravi Kiran wrote:
>>>>>>>
>>>>>>> Hi Dawid,
>>>>>>>      Do you see the data when you run a simple scan or count of the
>>>>>>> table in the HBase shell?
>>>>>>>
>>>>>>> FYI, the links lead me to a 404: File not found.
>>>>>>>
>>>>>>> Regards
>>>>>>> Ravi
>>>>>>>
>>>>>>> On Fri, Jun 5, 2015 at 1:17 PM, Dawid <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>> I was trying to write some utilities to bulk load data through
>>>>>>>> HFiles from Spark RDDs, following the pattern of CSVBulkLoadTool.
>>>>>>>> I managed to generate some HFiles and load them into HBase, but I
>>>>>>>> can't see the rows using sqlline. I would be more than grateful for
>>>>>>>> any suggestions.
>>>>>>>>
>>>>>>>> The classes can be accessed at:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> https://github.com/dawidwys/gate/blob/master/src/main/scala/pl/edu/pw/elka/phoenix/BulkPhoenixLoader.scala
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> https://github.com/dawidwys/gate/blob/master/src/main/scala/pl/edu/pw/elka/phoenix/ExtendedProductRDDFunctions.scala
>>>>>>>>
>>>>>>>> Thanks in advance
>>>>>>>>
>>>>>>>> Dawid Wysakowicz
>>>>>>>>
>>>>>>>>
>>>>>>>>  --
>>>>>>> Regards
>>>>>>> Dawid
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Regards
>>>>>>> Dawid
>>>>>>>
>>>>>>
>>>>> --
>>>>> Regards
>>>>> Dawid
>>>>>
>>>>>
>>>> --
>>>> Regards
>>>> Dawid
>>>>
>>>
>>   --
>> Regards
>> Dawid
>>
>>
>
> --
> Regards
> Dawid
>
>
